Open kermitt2 opened 6 years ago
Everything seems to work properly except one thing: the number of readers is limited, and asynchronous calls from the JavaScript frontend result in exceptions. https://github.com/lmdbjava/lmdbjava/issues/65#issuecomment-386505879
The commit 7372366 should have solved the last issue with the number of readers.
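For context on the reader limit: an LMDB environment has a fixed number of reader slots (in lmdbjava this is set with `setMaxReaders` on the `Env` builder), so many concurrent read transactions can exhaust them. The sketch below shows one generic way to cap concurrent reads with a semaphore. It is not necessarily what commit 7372366 does; the class `BoundedReaders` and its methods are hypothetical names, and the `Callable` stands in for whatever opens and closes the read transaction.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

public class BoundedReaders {
    // One permit per reader slot we allow ourselves to use (assumed <= the env's max readers).
    private final Semaphore slots;

    BoundedReaders(int maxConcurrentReads) {
        this.slots = new Semaphore(maxConcurrentReads);
    }

    // Runs a read operation while holding one permit; blocks when all permits are taken.
    <T> T read(Callable<T> op) throws Exception {
        slots.acquire();
        try {
            return op.call();
        } finally {
            // Always give the permit back, even if the read throws.
            slots.release();
        }
    }

    // Number of permits currently free, useful for diagnostics.
    int availableSlots() {
        return slots.availablePermits();
    }
}
```

With this kind of guard, extra asynchronous requests wait for a free slot instead of failing, at the cost of some added latency under load.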
KB: 37413613 concepts.
EN: 14899737 pages.
DE: 3579552 pages.
FR: 3681264 pages.
ES: 3322291 pages.
IT: 2291751 pages.
For languages other than English, the domains are not resolved. Not a clue why.
It's because the domains are derived from the English categories only. The other languages do not have the same category hierarchy (so we would need a mapping per language) and have a much smaller set of categories.
I wasn't clear. I meant that with this branch there are no domains at all, while on the master version the domains are in the output JSON.
Might be solved by rebuilding all the databases?
Hmm, I don't understand. Are the domains not produced for English, or are the domains not in the disambiguation result JSON?
The domains are built once by the Upper KB, just after the Lower KB for English is built. It's like any db: if you want to force it to be rebuilt, just delete the lmdb files and relaunch.
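A minimal sketch of the "delete the lmdb files" step, assuming the environment uses LMDB's default file names (`data.mdb` and `lock.mdb`) inside a single directory. The class name `LmdbReset` and the example path are hypothetical; point it at whichever database directory you want rebuilt.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LmdbReset {
    // Deletes LMDB's default data and lock files from one environment directory.
    // Returns the number of files actually removed (0 if none were present).
    static int deleteLmdbFiles(Path envDir) throws IOException {
        int removed = 0;
        for (String name : new String[] {"data.mdb", "lock.mdb"}) {
            if (Files.deleteIfExists(envDir.resolve(name))) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real environment directory as the first argument.
        Path envDir = Paths.get(args.length > 0 ? args[0] : "path/to/db/domains");
        System.out.println("Removed " + deleteLmdbFiles(envDir) + " file(s) from " + envDir);
    }
}
```

Run it only while the service is stopped, then relaunch so the database is built again.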
They are not in the output json. But it was just a note on the task.
before lmdbjava:
after lmdbjava:
Interesting thing is that the total number of concepts and pages correspond (see https://github.com/kermitt2/entity-fishing/issues/50)
It might be that the interlingual files are missing from your resource files, so the KB cannot relate the English domain to an Italian entity.
lmdbjava is apparently better maintained (more features and builds for more OSes) and faster. I also never got the zero-copy mode working reliably with lmdbjni, so it is worth trying lmdbjava for this too.