Open oterrier opened 2 years ago
Hi Patrice,
Not sure whether we should preload the whole DBs (too much memory involved) or just a subset, such as the N most frequently used entries?
Looking at the code of com.scienceminer.nerd.utilities.WikipediaLabelIDF,
I can see that you already have the occurrence count stored in the LabelDatabase, so for this one it should be easy.
But I don't know how to proceed for the others (PageDb, etc.).
Maybe a persistent Ehcache with an LFU policy could be associated with every KBDatabase, so that the N most frequently used entries (just the keys anyway) could be stored and retrieved at startup?
Just some thoughts
Best regards
Olivier
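To make the "N most frequent keys, stored and retrieved at startup" idea more concrete, here is a minimal sketch in plain Java. It is not Ehcache and not existing project code: `FrequentKeyTracker`, `recordAccess`, `topKeys`, `save` and `load` are hypothetical names, and the sketch assumes keys are strings without line breaks. It only counts accesses and persists the hottest keys; looking those keys up again at startup (to pull the matching LMDB pages into memory) would be left to each KBDatabase.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;

/** Hypothetical helper: counts key accesses and persists the N hottest keys. */
public class FrequentKeyTracker {
    // Access counter per key; LongAdder keeps concurrent increments cheap.
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    /** To be called on every lookup against the database. */
    public void recordAccess(String key) {
        counts.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    /** Returns up to n keys, most frequently accessed first (ties in no particular order). */
    public List<String> topKeys(int n) {
        return counts.entrySet().stream()
                .sorted(Comparator.comparingLong(
                        (Map.Entry<String, LongAdder> e) -> e.getValue().sum()).reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Writes the n hottest keys, one per line (keys only, no values). */
    public void save(Path file, int n) throws IOException {
        Files.write(file, topKeys(n), StandardCharsets.UTF_8);
    }

    /** Reads back the keys written by save(), in the same order. */
    public static List<String> load(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }
}
```

At shutdown (or periodically) one would call `save(file, N)`, and at startup iterate over `load(file)` and fetch each key once. An LFU cache with eviction would bound memory better than this unbounded counter map, so this is only meant to illustrate the store-and-replay part.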
Having most of the LMDB pages loaded in memory speeds up processing a lot, even though it requires a lot of RAM. Maybe we could add an option (in the config files) or a REST endpoint to force the full (or almost full?) loading of some LMDB databases for a given language (the ones that are required to do a disambiguation).
What do you think?
Best regards
Olivier Terrier
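As a rough illustration of what "forcing the loading" could look like, here is a sketch that uses only the JDK. Since LMDB reads its data through a memory-mapped file, touching that file once should bring its pages into the OS page cache, provided there is enough free RAM. `PageCacheWarmer` and `warm` are hypothetical names, the path of the data file is an assumption to be supplied by the caller, and `MappedByteBuffer.load()` is only a best-effort request, so this is not a guarantee that every page stays resident.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: asks the OS to page a database file into memory. */
public class PageCacheWarmer {
    // Map 1 GiB at a time: a single MappedByteBuffer cannot exceed 2 GiB.
    private static final long CHUNK = 1L << 30;

    /** Maps the file read-only chunk by chunk and loads each chunk; returns the bytes covered. */
    public static long warm(Path file) throws IOException {
        long touched = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long pos = 0; pos < size; pos += CHUNK) {
                long len = Math.min(CHUNK, size - pos);
                MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, pos, len);
                buffer.load(); // best effort: the OS may still evict pages under memory pressure
                touched += len;
            }
        }
        return touched;
    }
}
```

A config option or REST endpoint could then call `PageCacheWarmer.warm(...)` on the data file of each database needed for disambiguation in the requested language. Whether to warm everything or only part of a file would depend on the RAM available.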