kermitt2 / biblio-glutton

A high performance bibliographic information service: https://biblio-glutton.readthedocs.io

LMDB env scaling issue #10

Closed kermitt2 closed 5 years ago

kermitt2 commented 5 years ago

For simple lookups, we get quite a lot of failures due to:

! org.lmdbjava.Env$ReadersFullException: Environment maxreaders reached (-30790)

Apparently, LMDB cannot support more than 126 simultaneous readers? This looks like a weird hard-coded limitation.
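For context, LMDB keeps a reader lock table with a fixed number of slots. Its size is chosen when the environment is opened (`mdb_env_set_maxreaders` in the C API, default 126), and when every slot is taken the next reader fails immediately with `MDB_READERS_FULL` (-30790) rather than waiting. The toy model below is a simplified, plain-Java sketch of that behaviour. `ReaderTable`, `ReadSlot` and `ReadersFullError` are hypothetical names, not lmdbjava classes. The model also assumes one slot per open reader, whereas real LMDB ties slots to threads by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of LMDB's fixed-size reader table (not the lmdbjava API).
public class ReaderTableDemo {

    // Stand-in for org.lmdbjava.Env$ReadersFullException (MDB_READERS_FULL, -30790).
    static class ReadersFullError extends RuntimeException {
        ReadersFullError(int max) {
            super("Environment maxreaders reached (all " + max + " slots in use)");
        }
    }

    // A claimed slot; closing it frees the slot for another reader.
    interface ReadSlot extends AutoCloseable {
        @Override
        void close();
    }

    static class ReaderTable {
        private final int maxReaders;  // fixed for the lifetime of the table
        private final AtomicInteger inUse = new AtomicInteger();

        ReaderTable(int maxReaders) {
            this.maxReaders = maxReaders;
        }

        // Claim a free slot, or fail at once: the table never grows and callers never wait.
        ReadSlot beginRead() {
            while (true) {
                int current = inUse.get();
                if (current >= maxReaders) {
                    throw new ReadersFullError(maxReaders);
                }
                if (inUse.compareAndSet(current, current + 1)) {
                    return inUse::decrementAndGet;  // closing the slot releases it
                }
            }
        }
    }

    public static void main(String[] args) {
        ReaderTable table = new ReaderTable(126);  // same size as LMDB's default maxreaders
        List<ReadSlot> open = new ArrayList<>();
        for (int i = 0; i < 126; i++) {
            open.add(table.beginRead());
        }
        try {
            table.beginRead();  // a 127th simultaneous reader
        } catch (ReadersFullError e) {
            System.out.println(e.getMessage());
        }
        open.forEach(ReadSlot::close);
    }
}
```

Because the limit is a configuration value, the failure rate depends on how many readers are active at once, not on the total number of requests.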

To reproduce these errors:

cd biblio-glutton/script

node oa_coverage -pmc ../data/pmc/PMID_PMCID_DOI.csv.gz > out.json

where PMID_PMCID_DOI.csv.gz is the usual PMID/DOI mapping from ftp://ftp.ebi.ac.uk/pub/databases/pmc/DOI/

I observe roughly 1 failure like that for every 1,000 to 10,000 requests on a normal workstation.

Possible fix (apart from slowing down the rate of queries at the client): pooling of several LMDB environments (at least 2) for each database?

lfoppiano commented 5 years ago

good catch!

126 is the default value, and indeed it's too low.

I would increase it to the maximum number of connections, and catch this exception so that it returns a 503 (so the client knows it is pushing too much).
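The two parts of this fix could look roughly like the sketch below. In lmdbjava the reader limit is set on the environment builder with `setMaxReaders(int)` before the environment is opened. The sketch only shows the sizing rule (one slot per server connection) and the mapping from the readers-full error to HTTP 503. `LookupEndpoint`, `Response`, `ReadersFullError` and the value 1024 are all hypothetical, and the actual LMDB read is abstracted as a `Supplier<String>`. This is not the biblio-glutton code.

```java
import java.util.function.Supplier;

// Hypothetical sketch: report "maxreaders reached" as HTTP 503 instead of a generic 500.
public class LookupEndpoint {

    // Stand-in for org.lmdbjava.Env$ReadersFullException.
    static class ReadersFullError extends RuntimeException {}

    // Minimal response holder: HTTP status code plus body.
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // Run one LMDB read; if no reader slot is free, answer 503 so the client backs off.
    static Response lookup(Supplier<String> lmdbRead) {
        try {
            return new Response(200, lmdbRead.get());
        } catch (ReadersFullError e) {
            // 503 Service Unavailable: the client is sending too many concurrent requests.
            return new Response(503, "Too many concurrent lookups, retry later");
        }
    }

    public static void main(String[] args) {
        // Sizing rule: one reader slot per server connection (1024 is a hypothetical value).
        int maxConnections = 1024;
        int maxReaders = maxConnections;
        System.out.println("maxreaders = " + maxReaders);

        System.out.println(lookup(() -> "record found").status);
        System.out.println(lookup(() -> { throw new ReadersFullError(); }).status);
    }
}
```

If `maxreaders` is at least the number of request-handling threads, the 503 path should rarely trigger. It then serves as a safety net that tells the client to slow down.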

kermitt2 commented 5 years ago

Thanks for the fix! No more maxreaders exceptions, and no impact on query rates (6,500 queries per second for the PMID_PMCID_DOI.csv.gz file on an 8-thread machine).