kermitt2 / biblio-glutton

A high performance bibliographic information service: https://biblio-glutton.readthedocs.io

Slow importing of Crossref full metadata dump in LMDB #88

Open steppo83 opened 1 year ago

steppo83 commented 1 year ago

Hello, I'm trying to use biblio-glutton inside a Kubernetes pod, but importing the Crossref full metadata dump is very slow: [image]

The pod has this setup:

    resources:
      limits:
        cpu: '2'
        memory: 8Gi
      requests:
        cpu: 250m
        memory: 64Mi


The biblio-glutton config uses the default settings. Is there something I'm missing? How can I speed up the import?

Thanks!

karatekaneen commented 1 year ago

Sounds like a Kubernetes issue to me. We are running Glutton in K8s and haven't encountered indexing this slow. What kind of disks do you have backing the PV?

karatekaneen commented 1 year ago

I actually encountered slow indexing myself when running the version from #90 with the 2023 dump. Nothing else in our environment has changed, and we usually index about 5-7k/s. Now it is running at roughly 1k/s or less:

crossrefLookup
             count = 62658487
         mean rate = 1013.66 events/second
     1-minute rate = 927.42 events/second
     5-minute rate = 836.58 events/second
    15-minute rate = 800.44 events/second
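
For a sense of scale, the meter output above can be converted into elapsed time with simple arithmetic. This is only a rough sketch: it assumes the mean rate is averaged over the whole run since the import started, and the helper name `elapsed_hours` is made up for illustration (it is not part of biblio-glutton).

```python
def elapsed_hours(count: int, rate_per_second: float) -> float:
    """Hours needed to process `count` events at a constant rate (events/second)."""
    return count / rate_per_second / 3600


# Figures taken from the crossrefLookup meter output above.
count = 62_658_487
observed = elapsed_hours(count, 1013.66)

# Assumption: the "usual" 5-7k/s range applied to the same number of records.
usual_fast = elapsed_hours(count, 7000)
usual_slow = elapsed_hours(count, 5000)

print(f"at the observed mean rate: {observed:.1f} h")
print(f"at the usual 5-7k/s:       {usual_fast:.1f} to {usual_slow:.1f} h")
```

Under those assumptions, the records indexed so far took roughly 17 hours, against roughly 2.5 to 3.5 hours at the usual rate.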