kermitt2 / biblio-glutton

A high performance bibliographic information service: https://biblio-glutton.readthedocs.io

Update to Elasticsearch 7, all Crossref dump formats supported #61

Closed: kermitt2 closed this pull request 2 years ago

kermitt2 commented 2 years ago

Updates:

Indexing with the Crossref Academic Torrents dump (January 7, 2021, 120M records):

I didn't find the courage to integrate the other changes from PRs #50 and #58, but I will try.

kermitt2 commented 2 years ago

Note: I realized that with ES 7, getting the size of the ES index requires the count API. The search API only returns an approximate total, which is useless here (e.g. "more than 10K" for the 160M documents). This is fixed in PR #62.
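To illustrate the difference, here is a minimal sketch using only the Python standard library. It assumes Elasticsearch is reachable at `http://localhost:9200`, and the index name `crossref` used below is a hypothetical example. In ES 7, a `_search` response reports `hits.total` as `{"value": ..., "relation": ...}`, and with the default `track_total_hits` limit of 10,000, a large index reports `"relation": "gte"`, which is only a lower bound. By contrast, `GET /<index>/_count` returns `{"count": N, ...}` with the exact number.

```python
import json
import urllib.request
from typing import Tuple

# Assumption: default local Elasticsearch address; adjust for your deployment.
DEFAULT_ES = "http://localhost:9200"

def count_url(base_url: str, index: str) -> str:
    """URL of the _count endpoint for one index."""
    return f"{base_url.rstrip('/')}/{index}/_count"

def parse_count_response(body: dict) -> int:
    """A _count response looks like {"count": N, "_shards": {...}}; N is exact."""
    return int(body["count"])

def parse_search_total(body: dict) -> Tuple[int, bool]:
    """Read hits.total from an ES 7 _search response.

    Returns (value, is_exact). With the default track_total_hits (10,000),
    large indexes report {"value": 10000, "relation": "gte"}: a lower bound only.
    """
    total = body["hits"]["total"]
    return int(total["value"]), total.get("relation") == "eq"

def fetch_index_size(index: str, base_url: str = DEFAULT_ES, timeout: float = 10.0) -> int:
    """Ask Elasticsearch for the exact number of documents in `index`."""
    with urllib.request.urlopen(count_url(base_url, index), timeout=timeout) as resp:
        return parse_count_response(json.load(resp))
```

Against a running cluster, `fetch_index_size("crossref")` would return the exact document count. Another option is to send `"track_total_hits": true` in the search request body, which makes `hits.total` exact at some extra query cost. For simply reporting the index size, `_count` is the more direct call.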

kermitt2 commented 2 years ago

Follow-up in #66.