kermitt2 / biblio-glutton

A high performance bibliographic information service: https://biblio-glutton.readthedocs.io

Docker and docker-compose #73

Open lfoppiano opened 2 years ago

lfoppiano commented 2 years ago

This PR provides:

For the moment I've pushed one image to Docker Hub, lfoppiano/biblio-glutton-lookup:0.2, which works fine with a pre-existing LMDB database and Elasticsearch, both running on the host machine.

lfoppiano commented 2 years ago

Testing the crossref dump loading via docker:

docker exec 29f22d257f4e java -jar lib/lookup-service-0.2-onejar.jar crossref --input /app/data/sources/crossref_public_data_file_2021_01 /app/lookup/config/glutton.yml

[...]

-- Counters --------------------------------------------------------------------
crossrefLookup_rejectedRecords
             count = 3997391

-- Meters ----------------------------------------------------------------------
crossrefLookup
             count = 108449979
         mean rate = 8397.29 events/second
     1-minute rate = 6769.17 events/second
     5-minute rate = 6807.32 events/second
    15-minute rate = 6933.23 events/second

INFO  [2022-04-28 06:53:57,257] com.scienceminer.lookup.command.LoadCrossrefCommand: Number of Crossref records processed: 108502026
INFO  [2022-04-28 06:53:57,309] com.scienceminer.lookup.command.LoadCrossrefCommand: Crossref lookup size {crossref_Jsondoc=108502027} records.
INFO  [2022-04-28 06:53:57,309] com.scienceminer.lookup.command.LoadCrossrefCommand: Crossref latest indexed date 2020-04-04T04:06:33.

lfoppiano commented 2 years ago

Using the single Docker image with an external Elasticsearch and GROBID, I successfully loaded the LMDB database by launching the commands from within the container.

I did not test the docker-compose setup, but the principle should be the same: mount the data directory and launch the command.
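
The "mount the data directory and launch the command" principle could look roughly like the sketch below. It is an untested illustration, not the PR's actual setup. The image name, jar path, dump directory and config path are taken from this thread. The host directory, the container name and the /app/data mount point are assumptions, as is the idea that the image's default command keeps the container running. Connection settings for the external Elasticsearch and GROBID are left out. By default the script only prints the commands (DRY_RUN=1).

```shell
#!/usr/bin/env bash
# Sketch only: prints the docker commands unless DRY_RUN=0 is set.

IMAGE="lfoppiano/biblio-glutton-lookup:0.2"          # image named in this thread
HOST_DATA="${HOST_DATA:-$PWD/data}"                  # hypothetical host data directory
CONTAINER_DATA="/app/data"                           # assumed mount point (from the --input path)
CONTAINER_NAME="${CONTAINER_NAME:-glutton-lookup}"   # hypothetical container name
DUMP="crossref_public_data_file_2021_01"             # dump directory used in the test above
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Start the container with the host data directory mounted.
start_container() {
  run docker run -d --name "$CONTAINER_NAME" \
    -v "$HOST_DATA:$CONTAINER_DATA" "$IMAGE"
}

# Launch the Crossref loading command inside the running container.
load_crossref() {
  run docker exec "$CONTAINER_NAME" \
    java -jar lib/lookup-service-0.2-onejar.jar crossref \
    --input "$CONTAINER_DATA/sources/$DUMP" \
    /app/lookup/config/glutton.yml
}

# Only attempt the load if the container started.
start_container && load_crossref
```

With docker-compose, the same mount would presumably be declared as a volume on the service, and the load command launched with `docker-compose exec` instead of `docker exec`.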