Open lfoppiano opened 2 years ago
Testing the crossref dump loading via docker:
docker exec 29f22d257f4e java -jar lib/lookup-service-0.2-onejar.jar crossref --input /app/data/sources/crossref_public_data_file_2021_01 /app/lookup/config/glutton.yml
[...]
-- Counters --------------------------------------------------------------------
crossrefLookup_rejectedRecords
count = 3997391
-- Meters ----------------------------------------------------------------------
crossrefLookup
count = 108449979
mean rate = 8397.29 events/second
1-minute rate = 6769.17 events/second
5-minute rate = 6807.32 events/second
15-minute rate = 6933.23 events/second
INFO [2022-04-28 06:53:57,257] com.scienceminer.lookup.command.LoadCrossrefCommand: Number of Crossref records processed: 108502026
INFO [2022-04-28 06:53:57,309] com.scienceminer.lookup.command.LoadCrossrefCommand: Crossref lookup size {crossref_Jsondoc=108502027} records.
INFO [2022-04-28 06:53:57,309] com.scienceminer.lookup.command.LoadCrossrefCommand: Crossref latest indexed date 2020-04-04T04:06:33.
Using the single Docker image with an external Elasticsearch and GROBID, I successfully loaded the LMDB by launching the commands from within the running container.
I did not test the docker-compose setup, but the principle should be the same: mount the data directory and launch the command.
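As a rough sketch of the "mount the data directory and launch the command" principle, the script below only prints the two Docker commands and does not run them. The host directory `/path/to/data` and the container name `glutton-lookup` are hypothetical placeholders. It also assumes that the image starts with a plain `docker run` and that the host directory maps onto `/app/data`, neither of which has been verified here. The jar name, input path and config path are copied from the command above.

```shell
#!/bin/sh
# Hypothetical values, not taken from the PR: adjust to your own setup.
HOST_DATA_DIR="${HOST_DATA_DIR:-/path/to/data}"      # assumed host directory holding the Crossref dump
CONTAINER_NAME="${CONTAINER_NAME:-glutton-lookup}"   # assumed container name
IMAGE="lfoppiano/biblio-glutton-lookup:0.2"          # image named in this PR

# Print the command that would start the container with the data directory mounted.
# Assumption: the image needs no further options to start.
print_run_cmd() {
  echo "docker run -d --name $CONTAINER_NAME -v $HOST_DATA_DIR:/app/data $IMAGE"
}

# Print the command that would launch the Crossref loading inside the running container.
print_load_cmd() {
  echo "docker exec $CONTAINER_NAME java -jar lib/lookup-service-0.2-onejar.jar crossref --input /app/data/sources/crossref_public_data_file_2021_01 /app/lookup/config/glutton.yml"
}

# Dry run: show both commands so they can be reviewed before being executed by hand.
print_run_cmd
print_load_cmd
```

Printing rather than executing keeps the sketch safe to run anywhere. Once the paths are confirmed, the printed lines can be pasted into a shell.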
This PR provides:
For the moment I've pushed one image to Docker Hub:
lfoppiano/biblio-glutton-lookup:0.2
which works fine with a pre-existing LMDB database and an Elasticsearch instance, both running on the host machine.