Closed Soul-Code closed 3 months ago
Hello !
It seems that you are using the online biblio-glutton for consolidation and it is not responding (probably because I stopped the biblio-glutton demo: it was overloaded by queries). If you only use Grobid occasionally, you can set the consolidation option to crossref in the server config file; if your usage is more intensive, install your own biblio-glutton.
I need to better manage connectivity issues with biblio-glutton.
Thanks very much! I'll try to deploy the biblio-glutton.
biblio-glutton is very heavy to install and deploy. It is meant for scaling beyond the capacity of the CrossRef web API, with richer metadata and slightly more accurate matching. For normal usage (a few thousand PDFs), using the CrossRef API for consolidation is good (see here).
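To give an idea of what a lookup against the CrossRef web API looks like, here is a minimal Python sketch that only builds the request URL. It is not how Grobid performs consolidation internally; the function name `build_crossref_query` and the email address are placeholders chosen for illustration.

```python
from urllib.parse import urlencode

# Public CrossRef REST API endpoint for works.
CROSSREF_WORKS = "https://api.crossref.org/works"


def build_crossref_query(title, first_author, mailto, rows=1):
    """Return a CrossRef REST API URL for a bibliographic lookup.

    `mailto` identifies the caller, which is what CrossRef asks for
    from "polite" users of the public API.
    """
    params = {
        "query.bibliographic": title,   # free-text match on the reference
        "query.author": first_author,   # narrows the match by author name
        "rows": rows,                   # number of candidates to return
        "mailto": mailto,               # placeholder address, use your own
    }
    return CROSSREF_WORKS + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_crossref_query("Attention Is All You Need", "Vaswani", "you@example.org"))
```

Opening the printed URL in a browser or with curl is a quick way to check whether the CrossRef API is reachable from your machine at all.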
Thanks for the reply. I've tried both methods. When I used the CrossRef Web API, I got an error that looks like a network error. Maybe I should set up a proxy for this? Then I tried to build the Docker image of biblio-glutton, but I ran into this:
Building biblio
Step 1/23 : FROM openjdk:8-jdk as builder
---> 5bbce51c9625
Step 2/23 : USER root
---> Using cache
---> 8b60e1fa24f4
Step 3/23 : RUN apt-get update
---> Using cache
---> 231dd5bef5b6
Step 4/23 : WORKDIR /app/glutton-source
---> Using cache
---> 5f08186cdab0
Step 5/23 : RUN mkdir -p .gradle
---> Using cache
---> fda93c61777e
Step 6/23 : VOLUME /app/glutton-source/.gradle
---> Using cache
---> 79018165cfcd
Step 7/23 : COPY lookup/ ./lookup/
---> Using cache
---> e825da01add4
Step 8/23 : COPY matching/ ./matching/
---> Using cache
---> 50cb4a2abc0b
Step 9/23 : RUN cd /app/glutton-source/lookup && ./gradlew clean assemble --no-daemon
---> Using cache
---> d6a46e68ffcb
Step 10/23 : FROM openjdk:8-jre-slim
---> 781db64a09bd
Step 11/23 : RUN apt-get update -qq && apt-get -qy install curl build-essential unzip
---> Using cache
---> 2560d3028235
Step 12/23 : RUN mkdir -p /app
---> Using cache
---> bc2135d44697
Step 13/23 : WORKDIR /app
---> Using cache
---> 83bfc4becbe3
Step 14/23 : RUN curl -sL https://deb.nodesource.com/setup_10.x | bash -
---> Using cache
---> 00fdd2beef0a
Step 15/23 : RUN apt-get update -qq && apt-get -y install nodejs
---> Using cache
---> 201e18a7e6ba
Step 16/23 : COPY --from=builder /app/glutton-source/matching /app/matching
---> Using cache
---> bd04082980e4
Step 17/23 : RUN cd matching; npm install
---> Running in 1d216a66ef99
/bin/sh: 1: npm: not found
ERROR: Service 'biblio' failed to build: The command '/bin/sh -c cd matching; npm install' returned a non-zero code: 127
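A note on reading the log above: exit code 127 is the shell's conventional "command not found" status, which matches the `npm: not found` line at step 17. As a rough sketch (the helper name `find_failed_step` is made up for this example), a build log of this shape can be scanned for the failing step like this:

```python
import re

# Matches lines such as "Step 17/23 : RUN cd matching; npm install".
STEP_RE = re.compile(r"^Step (\d+)/(\d+) : (.*)$")
# Matches the trailing part of the final ERROR line.
EXIT_RE = re.compile(r"returned a non-zero code: (\d+)")

# Conventional POSIX shell exit statuses relevant here.
SHELL_EXIT_MEANINGS = {
    126: "command found but not executable",
    127: "command not found",
}


def find_failed_step(log_text):
    """Return (step_number, instruction, exit_code) for a failed build, or None."""
    last_step = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        step = STEP_RE.match(line)
        if step:
            # Remember the most recent step; the error refers to it.
            last_step = (int(step.group(1)), step.group(3))
            continue
        failure = EXIT_RE.search(line)
        if failure and last_step is not None:
            return last_step[0], last_step[1], int(failure.group(1))
    return None


if __name__ == "__main__":
    sample = (
        "Step 17/23 : RUN cd matching; npm install\n"
        "/bin/sh: 1: npm: not found\n"
        "ERROR: Service 'biblio' failed to build: The command "
        "'/bin/sh -c cd matching; npm install' returned a non-zero code: 127\n"
    )
    number, instruction, code = find_failed_step(sample)
    print(number, instruction, code, SHELL_EXIT_MEANINGS.get(code, "unknown"))
```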
After that I found an image on Docker Hub, https://registry.hub.docker.com/r/bjrne/biblio-glutton, and tried it. It started successfully, but it returns a 500 error every time I call the interface, which looks like this:
biblio_1 | WARN [2021-12-18 16:37:18,547] org.glassfish.jersey.internal.Errors: The following warnings have been detected: WARNING:
biblio_1 | WARNING: The (sub)resource method getByBiblioStringWithPost in com.scienceminer.lookup.web.resource.LookupController conta
biblio_1 | WARNING: The (sub)resource method getDoiByMetadataDoi in com.scienceminer.lookup.web.resource.OAController contains empty
biblio_1 | WARNING: The (sub)resource method getDocumentSize in com.scienceminer.lookup.web.resource.DataController contains empty pa
biblio_1 | WARNING: The (sub)resource method getDoiByMetadataDoi in com.scienceminer.lookup.web.resource.OaIstexController contains e
biblio_1 |
biblio_1 | WARN [2021-12-18 16:38:37,128] com.scienceminer.lookup.web.resource.LookupController: DOI did not matched, move to additi
biblio_1 | 172.29.0.1 - - [18/Dec/2021:16:38:37 +0000] "GET /service/lookup?doi=10.1484/J.QUAESTIO.1.103624 HTTP/1.1" 404 58 "-" "cur
biblio_1 | 172.29.0.1 - - [18/Dec/2021:16:44:03 +0000] "GET /service/lookup?parseReference=false&atitle=Attention+Is+All+You+Need&firstAuthor=Vaswani HTTP/1.1" 500 58 "-" "Apache-HttpClient/4.5.10 (Java/11.0.11)" 246
biblio_1 | 172.29.0.1 - - [18/Dec/2021:16:44:27 +0000] "GET /service/lookup?parseReference=false&atitle=Attention+Is+All+You+Need&firstAuthor=Vaswani HTTP/1.1" 500 58 "-" "Apache-HttpClient/4.5.10 (Java/11.0.11)" 19
biblio_1 | 172.29.0.1 - - [18/Dec/2021:16:47:32 +0000] "GET /service/lookup?parseReference=false&atitle=Attention+Is+All+You+Need&firstAuthor=Vaswani HTTP/1.1" 500 58 "-" "Apache-HttpClient/4.5.10 (Java/11.0.11)" 7
biblio_1 | 172.29.0.1 - - [18/Dec/2021:16:47:33 +0000] "GET /service/lookup?parseReference=false&atitle=Attention+Is+All+You+Need&firstAuthor=Vaswani HTTP/1.1" 500 58 "-" "Apache-HttpClient/4.5.10 (Java/11.0.11)" 10
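For anyone comparing similar logs, the access lines above can be summarized with a short Python sketch that pulls the request path, query parameters, and HTTP status out of each line. The regular expression assumes the line layout shown in this log; the helper names are illustrative only.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Matches the quoted request and the status that follows it, e.g.
# "GET /service/lookup?doi=... HTTP/1.1" 404
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def parse_access_line(line):
    """Return a dict describing one access-log line, or None for other lines."""
    match = REQUEST_RE.search(line)
    if match is None:
        return None  # e.g. WARN lines from the service itself
    parts = urlsplit(match.group("target"))
    # parse_qs decodes "+" and percent-escapes; keep the first value per key.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "method": match.group("method"),
        "path": parts.path,
        "query": query,
        "status": int(match.group("status")),
    }


def count_statuses(lines):
    """Count HTTP status codes over all lines that look like access entries."""
    entries = (parse_access_line(line) for line in lines)
    return Counter(entry["status"] for entry in entries if entry is not None)
```

On the excerpt above this separates the single DOI lookup that returned 404 from the title and first-author lookups that returned 500.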
Anyway, it works for me now: without consolidation, but no longer blocking.
If you're calling the consolidation service behind a proxy, you need to indicate the proxy information in the Grobid settings as documented here. Don't forget to indicate your email in the CrossRef parameter too for the "polite" usage.
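Before changing the Grobid settings, it can help to confirm that CrossRef is reachable through the proxy at all. The following is a minimal standalone check using Python's standard library, separate from Grobid itself; the proxy host, port, and email address are hypothetical values to replace with your own.

```python
import urllib.parse
import urllib.request


def make_proxy_opener(host, port):
    """Build a urllib opener that routes HTTP and HTTPS requests via a proxy."""
    proxy_url = f"http://{host}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def crossref_reachable(opener, mailto, timeout=10):
    """Return True if a small CrossRef request succeeds through the opener."""
    # rows=0 keeps the response small; mailto marks the request as "polite".
    url = "https://api.crossref.org/works?rows=0&mailto=" + urllib.parse.quote(mailto)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError, as are timeouts.
        return False


if __name__ == "__main__":
    # Hypothetical proxy settings and address, for illustration only.
    opener = make_proxy_opener("proxy.example.org", 3128)
    print(crossref_reachable(opener, "you@example.org"))
```

If this prints False while a direct connection works elsewhere, the problem is more likely in the proxy setup than in Grobid.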
The docker image is just the service without any data. You would still need to load and index the resources (the CrossRef dump at least) into the database, and this is the heavy part. I think consolidation is very valuable if you're exploiting extracted metadata and citations, so it is worth setting up CrossRef at least.
Thanks a lot. I fully understand.
Really nice tool, but I just ran into this error.
Then I looked through the documents and found this.
But I don't have a heavy load. I only called the interface a few times, and a few hours later it was still the same.
My machine has 4 CPU cores and 8 GB of RAM, so a lack of memory should not be the cause.
I'm confused about what caused this.
My log file looks like this.
Did I do something wrong that caused this to happen?