kermitt2 / grobid

A machine learning software for extracting information from scholarly documents
https://grobid.readthedocs.io
Apache License 2.0

When running consolidation using Crossref, the connection timed out #1153

Closed by Dangwei-dw 3 months ago

Dangwei-dw commented 3 months ago

The same problem occurs with both grobid/grobid:0.8.0 and lfoppiano/grobid:0.8.0. Could this be a proxy problem inside Docker? Thank you.

INFO  [2024-08-09 11:37:23,503] org.eclipse.jetty.server.handler.ContextHandler: Started i.d.j.MutableServletContextHandler@216f01{Application context,/,null,AVAILABLE}
INFO  [2024-08-09 11:37:23,504] io.dropwizard.core.setup.AdminEnvironment: tasks =

    POST    /tasks/log-level (io.dropwizard.servlets.tasks.LogConfigurationTask)
    POST    /tasks/gc (io.dropwizard.servlets.tasks.GarbageCollectionTask)

INFO  [2024-08-09 11:37:23,507] org.eclipse.jetty.server.handler.ContextHandler: Started i.d.j.MutableServletContextHandler@762a10b6{Admin context,/,null,AVAILABLE}
INFO  [2024-08-09 11:37:23,570] org.eclipse.jetty.server.AbstractConnector: Started application@2ffb3aec{HTTP/1.1, (http/1.1)}{0.0.0.0:8070}
INFO  [2024-08-09 11:37:23,583] org.eclipse.jetty.server.AbstractConnector: Started admin@786ff1cb{HTTP/1.1, (http/1.1)}{0.0.0.0:8071}
INFO  [2024-08-09 11:37:23,584] org.eclipse.jetty.server.Server: Started Server@60a7e509{STARTING}[11.0.14,sto=30000] @11312ms
INFO  [2024-08-09 11:42:17,282] org.grobid.core.factory.GrobidPoolingFactory: Number of Engines in pool active/max: 1/10
INFO  [2024-08-09 11:44:43,128] org.grobid.core.utilities.Consolidation: Consolidation service returns error (-1) : org.apache.http.conn.ConnectTimeoutException thrown during request execution :  (,query.bibliographic=Table 7: Evaluation results of DE<>EN and ZH<>EN translations across four domains.
References
Sweta Agrawal, Chunting Zhou, Mike Lewis, Luke
Zettlemoyer, and Marjan Ghazvininejad. 2022. In-
context examples selection for machine translation.,rows=1)
Connect to api.crossref.org:443 [api.crossref.org/208.254.38.72] failed: Connection timed out
INFO  [2024-08-09 11:46:54,199] org.grobid.core.utilities.Consolidation: Consolidation service returns error (-1) : org.apache.http.conn.ConnectTimeoutException thrown during request execution :  (,query.bibliographic=Farhad Akhbardeh, Arkady Arkhangorodsky, Mag-
dalena Biesialska, Ondřej Bojar, Rajen Chatterjee,
Vishrav Chaudhary, Marta R Costa-jussà, Cristina
España-Bonet, Angela Fan, Christian Federmann,
et al. 2021. Findings of the 2021 conference on ma-
chine translation (WMT21). In Proceedings of the
Sixth Conference on Machine Translation, pages 1-
88.
Anonymous. 2023a. Dissecting in-context learning of
translations in gpt-3. Anonymous preprint under re-
view.,rows=1)
Connect to api.crossref.org:443 [api.crossref.org/208.254.38.72] failed: Connection timed out
INFO  [2024-08-09 11:49:05,271] org.grobid.core.utilities.Consolidation: Consolidation service returns error (-1) : org.apache.http.conn.ConnectTimeoutException thrown during request execution :  (,query.bibliographic=Anonymous. 2023b. Does gpt-3 produces less literal
translations? Anonymous preprint under review.,rows=1)
Connect to api.crossref.org:443 [api.crossref.org/208.254.38.72] failed: Connection timed out
lfoppiano commented 3 months ago

Hi @Dangwei-dw, it seems you're calling Grobid to consolidate references, which can generate a lot of requests to Crossref. The "Connection timed out" error is not really a Grobid problem: Crossref throttles traffic and limits access to its free API.
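
To tell a network or proxy issue inside the container apart from throttling on the Crossref side, a plain TCP check can help. The following is a minimal sketch, not part of Grobid: the helper name `can_connect` is hypothetical, and it assumes a Python interpreter is available in the environment where you run it (for example inside the container or on the Docker host).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration only; it is not a Grobid API.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


# Example call, using the host and port from the log lines in this issue:
#   can_connect("api.crossref.org", 443)
```

If the call returns False inside the container but True on the host, the container's network or proxy settings are the more likely cause. If it returns True in both places, the timeouts are more likely related to request volume.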

However, if you haven't done so already, you can add your email address to the Grobid configuration.
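
For context on why the email matters: the Crossref REST API accepts a `mailto` parameter that identifies the caller. The sketch below only illustrates what such a request URL looks like, reusing the `query.bibliographic` and `rows=1` parameters visible in the log above. It is not how Grobid builds its requests internally; the function name `build_crossref_query` and the example address are assumptions. In Grobid itself the address goes in the configuration file (in recent versions this should be the `mailto` field under the Crossref consolidation settings, but please check the file shipped with your version).

```python
from urllib.parse import urlencode

CROSSREF_WORKS_URL = "https://api.crossref.org/works"


def build_crossref_query(bibliographic: str, mailto: str, rows: int = 1) -> str:
    """Build a Crossref /works URL that identifies the caller via `mailto`.

    Hypothetical helper for illustration; not taken from the Grobid code base.
    """
    params = {
        "query.bibliographic": bibliographic,  # free-text reference string
        "rows": rows,                          # number of candidate matches to return
        "mailto": mailto,                      # contact address identifying the caller
    }
    return f"{CROSSREF_WORKS_URL}?{urlencode(params)}"


# Example with a placeholder address:
#   build_crossref_query("In-context examples selection for machine translation",
#                        "you@example.org")
```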

See some useful comments.

Dangwei-dw commented 3 months ago

OK, thank you for your reply @lfoppiano