kermitt2 / grobid_client_python

Python client for GROBID Web services
Apache License 2.0

ERROR: BAD_INPUT_DATA #49

Closed by sirigg98 1 year ago

sirigg98 commented 2 years ago

Hi!

I am trying to use the processHeaderDocument service (Python 3.9.7, Windows 10). As a test, I am sending the following curl request:

curl -v --form input=C:\Users\Downloads\test\test.pdf localhost:8070/api/processHeaderDocument

and I keep getting the following message:

Trying 127.0.0.1:8070...
Connected to localhost (127.0.0.1) port 8070 (#0)
POST /api/processHeaderDocument HTTP/1.1
Host: localhost:8070
User-Agent: curl/7.83.1
Accept: */*
Content-Length: 182
Content-Type: multipart/form-data; boundary=------------------------cdbb7a56da07d8aa

We are completely uploaded and fine
Mark bundle as not supporting multiuse
HTTP/1.1 500 Internal Server Error
Date: Tue, 16 Aug 2022 15:34:00 GMT
Content-Type: application/xml
Content-Length: 64

[BAD_INPUT_DATA] PDF to XML conversion failed with error code: 1
Connection #0 to host localhost left intact

Using the Python client, I am running:

client = GrobidClient(config_path="D:\git repo\grobid_client_python\config.json")
GROBID server is up and running
client.process_pdf("processHeaderDocument", r"C:\Users\F0064WK\Downloads\test\eur_franses_AE73 (1).pdf", consolidate_header=False, generateIDs=False, consolidate_citations=False, include_raw_citations=False, include_raw_affiliations=False, tei_coordinates=False, segment_sentences=False)
Traceback (most recent call last):
ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Could I get some clarity on this, please? Thanks for the help-- and the great service!
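
A note on the curl request above: with `--form`, curl uploads a file's contents only when the path is prefixed with `@` (`input=@<path>`). Without the `@`, the path itself is sent as a plain text value, and the small request size in the log (Content-Length: 182) suggests that may be what happened here. The following is a minimal standard-library sketch of an actual file upload, not the client's own implementation. The helper names `build_multipart` and `process_header` are hypothetical, and the base URL is assumed from the thread.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field_name, filename, payload):
    """Return (body, content_type) for a single-file multipart/form-data request."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def process_header(pdf_path, base_url="http://localhost:8070"):
    """POST the PDF bytes (not the path string) to processHeaderDocument."""
    pdf = Path(pdf_path)
    # The form field is named "input", as in the curl request in this thread.
    body, content_type = build_multipart("input", pdf.name, pdf.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/api/processHeaderDocument",  # base URL is an assumption
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

Calling `process_header(r"C:\path\to\file.pdf")` against a reachable server would send the file contents in the request body, so the Content-Length should be roughly the size of the PDF rather than a couple of hundred bytes.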

lfoppiano commented 2 years ago

@sirigg98, if I'm not mistaken, you're running GROBID on Windows. Please be advised that Windows is not supported. See here. That could be the cause of the BAD_INPUT_DATA error.

As stated in the documentation, the best solution is to run GROBID via Docker and use the Python client from your Windows computer.

sirigg98 commented 2 years ago

Hey @lfoppiano,

I'm using the Docker image. The command-line code is:

docker pull lfoppiano/grobid:0.7.0
docker run -t --rm --init lfoppiano/grobid:0.7.0

However, I run into the ConnectionError exception outlined above when using the python client. Any suggestions?

lfoppiano commented 2 years ago

Hi @sirigg98, when you run the Docker image you also have to map the port correctly, using something like:

-p 8070:8070

See the command here.
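
Without a port mapping, the container's port 8070 is not published on the host, so a client running on the host cannot reach the server there. Combining the commands from this thread would give something like `docker run -t --rm --init -p 8070:8070 lfoppiano/grobid:0.7.0`. As a quick check before calling the client, the sketch below polls GROBID's `/api/isalive` endpoint with the standard library. The function name `grobid_is_alive` and the default base URL are assumptions for illustration, not part of the Python client.

```python
import http.client
import urllib.request


def grobid_is_alive(base_url="http://localhost:8070", timeout=5.0):
    """Return True if GET <base_url>/api/isalive answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/isalive", timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused or dropped connections, timeouts, and HTTP error
        # statuses (urllib's URLError and HTTPError are OSError subclasses).
        return False
```

If this returns False while the container is running, the port mapping or the base URL in the client's config is the first thing to check.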

sirigg98 commented 1 year ago

Thanks a ton @lfoppiano! This seems to have done the trick. Closing this issue now.