kermitt2 / grobid_client_python

Python client for GROBID Web services
Apache License 2.0

Python client for GROBID REST services not producing an output #1

Closed. Santosh-Gupta closed this issue 5 years ago

Santosh-Gupta commented 5 years ago

It seems to process the PDF file, but it doesn't leave any output file. Here is my code:

!git clone https://github.com/kermitt2/grobid-client-python

#download sample pdf
import urllib.request

urllib.request.urlretrieve('https://arxiv.org/pdf/1705.04304', 'sorcher.pdf')

import os

#make output folder  
if not os.path.exists('output'):
    os.mkdir('output')

%cd grobid-client-python

!python3 grobid-client.py --input /content --output /content/output processFulltextDocument

%cd .. 
os.listdir('output')

For convenience, here is a direct link to my Colab notebook with the code, so you just have to choose "Run all":

https://colab.research.google.com/drive/1hdXViFDWbqZKJS7hsTts7P6t7-dmieXB

kermitt2 commented 5 years ago

Hello! Using the command line works fine for me with this file.

A very basic question, since it's not visible in your script: which GROBID server are you using?

Santosh-Gupta commented 5 years ago

Hello,

Forgive my inexperience with Python, but I think my mistake was that a server needs to be set up and I didn't do that. I am very new to Python. I just found this and will tinker with it:

https://grobid.readthedocs.io/en/latest/

kermitt2 commented 5 years ago

No problem! For testing, feel free to use the demo server by changing the host and port in the file config.json (change grobid_server to cloud.science-miner.com/grobid and leave grobid_port empty). Of course, this is only for testing and a small number of files: other people might use the demo, and I have put some quotas in place. If you're doing serious work with the tool, it is strongly recommended to set up your own server.
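The config change described above could be scripted from the notebook roughly as follows. This is only a sketch: it assumes config.json is a flat JSON object with string values for grobid_server and grobid_port, and the function name point_config_at_server is hypothetical. Any other keys in the file are left untouched.

```python
import json


def point_config_at_server(config_path, server, port=""):
    """Rewrite grobid_server and grobid_port in a JSON config file.

    Hypothetical helper. Assumes the file holds a flat JSON object;
    all other keys are preserved as they are.
    """
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)

    config["grobid_server"] = server
    config["grobid_port"] = port  # empty string means "put nothing"

    with open(config_path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=4)
        handle.write("\n")
    return config
```

For the demo server this would be called as point_config_at_server("config.json", "cloud.science-miner.com/grobid", "") from inside the cloned client folder, before running the client.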