kermitt2 / grobid_client_python

Python client for GROBID Web services
Apache License 2.0

No files written to output directory & no API call #9

Open samanthadalal opened 4 years ago

samanthadalal commented 4 years ago

Hi! I'm trying to use GROBID to generate the .tei.xml files for some PDFs. I installed GROBID on my local machine according to the docs, using the following commands:

wget https://github.com/kermitt2/grobid/archive/0.6.0.zip
unzip 0.6.0.zip

Since I do not have many PDFs to annotate, I chose to use the public GROBID service at http://cloud.science-miner.com/grobid/ and updated the config.json file accordingly. However, when I run the commands below, nothing is written to the OUTPUTS directory, and no API call is made either. I also checked the web service: when I tried to process a full-text document there, I got a 503 error saying that the processFulltextDocument service is not available. None of the other services (processHeader, etc.) are available either.

xxx grobid-client-python % python3 grobid-client.py --input ~/Desktop/GROBID/test --output ~/Desktop/GROBID/OUTPUTS --config ~/Desktop/GROBID/grobid-client-python/config.json --force processFulltextDocument
GROBID server is up and running
2 PDF files to process
/Users/samanthadalal/Desktop/GROBID/test/NeuralRegenRes12122021-5821355_161013.pdf
/Users/samanthadalal/Desktop/GROBID/test/nmc-59-213.pdf
runtime: 13.629 seconds 
xxx grobid-client-python % cd ..
xxx GROBID % cd OUTPUTS
xxx OUTPUTS % ls
xxx OUTPUTS % 

Could you please help me resolve this issue? I would very much appreciate it! Thank you :)
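
The run above reports a runtime but leaves OUTPUTS empty, so it can help to check which inputs have no result file. Here is a minimal sketch of such a check. It assumes each result is named after the input PDF with a `.tei.xml` suffix; that naming is an assumption and not confirmed in this thread. The helper name `missing_outputs` is hypothetical and not part of the client.

```python
from pathlib import Path


def missing_outputs(input_dir, output_dir, suffix=".tei.xml"):
    """Return the names of input PDFs with no matching result file.

    Assumption: a result for `paper.pdf` is written as `paper.tei.xml`
    directly inside output_dir.
    """
    in_path, out_path = Path(input_dir), Path(output_dir)
    missing = []
    for pdf in sorted(in_path.glob("*.pdf")):
        expected = out_path / (pdf.stem + suffix)
        if not expected.is_file():
            missing.append(pdf.name)
    return missing
```

For the run shown above, calling it on the `test` and `OUTPUTS` directories would presumably list both PDFs, since `ls` showed nothing in OUTPUTS.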

kermitt2 commented 4 years ago

Hi @samanthadalal, the public GROBID service was overloaded. I restarted it to clear the queue, so you can try again now, but it might get saturated again because some people have launched heavy batches on it. I should probably reduce the quotas. In any case, if you want to process files reliably, it is better to use a local install: it's not complicated, and it works on low-end hardware.
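
Since a saturated server answers with 503, a caller can wait and retry rather than give up. The following is a rough sketch of that idea, not the client's actual behavior. The helper `call_with_retry`, its parameters, and the linear back-off are all assumptions made for illustration.

```python
import time


def call_with_retry(send, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call send() until it returns a status other than 503.

    `send` is any zero-argument callable returning (status_code, body).
    Between attempts the wait grows linearly: base_delay * attempt number.
    Returns the last (status_code, body) seen, which is still 503 if the
    server stayed saturated for all attempts.
    """
    status, body = None, None
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 503:
            return status, body
        if attempt < max_attempts:
            # Hypothetical back-off policy; tune to the server's quotas.
            sleep(base_delay * attempt)
    return status, body
```

In practice `send` could wrap whatever HTTP call posts one PDF to the processFulltextDocument service and returns the response status and text. Keeping the HTTP call outside the helper makes the retry logic easy to test without a running server.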