kermitt2 / grobid_client_python

Python client for GROBID Web services
Apache License 2.0

grobid-client.py failing after several files repeatedly but on different files #5

Open j4ffle opened 4 years ago

j4ffle commented 4 years ago

I am running GROBID server 0.5.6 from the command line against a practice folder of 127 PDFs (a trial run before processing 80,000+ PDFs). After a few PDFs it fails with the error below. Each attempt on the same folder produces a different number of TEI files before failing:

Task :grobid-service:run FAILED
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':grobid-service:run'.
> Process 'command 'C:\Program Files\Java\jdk1.8.0_231\bin\java.exe'' finished with non-zero exit value 1
Deprecated Gradle features were used in this build, making it incompatible with Gradle 6.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/5.4.1/userguide/command_line_interface.html#sec:command_line_warnings
BUILD FAILED in 1m 20s
6 actionable tasks: 1 executed, 5 up-to-date

I can restart the server and run the program again; it works for a few files and then fails again. Following https://github.com/kermitt2/grobid-client-python/issues/2, I switched to the GROBID public demo server. That seemed to work, but not all the TEI files were created in my output folder: I had to run the program 4 consecutive times before all 127 files were there. On each rerun it correctly skipped the files it had already converted. Since it runs and I can eventually convert every file in the folder, I'm not sure what I am doing wrong. I'm hoping to run this on 80,000+ PDFs, so it would be nice not to have to worry about these failures.
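For reference, the manual "rerun until everything is converted" routine described above could be scripted roughly as follows. This is only a sketch under assumptions: it assumes each `name.pdf` yields `name.tei.xml` in the output folder (adjust `suffix` if your client version names files differently), and `process_batch` is a hypothetical callable standing in for whatever actually invokes the client. It is not part of grobid-client-python, and it works around the failures rather than explaining them.

```python
import time
from pathlib import Path
from typing import Callable, List


def missing_pdfs(input_dir: Path, output_dir: Path, suffix: str = ".tei.xml") -> List[Path]:
    """Return the PDFs in input_dir that have no matching TEI file in output_dir.

    Assumption: 'name.pdf' is converted to 'name' + suffix.
    """
    return sorted(
        pdf for pdf in input_dir.glob("*.pdf")
        if not (output_dir / (pdf.stem + suffix)).exists()
    )


def run_until_complete(
    input_dir: Path,
    output_dir: Path,
    process_batch: Callable[[List[Path]], None],  # hypothetical: wraps the real client call
    max_passes: int = 5,
    pause_seconds: float = 10.0,
) -> List[Path]:
    """Call process_batch on the still-missing PDFs until none remain
    or max_passes is reached. Returns the PDFs that are still missing."""
    for attempt in range(max_passes):
        todo = missing_pdfs(input_dir, output_dir)
        if not todo:
            break
        if attempt > 0:
            # Give the server a moment to recover before the next pass.
            time.sleep(pause_seconds)
        try:
            process_batch(todo)
        except Exception as exc:
            # A failed pass is not fatal; the next pass retries what is missing.
            print(f"pass {attempt + 1} failed: {exc}")
    return missing_pdfs(input_dir, output_dir)
```

Anything still returned after `max_passes` is worth inspecting by hand, since a file that fails on every pass is more likely to be a genuinely problematic PDF than a transient server failure.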

I originally thought it was a corrupt file on my end, but it seems to fail at different files each time I run it. Any thoughts?

Thank you!