I am using GROBID for a research project in which I need to extract text (processFulltextDocument) from company annual report PDF files. I know GROBID is designed for academic documents, but it processes most of my documents very well. The problem is that for some documents, about 30% of my whole document set (around 1000 PDFs), I get one of three errors: [BAD_INPUT_DATA] 134, [BAD_INPUT_DATA] 139, or [GENERAL] An exception occurred while running Grobid. At the same time, GROBID processes other documents that look very similar to the failing ones without any error. I have uploaded a few examples for each error code. Are there any workarounds or solutions for these errors? Thanks!
Examples with error code:
500: [BAD_INPUT_DATA] PDF to XML conversion failed with error code: 134
Document 1.pdf: failed with error 500, [BAD_INPUT_DATA] PDF to XML conversion failed with error code: 134
Document 2.pdf: similar to Document 1 but with no error.
500: [BAD_INPUT_DATA] PDF to XML conversion failed with error code: 139
Document 3.pdf: failed with error 500, [BAD_INPUT_DATA] PDF to XML conversion failed with error code: 139
Document 4.pdf: similar to Document 3 but with no error.
500: [GENERAL] An exception occurred while running Grobid.
Document 5.pdf: failed with error 500, [GENERAL] An exception occurred while running Grobid.
Document 6.pdf: failed with error 500, [GENERAL] An exception occurred while running Grobid.
Document 7.pdf: similar to Document 5 and 6 but with no error.
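In case it helps with triage, here is a minimal sketch of how the failures can be grouped by error type. It assumes the error messages look exactly like the ones listed above; `classify` and `tally` are hypothetical helper names of my own, not part of GROBID.

```python
import re
from collections import Counter

# Matches messages such as:
# "[BAD_INPUT_DATA] PDF to XML conversion failed with error code: 134"
ERROR_RE = re.compile(r"\[(?P<kind>[A-Z_]+)\]\s*(?P<detail>.*)")
CODE_RE = re.compile(r"error code:\s*(\d+)")


def classify(message):
    """Return (kind, code) for an error message; code is None if no code is present."""
    m = ERROR_RE.search(message)
    if m is None:
        return ("UNKNOWN", None)
    code = CODE_RE.search(m.group("detail"))
    return (m.group("kind"), int(code.group(1)) if code else None)


def tally(failures):
    """Count failures per (kind, code). `failures` maps file name -> error message."""
    return Counter(classify(msg) for msg in failures.values())


# Messages copied from the examples above.
failures = {
    "Document 1.pdf": "[BAD_INPUT_DATA] PDF to XML conversion failed with error code: 134",
    "Document 3.pdf": "[BAD_INPUT_DATA] PDF to XML conversion failed with error code: 139",
    "Document 5.pdf": "[GENERAL] An exception occurred while running Grobid.",
    "Document 6.pdf": "[GENERAL] An exception occurred while running Grobid.",
}

counts = tally(failures)
for (kind, code), n in sorted(counts.items(), key=str):
    print(kind, code, n)
```

Grouping the failing files this way makes it easier to share one representative PDF per error type.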
Hi mighty developers
Environment:
The same error codes appear with both a local GROBID service and the one on HuggingFace.