Open shainaraza opened 2 years ago
Hello @shainaraza !
The server logs are under grobid/logs/grobid-service.log
You should find the issue in the logs, or you could attach it here so that we can have a look.
Is it a version conflict? I am just using OpenJDK 8. Here is the log: grobid-service.log
Even when I use the example file that comes with it, I get the same error.
Thanks for the log file.
Windows is no longer supported by Grobid 0.7.0: pdfalto has not been recompiled for 64-bit Windows (lack of time/skills), and according to the log this is the problem.
Could you use the Docker image to run Grobid on Windows?
No, I don't have a Docker image. Is there any alternative?
I tried using the Colab environment, but it does not allow localhost.
If you can't install Docker on Windows, I think some people have managed to get Grobid working in the Windows Subsystem for Linux, but I know very little about it.
Hi @kermitt2, I am now on a Linux system with everything set up, but I still get the same error 500. Please see the log: grobid-service.log
According to the logs, the PDF is empty or it is a PDF image only:
! org.grobid.core.exceptions.GrobidException: [NO_BLOCKS] PDF parsing resulted in empty content
Does this PDF have a selectable text layer?
@kermitt2 thank you for replying. I don't know what text layer selection means. In my case, on both Windows and Linux now, localhost:8070 works and lets me upload a file, but I get a 500 error after that. Please advise.
If you open the PDF in a PDF viewer, can you select the text?
You can also try the pdftotext command-line tool on Linux to see if it returns some text.
If there is no text, a preliminary OCR is required.
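For anyone who wants to script that check, here is a minimal sketch in Java (assuming Java 11 or later and that pdftotext is on the PATH). The class and method names are made up for illustration and are not part of Grobid. It runs `pdftotext <file> -`, which writes the extracted text to standard output, and treats the PDF as having a text layer only if the output contains at least one letter or digit, since an image-only PDF typically yields nothing but page-break characters.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper, not part of Grobid.
class TextLayerCheck {

    // True if the extracted text contains at least one letter or digit.
    // Whitespace and form feeds (page breaks) alone do not count as text.
    static boolean hasTextLayer(String extracted) {
        return extracted.codePoints().anyMatch(Character::isLetterOrDigit);
    }

    // Runs "pdftotext <pdf> -" and returns what it wrote to stdout.
    // stderr is passed through to the console so warnings are not mistaken for text.
    static String extractText(Path pdf) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("pdftotext", pdf.toString(), "-")
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        String out = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("pdftotext exited with code " + exit);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java TextLayerCheck <file.pdf>");
            return;
        }
        String text = extractText(Path.of(args[0]));
        System.out.println(hasTextLayer(text)
                ? "text layer found"
                : "no text layer, OCR is needed first");
    }
}
```

Note that this only tells you what pdftotext sees. As this thread shows, Grobid can still report empty content for a PDF that passes this check if the problem is on the pdfalto side.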
Yes, I can definitely select the text. I also use Tika and pdftotext; the reason I want to use Grobid is the structure and metadata that it returns. So in short, is the issue with the PDF file?
The error message on an empty PDF is normally reliable because it comes from pdfalto.
Can you share the PDF here, or by email if that's easier (see my email in the readme)?
The error is caused by the -noLineNumbers and --timeout XXX options at the end of the command.
The fix is here: https://github.com/AaronNGray/grobid/commit/60d46b4948e221e24d749a758ee89c34b80cd81d
Note: there is at least one other bug!
@kermitt2 Can you please have a look at the next bug: the instance variable PDFALTOSaxHandler.block is used without being assigned in the qName.equals("String") case in startElement().
ERROR [2023-04-30 17:55:59,536] org.grobid.service.process.GrobidRestProcessFiles: An unexpected exception occurs.
! java.lang.NullPointerException: null
! at org.grobid.core.sax.PDFALTOSaxHandler.startElement(PDFALTOSaxHandler.java:446)
! at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
! at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
! at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
! at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
! at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
! at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
! at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
! at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
! at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
! at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
! at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
! at org.grobid.core.document.Document.parseInputStream(Document.java:308)
! at org.grobid.core.document.Document.parseInputStream(Document.java:313)
! at org.grobid.core.document.Document.addTokenizedDocument(Document.java:353)
! ... 73 common frames omitted
! Causing: org.grobid.core.exceptions.GrobidException: [PARSING_ERROR] Cannot parse file: C:\Users\aaron\GitHub\grobid\grobid-home\tmp\bb2K5exRr9.lxml
! at org.grobid.core.document.Document.addTokenizedDocument(Document.java:365)
! at org.grobid.core.engines.Segmentation.processing(Segmentation.java:95)
! at org.grobid.core.engines.FullTextParser.processing(FullTextParser.java:132)
! at org.grobid.core.engines.FullTextParser.processing(FullTextParser.java:111)
! at org.grobid.core.engines.Engine.fullTextToTEIDoc(Engine.java:507)
! at org.grobid.core.engines.Engine.fullTextToTEI(Engine.java:497)
! at org.grobid.service.process.GrobidRestProcessFiles.processFulltextDocument(GrobidRestProcessFiles.java:208)
! at org.grobid.service.GrobidRestService.processFulltext(GrobidRestService.java:268)
! at org.grobid.service.GrobidRestService.processFulltextDocument_post(GrobidRestService.java:220)
Looks like the handling of block needs some revising:
https://github.com/kermitt2/grobid/blob/cb39daccf1db20863f79dc6ab383262556d531d2/grobid-core/src/main/java/org/grobid/core/sax/PDFALTOSaxHandler.java#L46
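To illustrate the kind of guard I mean, here is a small self-contained sketch. It is not the actual PDFALTOSaxHandler code and not a proposed patch: the class NullSafeAltoHandler and its Block type are hypothetical stand-ins, and opening an implicit block for an orphan String element is only one possible policy (skipping the element would be another). It only shows how a null check avoids the NullPointerException when a String element arrives before any TextBlock has been opened.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration only; not Grobid's PDFALTOSaxHandler.
class NullSafeAltoHandler extends DefaultHandler {

    // Hypothetical stand-in for the block object that collects tokens.
    static final class Block {
        final List<String> tokens = new ArrayList<>();
    }

    final List<Block> blocks = new ArrayList<>();
    private Block block; // stays null until a TextBlock element opens

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("TextBlock")) {
            block = new Block();
            blocks.add(block);
        } else if (qName.equals("String")) {
            if (block == null) {
                // Guard: a String appeared outside any TextBlock.
                // Open an implicit block instead of dereferencing null.
                block = new Block();
                blocks.add(block);
            }
            String content = atts.getValue("CONTENT");
            if (content != null) {
                block.tokens.add(content);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("TextBlock")) {
            block = null; // the next String needs a new block
        }
    }

    // Parses an ALTO-like XML string and returns the collected blocks.
    static List<Block> parse(String xml) {
        try {
            NullSafeAltoHandler handler = new NullSafeAltoHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.blocks;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse XML", e);
        }
    }
}
```

With this guard, a String element that sits directly under the page still ends up in a block of its own rather than crashing the parse. Whether that is the right behaviour for Grobid's layout model is for the maintainers to decide.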
I am still getting the bug in version 0.7.3, in Docker:
ERROR [2023-10-13 10:39:20,247] org.grobid.service.process.GrobidRestProcessFiles: An unexpected exception occurs.
! org.grobid.core.exceptions.GrobidException: [NO_BLOCKS] PDF parsing resulted in empty content
! at org.grobid.core.document.Document.addTokenizedDocument(Document.java:417)
! at org.grobid.core.engines.Segmentation.processing(Segmentation.java:95)
! at org.grobid.core.engines.HeaderParser.processing(HeaderParser.java:81)
! at org.grobid.core.engines.Engine.processHeader(Engine.java:417)
! at org.grobid.core.engines.Engine.processHeader(Engine.java:385)
! at org.grobid.service.process.GrobidRestProcessFiles.processStatelessHeaderDocument(GrobidRestProcessFiles.java:99)
! at org.grobid.service.GrobidRestService.processHeaderDocumentReturnBibTeX_post(GrobidRestService.java:187)
! at jdk.internal.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
! at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
! at java.base/java.lang.reflect.Method.invoke(Method.java:568)
! at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
! at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
! at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
Hello @Siedlerchr
Can you share the PDF to reproduce the problem?
I even installed version 0.7.0, but I get this error:
Error encountered while requesting the server. Response 500: - The PDF document cannot be annotated. Please check the server logs.
Any help?