Open SebastianFeltl opened 1 year ago
Hi @SebastianFeltl !
Nice & surprising error case, thank you.
pdfalto (our library for parsing PDFs) crashes because of the annotations in this PDF. What's even more surprising is that I don't see any annotations in the PDF.
I'll open an issue in the pdfalto repo.
The following crashes:
./grobid-home/pdfalto/lin-64/pdfalto_server -fullFontName -noLineNumbers -noImage -annotation -filesLimit 2000 ~/Downloads/Klug.Hahn.2021.-.Conversational.Interfaces.and.Digital.Empathy.pdf --timeout 120
Segmentation fault (core dumped)
The same command without -annotation works fine:
./grobid-home/pdfalto/lin-64/pdfalto_server -fullFontName -noLineNumbers -noImage -filesLimit 2000 ~/Downloads/Klug.Hahn.2021.-.Conversational.Interfaces.and.Digital.Empathy.pdf --timeout 120
ll ~/Downloads/Klug.Hahn.2021.-.Conversational.Interfaces.and.Digital.Empathy.xml
-rw-rw-r-- 1 lopez lopez 869K Jun 23 19:30 /home/lopez/Downloads/Klug.Hahn.2021.-.Conversational.Interfaces.and.Digital.Empathy.xml
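To run the same comparison on other PDFs, here is a minimal sketch that runs pdfalto with and without -annotation and reports whether the process was killed by a segmentation fault. The function names `crashed_with_segv` and `check_annotation_crash` are hypothetical helpers, not part of GROBID or pdfalto. The flags are copied from the two commands above. It assumes a Linux shell, where a child killed by SIGSEGV (signal 11) is reported with exit status 139.

```shell
#!/bin/sh
# Path to the pdfalto binary, as used in the commands above; adjust to your setup.
PDFALTO=./grobid-home/pdfalto/lin-64/pdfalto_server

# Hypothetical helper: succeeds if the given command died from SIGSEGV.
# The shell reports a signal-killed child as 128 + signal number (11 for SIGSEGV on Linux).
crashed_with_segv() {
    "$@" >/dev/null 2>&1
    [ "$?" -eq 139 ]
}

# Hypothetical helper: run pdfalto on one PDF, first with -annotation, then without it.
check_annotation_crash() {
    pdf=$1
    if ! crashed_with_segv "$PDFALTO" -fullFontName -noLineNumbers -noImage \
            -annotation -filesLimit 2000 "$pdf" --timeout 120; then
        echo "no segfault with -annotation"
    elif crashed_with_segv "$PDFALTO" -fullFontName -noLineNumbers -noImage \
            -filesLimit 2000 "$pdf" --timeout 120; then
        echo "segfault with and without -annotation"
    else
        echo "segfault only with -annotation"
    fi
}
```

Calling `check_annotation_crash ~/Downloads/some.pdf` prints one of the three messages. Each run rewrites the XML output next to the PDF, as in the commands above.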
We use GROBID in our service to convert PDFs to XML and then process them further.
All of the tested PDFs worked except one (Klug Hahn 2021 - Conversational Interfaces and Digital Empathy.pdf).
We then tried an online demo of GROBID that we found in an old issue, and got an error message there as well.
We don't know why this specific PDF doesn't work and would like to request your help in figuring out the problem.
We are using Ubuntu 20.04.6 LTS for our service, but since the online demo throws the same error, I don't know whether you need any system information from us.