Open theoklpd opened 6 years ago
Please install poppler-utils and use the pdfinfo tool to inspect the original PDF file.
Thanks for your reply. I've tried (on Windows 10) Poppler's pdfinfo.exe on the following pdf document: DOC082_2014.pdf and got the following output: DOC082_2014_Poppler_Info_output.txt
So apparently the PDF is valid. Also Poppler's pdftotext.exe extracts the text elements from the PDF.
Therefore my question is: why does the pdf2json library not recognize the text elements in this PDF?
Sorry, closed by mistake.
Try pdffonts to see more info.
Okay, more info from pdffonts coming up: DOC082_2014_Poppler_Font_output.txt
I meant to try your PDF against the latest pdf.js; instead I chose https://github.com/SyslogicNL/pdf-extractor.git, a wrapper around pdf.js, and it seems your input file is fine. All the text can be extracted.
Please note that pdf-extractor, as far as I can see, uses a different version of pdf.js than pdf2json! But maybe you are already aware of that.
Yes, pdf2json uses a very old version of pdf.js.
URL: http://www.delindeschemolen.nl/PDF%20bestanden/DOC082_2014.pdf
When trying out the pdf2json library, the PDF document at the URL above was not parsed correctly. No text was found in the document.
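For anyone who wants to reproduce the "no text found" symptom, here is a minimal sketch of how the parsed output could be checked for text. It assumes the pdf2json result has the shape `{ Pages: [ { Texts: [ { R: [ { T: "..." } ] } ] } ] }` with URI-encoded `T` strings, and that older releases wrap this in a `formImage` property. The helper names `extractTextRuns` and `safeDecode` are hypothetical and not part of pdf2json.

```javascript
// Hypothetical helper: collects every decoded text run from a pdf2json result.
// Assumed shape: { Pages: [ { Texts: [ { R: [ { T: "URI-encoded" } ] } ] } ] },
// optionally wrapped in a `formImage` property (assumption for older releases).
function extractTextRuns(pdfData) {
  const root = pdfData && pdfData.formImage ? pdfData.formImage : pdfData;
  const pages = root && Array.isArray(root.Pages) ? root.Pages : [];
  const runs = [];
  for (const page of pages) {
    for (const text of page.Texts || []) {
      for (const run of text.R || []) {
        if (typeof run.T === "string") {
          runs.push(safeDecode(run.T));
        }
      }
    }
  }
  return runs;
}

// Decode a URI-encoded string, falling back to the raw value if it is malformed.
function safeDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch (err) {
    return value;
  }
}
```

If the array returned for the object delivered by the `pdfParser_dataReady` event is empty, that matches the behaviour reported here: the file is parsed, but no text elements come out.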