modesty / pdf2json

converts binary PDF to JSON and text, for server-side PDF processing and command-line use.
https://github.com/modesty/pdf2json

line not recognized #275

Closed creasy2010 closed 2 years ago

creasy2010 commented 2 years ago

When parsing this PDF file, I cannot get any line information. Has anyone encountered this problem?

Tested on versions 2.0.2 and 1.2.5.

```javascript
const fs = require("fs");
const PDFParser = require("pdf2json");

const pdfParser = new PDFParser();

pdfParser.on("pdfParser_dataError", errData => console.error(errData.parserError));
pdfParser.on("pdfParser_dataReady", pdfData => {
  // Write the parsed result to disk; the callback reports write failures.
  fs.writeFile("./pdf2json/test/F1040EZ.json", JSON.stringify(pdfData), err => {
    if (err) console.error(err);
  });
});

pdfParser.loadPDF("./test.pdf");
```


test.pdf

creasy2010 commented 2 years ago

Found the reason. This is a special PDF in which the lines are not drawn as lines, but are built up from black and white blocks. See the attachment for details. There is no problem with the project, so I will close this issue.

https://user-images.githubusercontent.com/1592277/178444362-a36c309a-408e-49d0-b36b-9640f61ad5ca.mov
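
For anyone hitting the same symptom, here is a rough sketch of how one might check whether a page's "lines" are really thin filled blocks. It assumes each page object in the parsed JSON exposes `HLines`, `VLines`, and `Fills` arrays, and that each fill carries `w` and `h` in page units. The helper name `summarizeLines` and the `maxThickness` threshold are made up for illustration, and the sample page is hand-written, not real parser output.

```javascript
// Sketch only: summarize drawn lines vs. thin fills on one parsed page.
// Assumption: page has HLines, VLines and Fills arrays; fills have w and h.
// maxThickness is a hypothetical cutoff, tune it against your own output.
function summarizeLines(page, maxThickness = 0.2) {
  const fills = page.Fills || [];
  // A fill whose smaller dimension is within the cutoff likely renders as a rule.
  const lineLikeFills = fills.filter(
    (f) => Math.min(Math.abs(f.w), Math.abs(f.h)) <= maxThickness
  );
  return {
    hLines: (page.HLines || []).length,
    vLines: (page.VLines || []).length,
    fills: fills.length,
    lineLikeFills,
  };
}

// Hand-written sample page (not real parser output).
const samplePage = {
  HLines: [],
  VLines: [],
  Fills: [
    { x: 1, y: 2, w: 30, h: 0.1 }, // thin horizontal block
    { x: 1, y: 2, w: 0.1, h: 12 }, // thin vertical block
    { x: 5, y: 5, w: 10, h: 8 },   // ordinary filled area
  ],
};

const summary = summarizeLines(samplePage);
console.log(summary.hLines, summary.vLines, summary.lineLikeFills.length);
```

If `hLines` and `vLines` are both zero while `lineLikeFills` is non-empty, the visible rules are probably coming from fills, which would match what I saw with this file. With real output you would pass in each element of the parsed page array; where that array lives in the JSON may differ between versions, so check your own output first.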