modesty / pdf2json

converts binary PDF to JSON and text, for server-side PDF processing and command-line use.
https://github.com/modesty/pdf2json

Trying to parse a file upload #223

Open scottswigart opened 3 years ago

scottswigart commented 3 years ago

I'm getting an InvalidPDFException when trying to parse a PDF file upload:

var pdfParser = new PDFParser(this, 1);
pdfParser.on("pdfParser_dataError", errData => console.error(errData.parserError));
pdfParser.on("pdfParser_dataReady", pdfData => {
    console.log(pdfParser.getRawTextContent());
});
pdfParser.parseBuffer(req.files[file].buffer);

The file is uploading, and the buffer has data in it:

req.files[file] contains: {fieldname: 'test', originalname: 'test.pdf', encoding: '7bit', mimetype: 'application/pdf', buffer: Buffer(167518), …}
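
A non-empty buffer is not necessarily a valid PDF, so one quick check is whether the bytes contain the `%PDF-` header near the start. The following is only a sketch: `looksLikePdf` is a hypothetical helper, not part of pdf2json, and the 1024-byte search window is an assumption based on how lenient PDF readers commonly behave.

```javascript
// Hypothetical helper (not part of pdf2json): reports whether a buffer
// appears to contain a PDF header ("%PDF-") near its beginning.
function looksLikePdf(buffer) {
  if (!Buffer.isBuffer(buffer) || buffer.length < 5) {
    return false;
  }
  // Assumption: tolerate some leading junk by searching the first 1024 bytes.
  const head = buffer.subarray(0, 1024);
  return head.indexOf("%PDF-", 0, "latin1") !== -1;
}

console.log(looksLikePdf(Buffer.from("%PDF-1.4\n")));         // true
console.log(looksLikePdf(Buffer.from("<html>not a pdf</html>"))); // false
```

In the upload handler above, this could be called as `looksLikePdf(req.files[file].buffer)` before `parseBuffer`. A `false` result would suggest the upload middleware delivered something other than the raw PDF bytes.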

modesty commented 3 weeks ago

This seems like a buffer issue. Please test it with the latest master (v3.1.5, with https://github.com/modesty/pdf2json/commit/bcbebdbb3f6aec20713d300596bd0ea06f2e5918 ).
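
When retesting, it can help to collect success and failure in one place. Below is a minimal sketch, assuming the parser object exposes the same `on`, `parseBuffer`, and `getRawTextContent` members used in the snippet earlier in this thread. `parsePdfBuffer` is a hypothetical wrapper, not a pdf2json function.

```javascript
// Hypothetical wrapper (not part of pdf2json): turns the event-based
// parser calls from the original snippet into a single Promise.
function parsePdfBuffer(pdfParser, buffer) {
  return new Promise((resolve, reject) => {
    // Reject with the same error object the original snippet logs.
    pdfParser.on("pdfParser_dataError", (errData) => reject(errData.parserError));
    // Resolve with the raw text once parsing has finished.
    pdfParser.on("pdfParser_dataReady", () => resolve(pdfParser.getRawTextContent()));
    pdfParser.parseBuffer(buffer);
  });
}
```

A caller would pass in a freshly constructed parser, for example `parsePdfBuffer(new PDFParser(this, 1), req.files[file].buffer)`, and handle the result with `.then(...)` and `.catch(...)`.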