modesty / pdf2json

converts binary PDF to JSON and text, for server-side PDF processing and command-line use.
https://github.com/modesty/pdf2json

extracted data in the form of encoding format #132

Open duvemula opened 7 years ago

duvemula commented 7 years ago

Hi @modesty,

When I read data from a PDF file, the output appears in some encoded form instead of readable text. Can you please take a look? Periods.pdf

Code snippet:

    var pdfParser = new PDFParser(this, 1);
    pdfParser.on("pdfParser_dataError", errData => console.error(errData.parserError));
    pdfParser.on("pdfParser_dataReady", pdfData => {
        console.log(pdfParser.getRawTextContent());
        //console.log(pdfParser.getAllFieldsTypes());
        fs.writeFile('' + filePath + '/Axioma Optimization 101.txt', pdfParser.getRawTextContent());
    });

    pdfParser.loadPDF('' + filePath + '/Periods.pdf');

Regards, Durga Prasad

wanghaisheng commented 7 years ago

What do you mean by "in the form of encoding format"? What is the expected output?

duvemula commented 6 years ago

Sorry for the delay, @wanghaisheng. My PDF file is an image-only (scanned) PDF. Can we read data from an image PDF file?

Regards, Durga Prasad

wanghaisheng commented 6 years ago

@duvemula you can try OCRmyPDF (ocrmypdf), a wonderful library. It adds a text layer to image PDFs.
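
For anyone landing here with the same problem, here is a rough sketch of how the suggestion above could be wired up from Node. It assumes the `ocrmypdf` command-line tool is installed and on the PATH. The helper names (`looksImageOnly`, `buildOcrArgs`, `runOcr`) and the 20-character threshold are made up for illustration and are not part of pdf2json or OCRmyPDF. The page-break marker pattern is also an assumption about what `getRawTextContent()` emits, so adjust it if your output looks different.

```javascript
const { execFile } = require("child_process");

// Heuristic (an assumption, not a pdf2json feature): treat the PDF as
// image-only when the raw text has almost no letters or digits once
// page-break marker lines are dropped.
function looksImageOnly(rawText, minChars = 20) {
  const withoutMarkers = rawText
    .split(/\r?\n/)
    .filter(line => !/^-+Page \(\d+\) Break-+$/.test(line.trim()))
    .join("");
  const meaningful = withoutMarkers.replace(/[^\p{L}\p{N}]/gu, "");
  return meaningful.length < minChars;
}

// Arguments for: ocrmypdf -l <language> <input.pdf> <output.pdf>
function buildOcrArgs(inputPdf, outputPdf, language = "eng") {
  return ["-l", language, inputPdf, outputPdf];
}

// Runs ocrmypdf and resolves with the path of the OCR'd PDF.
function runOcr(inputPdf, outputPdf, language = "eng") {
  return new Promise((resolve, reject) => {
    execFile("ocrmypdf", buildOcrArgs(inputPdf, outputPdf, language), err =>
      err ? reject(err) : resolve(outputPdf)
    );
  });
}
```

One way to use it: inside the `pdfParser_dataReady` handler, pass `pdfParser.getRawTextContent()` to `looksImageOnly`. If it returns true, call `runOcr` and then `loadPDF` on the OCR'd output file with a fresh parser. The check only catches output that is nearly empty, so garbled text that still contains letters will not trigger it.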