modesty / pdf2json

Converts binary PDF to JSON and text for server-side PDF processing and command-line use.
https://github.com/modesty/pdf2json

Unable to find other page content #280

Open · eeh456456 opened this issue 2 years ago

eeh456456 commented 2 years ago

I'm trying to parse a 300-page PDF, but this is the only content I get:

XXXXXX
----------------Page (0) Break----------------

----------------Page (1) Break----------------

----------------Page (2) Break----------------

----------------Page (3) Break----------------
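Judging from the output above, getRawTextContent() returns each page's text followed by a "Page (N) Break" marker line, and only page 0 has any text here. One quick way to count how many of the 300 pages came back empty is to split the string on those markers. The helper findEmptyPages below is hypothetical (it is not part of pdf2json), and its regex assumes the marker format shown in this output.

```javascript
// Hypothetical helper (not part of pdf2json): splits the string returned by
// pdfParser.getRawTextContent() on its "----Page (N) Break----" markers and
// returns the indices of pages whose text is empty after trimming whitespace.
function findEmptyPages(rawText) {
  // Assumed marker format, taken from the issue output: dashes, "Page (N) Break", dashes
  const marker = /-+Page \((\d+)\) Break-+/g;
  const emptyPages = [];
  let lastIndex = 0;
  let match;
  while ((match = marker.exec(rawText)) !== null) {
    // Text between the previous marker and this one belongs to page N
    const pageText = rawText.slice(lastIndex, match.index).trim();
    if (pageText.length === 0) {
      emptyPages.push(Number(match[1]));
    }
    lastIndex = marker.lastIndex;
  }
  // Any text after the final marker is ignored (it has no page number)
  return emptyPages;
}
```

Inside the existing pdfParser_dataReady handler it could be called as findEmptyPages(pdfParser.getRawTextContent()).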

This is my code:

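import PDFParser from "pdf2json";

function readPDF(fileName) {
    // needRawText = 1 enables getRawTextContent(); no context object is needed
    // (top-level `this` is undefined in an ES module anyway)
    const pdfParser = new PDFParser(null, 1);

    // Register the handlers before loading the file
    pdfParser.on("pdfParser_dataError", errData => console.error(errData.parserError));
    pdfParser.on("pdfParser_dataReady", pdfData => {
        const data = pdfParser.getRawTextContent();
        console.log("Text content:", JSON.stringify(data));
    });

    pdfParser.loadPDF(fileName);
}

readPDF("1.pdf");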
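The raw text only shows what pdf2json could turn into characters. A complementary check is to look at the pdfData object passed to the pdfParser_dataReady handler and count the text runs on each page. The sketch below assumes the pdf2json 2.x layout pdfData.Pages[i].Texts, and the helper pagesWithoutText is hypothetical. If most pages report zero text runs, one plausible (unconfirmed) cause is that those pages are scanned images with no text layer; pdf2json does not perform OCR, so it has nothing to extract from them.

```javascript
// Hypothetical helper: returns the indices of pages that contain no text runs,
// assuming the pdf2json 2.x output shape pdfData.Pages[i].Texts[j].
function pagesWithoutText(pdfData) {
  const pages = Array.isArray(pdfData && pdfData.Pages) ? pdfData.Pages : [];
  return pages
    // Count the text runs on each page (a missing Texts array counts as zero)
    .map((page, index) => ({
      index,
      count: Array.isArray(page.Texts) ? page.Texts.length : 0,
    }))
    .filter(page => page.count === 0)
    .map(page => page.index);
}
```

In the same handler it could be called as pagesWithoutText(pdfData); comparing its result with the raw-text output helps tell "no text objects on the page" apart from "text objects that decode to nothing".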
eeh456456 commented 2 years ago

https://pan.baidu.com/s/1YOGhQgt_jStHEAbjMQO9sg?pwd=mipv

Here are my project and the PDF.

eeh456456 commented 2 years ago

[Screenshot: QQ图片20221010160000 (QQ image)]