modesty / pdf2json

Converts binary PDF to JSON and text, for server-side PDF processing and command-line use.
https://github.com/modesty/pdf2json

Unable to parse PDF; no pdfParser_dataError is emitted #373

Status: Open (opened by simowaer 1 week ago)

simowaer commented 1 week ago

Attached file: broken.pdf

Input:

import fs from "fs";
import PDFParser from "pdf2json";

const pdfParser = new PDFParser();

pdfParser.on("pdfParser_dataError", (errData) =>
    console.error(errData.parserError)
);
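pdfParser.on("pdfParser_dataReady", (pdfData) => {
    fs.writeFile(
        "./broken.json",
        JSON.stringify(pdfData),
        // fs.writeFile's callback receives only an error (null on success)
        (err) => {
            if (err) console.error(err);
        }
    );
});

// The second argument sets the log verbosity; a high value prints Info messages
pdfParser.loadPDF("./broken.pdf", 10);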

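Output (logging stops after the last line below; neither the pdfParser_dataReady nor the pdfParser_dataError handler runs):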

Info: about to load PDF file ./broken.pdf
Info: Load OK: ./broken.pdf
Warning: Setting up fake worker.
Info: PDF loaded. pagesCount = 1
Info: start to parse page:1
Info: Skipped: tiny fill: 0 x 0
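To make this silent failure visible instead of letting the process end with neither event firing, the parser can be wrapped in a Promise with a timeout. This is a minimal sketch that assumes only the two event names and the `errData.parserError` field used in the report above; `parseWithTimeout` and its `load`/`timeoutMs` parameters are hypothetical and not part of pdf2json. The timeout only surfaces the hang; it does not fix the parse of broken.pdf.

```javascript
// Hypothetical helper (not part of pdf2json): resolves on pdfParser_dataReady,
// rejects on pdfParser_dataError, and rejects if neither arrives within timeoutMs.
function parseWithTimeout(parser, load, timeoutMs) {
  return new Promise((resolve, reject) => {
    // Fires only if neither event arrives in time (the silent case in this issue).
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`no pdfParser_dataReady or pdfParser_dataError within ${timeoutMs} ms`));
    }, timeoutMs);

    function onReady(pdfData) {
      cleanup();
      resolve(pdfData);
    }

    function onError(errData) {
      cleanup();
      reject(errData.parserError);
    }

    // Stop the timer and detach both listeners once the Promise settles.
    function cleanup() {
      clearTimeout(timer);
      parser.removeListener("pdfParser_dataReady", onReady);
      parser.removeListener("pdfParser_dataError", onError);
    }

    parser.on("pdfParser_dataReady", onReady);
    parser.on("pdfParser_dataError", onError);

    // Call load() only after the listeners are attached; a synchronous throw
    // or a rejected Promise returned by load() also rejects the result.
    Promise.resolve()
      .then(load)
      .catch((err) => {
        cleanup();
        reject(err);
      });
  });
}
```

With the reproduction above, the call would look like `parseWithTimeout(pdfParser, () => pdfParser.loadPDF("./broken.pdf", 10), 30000)`, followed by `.then(...)` to write the JSON and `.catch(...)` to log either the parser error or the timeout. Listening on the emitter directly keeps the helper independent of whether a given pdf2json version's `loadPDF` returns a Promise.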