modesty / pdf2json

converts binary PDF to JSON and text, for server-side PDF processing and command-line use.
https://github.com/modesty/pdf2json

How to overwrite library method #219

Open mandys opened 4 years ago

mandys commented 4 years ago

Hi,

I am using the getRawTextContent() method.

let pdfParser = new PDFParser(this, 1);
pdfParser.setPassword(mypassword);
pdfParser.on("pdfParser_dataError", errData => console.error(errData.parserError));
pdfParser.on("pdfParser_dataReady", pdfData => {
    fs.writeFile("parsed.txt", pdfParser.getRawTextContent(), () => {
    });
});

If I use the method as provided by the library, the words in each line of my output are not separated by spaces. If I alter the method, it works fine.

For example, in pdf.js, if I just change

prevText.str += textObj.str;

to

prevText.str += textObj.str + " ";

All my code works fine.

But, I want to know the best way to override this function in my code.

cls.prototype.getRawTextContent = function() {
    let retVal = "";
    if (!this.needRawText)
        return retVal;

    _.each(this.rawTextContents, function(textContent, index) {
        let prevText = null;
        _.each(textContent.bidiTexts, function(textObj, idx) {
            if (prevText) {
                if (Math.abs(textObj.y - prevText.y) <= 9) {
                    prevText.str += textObj.str; // the line in question
                }
                else {
                    retVal += prevText.str  + "\r\n";
                    prevText = textObj;
                }
            }
            else {
                prevText = textObj;
            }

        });
        if (prevText) {
            retVal += prevText.str;
        }
        retVal += "\r\n----------------Page (" + index + ") Break----------------\r\n";
    });

    return retVal;
};
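
One way to avoid editing the library file might be to keep the joining logic in your own code. The sketch below is not part of pdf2json: `joinRawText`, `separator` and `lineTolerance` are hypothetical names, and the input shape (an array of pages, each with a `bidiTexts` array of `{ str, y }` objects) is assumed from the method quoted above. It places the separator between text runs on the same line, which differs slightly from appending it after each run, and it does not modify the text objects it reads.

```javascript
// Hypothetical helper, not a pdf2json API. It follows the same line-grouping
// rule as the quoted getRawTextContent(), but joins runs with a separator.
function joinRawText(rawTextContents, separator = " ", lineTolerance = 9) {
    let retVal = "";
    rawTextContents.forEach((textContent, index) => {
        let line = null; // text collected for the current line
        let lineY = 0;   // y of the first run on the current line
        for (const textObj of textContent.bidiTexts) {
            if (line !== null && Math.abs(textObj.y - lineY) <= lineTolerance) {
                // Same line: append with the separator in between.
                line += separator + textObj.str;
            } else {
                // New line: flush the previous one, then start over.
                if (line !== null) {
                    retVal += line + "\r\n";
                }
                line = textObj.str;
                lineY = textObj.y;
            }
        }
        if (line !== null) {
            retVal += line;
        }
        retVal += "\r\n----------------Page (" + index + ") Break----------------\r\n";
    });
    return retVal;
}
```

If the installed version exposes the parser's internal object and its `rawTextContents` (an assumption, not something confirmed in this thread), the method could be replaced on the instance only, for example `pdfParser.getRawTextContent = () => joinRawText(pdfParser.PDFJS.rawTextContents);`, leaving the library file untouched.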
NatanB4 commented 2 years ago

I have a very similar problem, did you manage to solve yours? @mandys