mandys opened this issue 4 years ago (status: open)
Hi,
I am using the getRawTextContent() method.
    const fs = require("fs");
    const PDFParser = require("pdf2json");

    // second argument 1 = collect raw text so getRawTextContent() has data
    let pdfParser = new PDFParser(this, 1);
    pdfParser.setPassword(mypassword);
    pdfParser.on("pdfParser_dataError", errData => console.error(errData.parserError));
    pdfParser.on("pdfParser_dataReady", pdfData => {
        fs.writeFile("parsed.txt", pdfParser.getRawTextContent(), () => {});
    });
If I use the method as provided by the library, the words on each line run together without spaces. However, if I alter the method, it works fine.
For example, in getRawTextContent() from pdf.js, if I just change

    prevText.str += textObj.str;

to

    prevText.str += textObj.str + " ";

all my code works fine.
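A minimal sketch of the same line-grouping idea, written as a standalone function so it can be tested without pdf2json. It assumes each text run looks like `{ y, str }`, mirroring the two fields the library method reads from `bidiTexts`, and reuses the library's hard-coded 9-unit y tolerance. The name `joinRunsIntoLines` is hypothetical. Unlike the one-line change above, it puts the space *between* runs (so lines get no trailing space) and collects each line in a local array instead of appending to `prevText.str`, so the parsed objects are not mutated.

```javascript
// Group text runs into lines by y-coordinate and join each line with spaces.
// Assumption: runs are { y: number, str: string } in reading order, like the
// bidiTexts entries used by getRawTextContent(); 9 is the library's threshold.
function joinRunsIntoLines(runs, yTolerance = 9) {
  const lines = [];
  let current = null; // { y, parts } for the line being built; y is its first run's y

  for (const run of runs) {
    if (current && Math.abs(run.y - current.y) <= yTolerance) {
      current.parts.push(run.str); // same visual line
    } else {
      if (current) lines.push(current.parts.join(" "));
      current = { y: run.y, parts: [run.str] }; // start a new line
    }
  }
  if (current) lines.push(current.parts.join(" "));

  return lines.join("\r\n"); // same line separator the library uses
}

const runs = [
  { y: 1.0, str: "Hello" },
  { y: 1.2, str: "world" },
  { y: 20.5, str: "Next" },
  { y: 20.5, str: "line" },
];
console.log(JSON.stringify(joinRunsIntoLines(runs))); // "Hello world\r\nNext line"
```

If a PDF already stores spaces inside its runs, joining with `" "` can produce double spaces; collapsing them afterwards with `.replace(/ {2,}/g, " ")` is one option.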
But I want to know the best way to override this function in my own code. Here is the library's version from pdf.js:
    cls.prototype.getRawTextContent = function() {
        let retVal = "";
        if (!this.needRawText)
            return retVal;

        _.each(this.rawTextContents, function(textContent, index) {
            let prevText = null;
            _.each(textContent.bidiTexts, function(textObj, idx) {
                if (prevText) {
                    // runs within 9 units vertically are treated as the same line
                    if (Math.abs(textObj.y - prevText.y) <= 9) {
                        prevText.str += textObj.str; // <-- the line I change
                    }
                    else {
                        retVal += prevText.str + "\r\n";
                        prevText = textObj;
                    }
                }
                else {
                    prevText = textObj;
                }
            });
            if (prevText) {
                retVal += prevText.str;
            }
            retVal += "\r\n----------------Page (" + index + ") Break----------------\r\n";
        });

        return retVal;
    };
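In plain JavaScript, a method defined on a prototype can be replaced at runtime from your own code, so the library file does not have to be edited. The sketch below shows that pattern with a hypothetical helper `overrideMethod` and a hypothetical stand-in class `FakeRawTextSource` that carries the same `needRawText` / `rawTextContents` fields the library method reads. How to get a reference to the real `cls` from pdf2json is an assumption to verify against the installed version; if `PDFParser` only forwards `getRawTextContent()` to an internal object, patching `PDFParser.prototype` may not be the right target.

```javascript
// Replace a prototype method and return a function that undoes the change.
// TargetClass stands for whichever class actually defines the method
// (the `cls` in pdf.js); obtaining it from pdf2json is an assumption.
function overrideMethod(TargetClass, name, replacement) {
  const original = TargetClass.prototype[name];
  if (typeof original !== "function") {
    throw new TypeError(`${name} is not a method of ${TargetClass.name}`);
  }
  TargetClass.prototype[name] = replacement;
  return () => { TargetClass.prototype[name] = original; };
}

// Hypothetical stand-in exposing the same fields the library method reads.
class FakeRawTextSource {
  constructor() {
    this.needRawText = true;
    this.rawTextContents = [
      { bidiTexts: [{ y: 1, str: "Hello" }, { y: 1, str: "world" }] },
    ];
  }
  getRawTextContent() { // mimics the unspaced behaviour: runs are concatenated
    return this.rawTextContents
      .map(page => page.bidiTexts.map(t => t.str).join(""))
      .join("\r\n");
  }
}

// Use a regular function (not an arrow) so `this` is the instance.
const restore = overrideMethod(FakeRawTextSource, "getRawTextContent", function () {
  if (!this.needRawText) return "";
  return this.rawTextContents
    .map(page => page.bidiTexts.map(t => t.str).join(" "))
    .join("\r\n");
});

console.log(new FakeRawTextSource().getRawTextContent()); // Hello world
restore();
console.log(new FakeRawTextSource().getRawTextContent()); // Helloworld
```

The patch must run before `getRawTextContent()` is called, and because it replaces the method for every instance, keeping the `restore` function makes it easy to scope the change (for example, in tests).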
I have a very similar problem. Did you manage to solve yours, @mandys?