Would it be possible to expose the getTextContent method, for example through a Content property, so that a page's raw text is easy to retrieve?
Use Case
The developer needs to generate a PDF, for example with PhantomJS.
Inside the PDF file, specific text content needs to be extracted.
When accessing data.Pages in the pdfParser_dataReady callback, the developer could grab a page's text Content promise for further processing, instead of dealing with text.R[0].T manipulations (loops, encoding, etc.). pdf2json is invoked from PhantomJS via a node.js sub-process.
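For context, here is roughly the kind of manipulation I would like to avoid. This is only a minimal sketch: pageToRawText is a hypothetical helper name, and it assumes each entry of data.Pages carries a Texts array whose items hold URI-encoded strings in R[i].T.

```javascript
// Hypothetical helper, not part of pdf2json: joins the decoded text runs of one page.
// Assumption: page.Texts is an array, and each item has R[i].T as a URI-encoded string.
function pageToRawText(page) {
  return (page.Texts || [])
    .map(function (text) {
      // A text item may hold several runs; decode each one and glue them together.
      return (text.R || [])
        .map(function (run) { return decodeURIComponent(run.T); })
        .join('');
    })
    // Separate text items with a space; the real layout may need something smarter.
    .join(' ');
}
```

Inside the callback this would be called as data.Pages.map(pageToRawText). It works, but every consumer ends up rewriting the same loops and decoding.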
Proposed Implementation
Add a Content property in pdf.js.
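A rough sketch of what I have in mind, not a patch against the actual source: addContentProperty is a hypothetical name, and the page object is assumed to already offer a getTextContent method that returns a promise.

```javascript
// Hypothetical sketch: expose a promise-returning Content getter on a page object.
// Assumption: page.getTextContent exists and returns a promise for the page text.
function addContentProperty(page) {
  Object.defineProperty(page, 'Content', {
    // Keep the getter out of enumeration so it does not end up in serialized output.
    enumerable: false,
    get: function () {
      return page.getTextContent();
    }
  });
  return page;
}
```

A caller could then write page.Content.then(function (text) { ... }). The exact shape of the resolved value would depend on what getTextContent returns in the bundled pdf.js.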
If there's another approach that deals with funky characters easily without introducing an API add-on, I'd be glad to hear about it.