modesty / pdf2json

converts binary PDF to JSON and text, for server-side PDF processing and command-line use.
https://github.com/modesty/pdf2json

Loading pdfs from a remote server via a stream #65

Open zafarali opened 8 years ago

zafarali commented 8 years ago

I'm trying to load a single PDF from a remote server by piping the response into pdf2json. Piping the same request into a write stream saves the PDF correctly, so the download itself works. Here is my approach:

var request = require('request');
var pdfParser = require('pdf2json');
var pdfUrl = 'somepdf.pdf'

var pdfPipe = request({url: pdfUrl, encoding:null}).pipe(pdfParser);

pdfPipe.on("pdfParser_dataError", err => console.error(err) );
pdfPipe.on("pdfParser_dataReady", pdf => {
    //let pdf = pdfParser.getMergedTextBlocksIfNeeded();
    console.log(pdfParser.getAllFieldsTypes());
});

However, I'm getting an error:

stream.js:45
  dest.on('drain', ondrain);
       ^

TypeError: dest.on is not a function
    at Request.Stream.pipe (stream.js:45:8)
    at Request.pipe (/Users/zaf/development/minerva-bot/node_modules/request/request.js:1395:34)
    at Object.<anonymous> (/Users/zaf/development/minerva-bot/plugins/exam_module/index.js:9:53)
    at Module._compile (module.js:434:26)
    at Object.Module._extensions..js (module.js:452:10)
    at Module.load (module.js:355:32)
    at Function.Module._load (module.js:310:12)
    at Function.Module.runMain (module.js:475:10)
    at startup (node.js:117:18)
    at node.js:951:3

The code is adapted from this Stack Overflow answer: http://stackoverflow.com/a/36882510/3779915

modesty commented 8 years ago

Try:

var PDFParser = require("./pdf2json/PDFParser");
var pdfParser = new PDFParser();
var pdfPipe = request({url: pdfUrl, encoding:null}).pipe(pdfParser);
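
The important change here is `new PDFParser()`: the stack trace in the question shows `pipe()` failing when it calls `dest.on(...)`, which is what happens when the destination is a constructor rather than a stream instance. Below is a minimal sketch of that failure using only Node's built-in `stream` module. `FakeParser` and `tryPipe` are hypothetical stand-ins made up for illustration and are not part of pdf2json.

```javascript
const { PassThrough } = require('stream');
const util = require('util');

// Hypothetical stand-in for a parser module's export (NOT pdf2json itself):
// a plain constructor function whose instances are streams.
function FakeParser() {
  PassThrough.call(this);
}
util.inherits(FakeParser, PassThrough);

// Demo helper: attempt to pipe into `dest` and return the error, or null.
function tryPipe(dest) {
  const source = new PassThrough();
  try {
    source.pipe(dest);
    return null;
  } catch (err) {
    return err;
  }
}

// Passing the constructor itself: it has no .on() method, so pipe() throws.
const errorForConstructor = tryPipe(FakeParser);
// Passing an instance: a real stream, so pipe() succeeds.
const errorForInstance = tryPipe(new FakeParser());

console.log(errorForConstructor instanceof TypeError); // true
console.log(errorForInstance);                         // null
```

The same reasoning applies to any module that exports a class: `require` returns the class, and only an object created from it can be a pipe destination.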

VIRGO96 commented 4 years ago

Try:

var PDFParser = require("./pdf2json/PDFParser");
var pdfParser = new PDFParser();
var pdfPipe = request({url: pdfUrl, encoding:null}).pipe(pdfParser);

With this I get an empty array. I am trying to fetch a PDF from a remote server. Any idea what could cause that?

kennylbj commented 2 years ago

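The parser instance can no longer be used as the pipe destination directly. We now need to pipe into the stream returned by createParserStream():

request({url: pdfUrl, encoding:null}).pipe(pdfParser.createParserStream());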
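
To illustrate the shape of this pattern, here is a rough sketch that runs without pdf2json or a network connection. `StubParser` is a hypothetical stand-in written for this example. It only assumes what the thread shows: listeners such as `pdfParser_dataReady` are registered on the parser object, while the bytes are piped into a separate stream obtained from `createParserStream()`. The stub reports a byte count instead of parsing anything.

```javascript
const { EventEmitter } = require('events');
const { PassThrough, Writable } = require('stream');

// Hypothetical stand-in (NOT pdf2json): the parser is an event emitter and
// hands out a separate writable stream via createParserStream().
class StubParser extends EventEmitter {
  createParserStream() {
    const chunks = [];
    return new Writable({
      write: (chunk, _encoding, callback) => {
        chunks.push(chunk); // collect incoming bytes
        callback();
      },
      final: (callback) => {
        // A real parser would parse the PDF here; the stub only reports size.
        const byteLength = Buffer.concat(chunks).length;
        this.emit('pdfParser_dataReady', { byteLength });
        callback();
      },
    });
  }
}

const stubParser = new StubParser();

// Listeners go on the parser object, not on the stream.
stubParser.on('pdfParser_dataError', (err) => console.error(err));
stubParser.on('pdfParser_dataReady', (data) => {
  console.log('received bytes:', data.byteLength);
});

// Stand-in for the HTTP response body returned by request().
const source = new PassThrough();
const parserStream = stubParser.createParserStream();
source.pipe(parserStream);
source.end(Buffer.from('%PDF-1.4 placeholder bytes'));
```

With the real library, `source` would be the `request(...)` call and `stubParser` would be a `PDFParser` instance. The wiring stays the same.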