nisaacson / pdf-extract

Node PDF Extract
MIT License

Arguments to pdftotext #7

Closed: urgent closed this issue 10 years ago

urgent commented 10 years ago

I could not find a way to pass arguments to pdftotext through the pdf_extract() function. I am looking at lib/searchable.js, line 30.

I altered my local copy to

var child = spawn('pdftotext', (options.layout ? ['-layout'] : []).concat(options.ocr_flags || []).concat([pdf_path, '-']));
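
The argument building in the line above can be pulled into a small standalone function, which makes it easy to check without running pdftotext. This is only a minimal sketch: the name buildPdftotextArgs and the path /tmp/sample.pdf are hypothetical and not part of pdf-extract. It assumes options.ocr_flags may be missing, and it converts every flag to a string because spawn documents its arguments as a list of strings.

```javascript
// Hypothetical helper, not part of pdf-extract: builds the argument
// list for pdftotext from the same options object used in this issue.
function buildPdftotextArgs(pdf_path, options) {
  options = options || {};
  var layout = options.layout ? ['-layout'] : [];
  // Fall back to an empty list so a missing ocr_flags does not
  // put `undefined` into the argument array.
  var extra = (options.ocr_flags || []).map(String);
  // The trailing '-' tells pdftotext to write the text to stdout.
  return layout.concat(extra, [pdf_path, '-']);
}

// Example: restrict extraction to page 1 (-f first page, -l last page).
var args = buildPdftotextArgs('/tmp/sample.pdf', { ocr_flags: ['-f', 1, '-l', 1] });
console.log(args.join(' '));
// prints: -f 1 -l 1 /tmp/sample.pdf -
```

The result could then be passed directly as the second argument to spawn('pdftotext', ...).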

And call it with

var options = {
  type: 'text',  // extract the actual text in the pdf file
  ocr_flags: [
    '-f', '1',   // first page to convert
    '-l', '1'    // last page to convert
  ]
};

var processor = pdf_extract(absolute_path_to_pdf, options, function(err) {
  if (err) {
    res.end(util.inspect(err));
  }
});

nisaacson commented 10 years ago

Currently, passing custom arguments to the pdftotext command is not supported.

Pull requests are welcome though :smile: