ropensci / pdftools

Text Extraction, Rendering and Converting of PDF Documents
https://docs.ropensci.org/pdftools

feature request: pdf_deflate() #127

Open turpinandrew opened 1 year ago

turpinandrew commented 1 year ago

Would it be easy to expose the poppler::deflate(...) functionality (I am assuming it exists in there somewhere) in the pdftools package?

  e.g. pdftools::pdf_deflate(pdf, outfilename)

While pdf_data() and pdf_text() have been very useful for about 50% of the PDF data extraction work I am doing, the rest requires processing the raw PDF itself (hunting for images, etc.). Currently I am using:

 system(sprintf("qpdf --qdf --replace-input --object-streams=disable %s", shQuote(pdf)))  # macOS; shQuote() protects paths containing spaces

If I could avoid a system() call that relies on an OS-specific tool, that would be fantastic, and more portable than my current solution.
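
For illustration, here is a minimal sketch of how the workaround above could be wrapped behind the requested interface in the meantime. It is not part of pdftools: pdf_deflate() and qpdf_deflate_args() are hypothetical names, and the sketch still assumes the qpdf command-line tool is installed and on the PATH, so it does not remove the external dependency. It only makes the call safer and fails with a clear message when qpdf is missing.

```r
# Hypothetical helper (not a pdftools function): builds the qpdf arguments
# that write an uncompressed, human-readable ("QDF") copy of a PDF.
qpdf_deflate_args <- function(pdf, outfilename) {
  c("--qdf", "--object-streams=disable", shQuote(pdf), shQuote(outfilename))
}

# Hypothetical wrapper matching the signature proposed in this issue.
# Assumes the qpdf executable is available on the PATH.
pdf_deflate <- function(pdf, outfilename) {
  qpdf <- Sys.which("qpdf")
  if (!nzchar(qpdf)) {
    stop("qpdf executable not found on the PATH")
  }
  status <- system2(qpdf, qpdf_deflate_args(pdf, outfilename))
  # qpdf exits with 0 on success and 3 when it succeeded with warnings
  if (!status %in% c(0L, 3L)) {
    stop("qpdf failed with exit status ", status)
  }
  invisible(outfilename)
}
```

Unlike the one-liner with --replace-input, this version writes to a separate output file, so the original PDF is left untouched, e.g. pdf_deflate("in.pdf", "in-qdf.pdf").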