internetarchive / archive-pdf-tools

Fast PDF generation and compression. Deals with millions of pages daily.
https://archive-pdf-tools.readthedocs.io/en/latest/
GNU Affero General Public License v3.0

Support actual recompression of an existing PDF without any input hOCR or input images #12

Open MerlijnWajer opened 3 years ago

MerlijnWajer commented 3 years ago

We could extract every image from a PDF and insert MRC-compressed images in their place. That would let us compress existing PDFs directly, much like the Foxit PDF compressor does: https://www.foxit.com/compress-pdf/

This would be a fairly trivial but powerful addition.
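
To make the idea concrete, here is a rough sketch of the replace-in-place loop. It is not code from archive-pdf-tools: `PdfImage`, `recompress_images` and `placeholder_compress` are hypothetical names, the PDF is modelled as a plain mapping from object number to image bytes, and zlib only stands in for the MRC encoder. A real MRC step would produce several layers (mask, foreground, background) rather than one stream, which this sketch ignores. Keeping the original whenever the new stream is not smaller is my assumption, not something decided in this issue.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PdfImage:
    """Hypothetical stand-in for an image stream pulled out of a PDF."""
    xref: int    # object number of the image inside the PDF
    data: bytes  # encoded image bytes as stored in the PDF


def recompress_images(
    images: Dict[int, PdfImage],
    compress: Callable[[bytes], bytes],
) -> Dict[int, PdfImage]:
    """Return a new xref -> image mapping with each image recompressed.

    An image is only swapped when the recompressed stream is smaller
    (an assumption of this sketch), so the result never grows.
    """
    result = {}
    for xref, image in images.items():
        candidate = compress(image.data)
        if len(candidate) < len(image.data):
            result[xref] = PdfImage(xref, candidate)
        else:
            result[xref] = image
    return result


def placeholder_compress(data: bytes) -> bytes:
    # Placeholder only: zlib stands in for the MRC encoder here.
    return zlib.compress(data, 9)


if __name__ == "__main__":
    images = {
        7: PdfImage(7, b"\x00" * 4096),   # highly compressible
        9: PdfImage(9, b"\x01\x02\x03"),  # too small to benefit
    }
    out = recompress_images(images, placeholder_compress)
    for xref in images:
        print(xref, len(images[xref].data), "->", len(out[xref].data))
```

The compressor is passed in as a callable so the actual MRC code could be dropped in without touching the loop. Extracting the streams from a real PDF and writing them back is left out on purpose.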