hellerbarde / stapler

A small utility making use of the pypdf library to provide a (somewhat) lighter alternative to pdftk

Is it possible to merge hOCR HTML into a PDF file? #82

Closed tukusejssirs closed 2 years ago

tukusejssirs commented 3 years ago

I’d like to merge hOCR HTML into a PDF file. I thought I could do this using the following command, but it failed (see the output). Is it possible at all?

```bash
pdf-stapler background pdf_with_images.pdf hocr.html merged.pdf
```

```bash
Traceback (most recent call last):
  File "/usr/bin/pdf-stapler", line 11, in <module>
    load_entry_point('stapler==1.0.0rc1', 'console_scripts', 'pdf-stapler')()
  File "/usr/lib/python3.8/site-packages/staplelib/__init__.py", line 12, in main
    stapler.main()
  File "/usr/lib/python3.8/site-packages/staplelib/stapler.py", line 116, in main
    modes[mode](args)
  File "/usr/lib/python3.8/site-packages/staplelib/commands.py", line 145, in background
    filesandranges = iohelper.parse_ranges(args[:-1])
  File "/usr/lib/python3.8/site-packages/staplelib/iohelper.py", line 103, in parse_ranges
    "pdf": read_pdf(inputname),
  File "/usr/lib/python3.8/site-packages/staplelib/iohelper.py", line 34, in read_pdf
    pdf = PdfFileReader(open(filename, "rb"))
  File "/usr/lib/python3.8/site-packages/PyPDF2/pdf.py", line 1084, in __init__
    self.read(stream)
  File "/usr/lib/python3.8/site-packages/PyPDF2/pdf.py", line 1696, in read
    raise utils.PdfReadError("EOF marker not found")
PyPDF2.utils.PdfReadError: EOF marker not found
```

hellerbarde commented 2 years ago

I don't think the library we're using here supports any non-PDF files. This is likely out of scope, unless you would like to create a clean pull request for it.
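
For anyone hitting the same `EOF marker not found` error: the traceback in the report comes from PyPDF2 trying to read `hocr.html` as a PDF. A rough way to catch this before calling `pdf-stapler` is to check that each input at least looks like a PDF. The sketch below is only a heuristic and an assumption on my part. `looks_like_pdf` is a hypothetical helper, not part of stapler or PyPDF2, and it only checks for the `%PDF-` header near the start of the file and the `%%EOF` marker near the end.

```python
import os


def looks_like_pdf(path, probe_bytes=1024):
    """Heuristic check (hypothetical helper, not part of stapler or PyPDF2).

    Returns True if the file has a "%PDF-" header near its start and a
    "%%EOF" marker near its end. It does not validate the PDF itself.
    """
    with open(path, "rb") as f:
        head = f.read(probe_bytes)
        # Jump to the last probe_bytes of the file (or the start if it is shorter).
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - probe_bytes))
        tail = f.read()
    return b"%PDF-" in head and b"%%EOF" in tail
```

With the files from the report, `looks_like_pdf("hocr.html")` would be expected to return `False`, since an hOCR file is HTML and carries neither marker. It would first have to be converted to a PDF by some other tool before stapler could use it.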