ocrmypdf produces wrong page size

ocrmypdf / OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

http://ocrmypdf.readthedocs.io/

Mozilla Public License 2.0


ocrmypdf produces wrong page size #1360

Open femifrak opened 4 months ago

femifrak commented 4 months ago

In contrast to "ocrmypdf in.pdf out.pdf", the command "ocrmypdf --force-ocr in.pdf out.pdf" produces an output page size (115 × 200 mm) that differs from the input (A5, 148 × 210 mm).
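
For anyone who wants to check their own files, here is a minimal sketch for comparing page sizes between input and output. It assumes the size difference shows up in each page's MediaBox and ignores CropBox and page rotation, which may also matter. The helper names (box_size_mm, page_sizes_mm, size_mismatches) and the 0.5 mm tolerance are made up for illustration; only pikepdf.open, pdf.pages and page.mediabox come from pikepdf.

```python
PT_PER_INCH = 72.0   # PDF user space unit: 1 pt = 1/72 inch
MM_PER_INCH = 25.4


def box_size_mm(box):
    """Return (width, height) in mm for a PDF box [llx, lly, urx, ury] given in points."""
    llx, lly, urx, ury = (float(v) for v in box)
    to_mm = MM_PER_INCH / PT_PER_INCH
    return (round(abs(urx - llx) * to_mm, 1), round(abs(ury - lly) * to_mm, 1))


def page_sizes_mm(path):
    """Read the MediaBox of every page with pikepdf (third-party, imported lazily)."""
    import pikepdf

    with pikepdf.open(path) as pdf:
        return [box_size_mm(page.mediabox) for page in pdf.pages]


def size_mismatches(in_sizes, out_sizes, tol_mm=0.5):
    """List (page_index, input_size, output_size) where sizes differ by more than tol_mm."""
    mismatches = []
    for index, (a, b) in enumerate(zip(in_sizes, out_sizes)):
        if abs(a[0] - b[0]) > tol_mm or abs(a[1] - b[1]) > tol_mm:
            mismatches.append((index, a, b))
    return mismatches


if __name__ == "__main__":
    # Hypothetical file names, matching the ones used in this report.
    for index, before, after in size_mismatches(
        page_sizes_mm("in.pdf"), page_sizes_mm("out.pdf")
    ):
        print(f"page {index + 1}: {before} mm -> {after} mm")
```

An A5 MediaBox of [0, 0, 419.53, 595.28] points converts to 148.0 × 210.0 mm with this helper, so a page reported as 115 × 200 mm would be flagged.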

I've been using pikepdf 8.14.0, ocrmypdf 16.4.1 / Tesseract OCR-hOCR 5.4.1.

Attachments: in.pdf, out.pdf

femifrak commented 4 months ago

The behaviour that -f may change the output page size first appeared in version 16.1. (16.0.4 does not show this bug.)

This is true for both renderers (sandwich and hocr).

in.pdf is a simple file without text, but the same effect happens in PDFs with text: pages are cut off in the middle of the text.

Jmuccigr commented 4 months ago

Is it perhaps this bug? #1181

femifrak commented 3 months ago

In my case it was sufficient to use --redo-ocr instead of --force-ocr. --redo-ocr does not have that issue.
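
The workaround could be scripted roughly as follows. This is only a sketch: the function names and the "mode" parameter are invented for illustration, and the only ocrmypdf flags used are the two mentioned in this thread. Note that --redo-ocr is not a drop-in replacement for --force-ocr in every situation, so whether it fits depends on the input files.

```python
import subprocess

# Map a hypothetical "mode" name to the ocrmypdf flags discussed in this thread.
MODE_FLAGS = {
    "default": [],
    "force": ["--force-ocr"],  # reported here to change the page size since 16.1
    "redo": ["--redo-ocr"],    # reported here to keep the page size
}


def ocrmypdf_command(input_pdf, output_pdf, mode="redo"):
    """Build the ocrmypdf argument list for the given mode."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["ocrmypdf", *MODE_FLAGS[mode], input_pdf, output_pdf]


def run_ocr(input_pdf, output_pdf, mode="redo"):
    """Run ocrmypdf as a subprocess; raises CalledProcessError on failure."""
    subprocess.run(ocrmypdf_command(input_pdf, output_pdf, mode), check=True)


if __name__ == "__main__":
    run_ocr("in.pdf", "out.pdf")  # uses --redo-ocr by default
```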