Mark-Joy opened this issue 2 years ago
It seems that, because of 390fdf8, the OCR text is now packed into a Form XObject. A simple solution would be to give that object a distinctive name so it is easy to detect and remove, something like:
text_xobj_name = Name('/ocrmypdf-' + str(uuid.uuid4()))  # Name from pikepdf, uuid from the standard library
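To illustrate how such a prefix could make removal straightforward, here is a minimal sketch. It is not ocrmypdf's or pikepdf's API: the page's /Resources dictionary is modelled as a plain dict, and the content stream as a list of (operands, operator) pairs, loosely resembling what pikepdf's `parse_content_stream` returns. The helper names `new_ocr_xobject_name` and `strip_ocr_layer` are hypothetical.

```python
import uuid

# Prefix suggested in this issue; any XObject name starting with it is
# assumed to be an ocrmypdf text layer.
OCR_PREFIX = "/ocrmypdf-"


def new_ocr_xobject_name() -> str:
    """Hypothetical helper: unique PDF name for a new OCR text Form XObject."""
    return OCR_PREFIX + str(uuid.uuid4())


def strip_ocr_layer(resources: dict, instructions: list) -> tuple[dict, list]:
    """Hypothetical helper: drop prefixed XObjects and the Do operators that paint them.

    `resources` is a simplified stand-in for a page's /Resources dictionary;
    `instructions` is a list of (operands, operator) pairs (an assumed model,
    not real pikepdf objects).
    """
    xobjects = resources.get("/XObject", {})
    # Collect every XObject name that carries the ocrmypdf prefix.
    doomed = {name for name in xobjects if name.startswith(OCR_PREFIX)}
    kept_xobjects = {k: v for k, v in xobjects.items() if k not in doomed}
    new_resources = {**resources, "/XObject": kept_xobjects}
    # Remove only the "Do" instructions that reference a removed XObject;
    # any now-empty q/Q pair around them is left in place, which is harmless.
    kept_instructions = [
        (operands, op)
        for operands, op in instructions
        if not (op == "Do" and operands and operands[0] in doomed)
    ]
    return new_resources, kept_instructions


if __name__ == "__main__":
    ocr_name = new_ocr_xobject_name()
    resources = {"/XObject": {"/Im0": "scanned image", ocr_name: "OCR text form"}}
    instructions = [
        ([], "q"), (["/Im0"], "Do"), ([], "Q"),
        ([], "q"), ([ocr_name], "Do"), ([], "Q"),
    ]
    res, ins = strip_ocr_layer(resources, instructions)
    print(sorted(res["/XObject"]))  # ['/Im0']
```

The point of the random suffix is that several OCR layers (or other tools' XObjects) cannot collide, while the fixed prefix alone is enough for detection, so `--redo-ocr` would not need to inspect the XObject's contents.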
I ran into this problem with ocrmypdf 14.0.1 while comparing the performance of Tesseract 5 and Tesseract 4: I re-ran OCR with Tesseract 5 on a PDF that had previously been processed with Tesseract 4.
Describe the bug
As the title says, --redo-ocr does not remove the previous OCR layer created by ocrmypdf.
To Reproduce
files.zip
Expected behavior
The OCR text layer previously created by ocrmypdf should be removed when the --redo-ocr option is used.
System
Installation
pip install ocrmypdf