Closed C0nsultant closed 3 years ago
If possible, this should be resolved within paperless-ng. There is no easy way to make those features compatible.
redo-ocr does "surgery" on an existing PDF's content stream, removing text that appears to be related to OCR, and grafting on a layer of invisible OCR text. redo-ocr is probably the wrong option for a scanned image PDF.
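For illustration only, here is a minimal sketch of how a caller such as paperless-ng might avoid passing both options at once. The helper name `build_ocrmypdf_args` and its `mode` values are hypothetical, not part of OCRmyPDF or paperless-ng. Falling back to `--force-ocr` when deskewing is requested is an assumption about what the caller wants: it rasterizes every page, so it is lossy, which the issue says is acceptable because the originals are kept.

```python
import shlex


def build_ocrmypdf_args(input_pdf, output_pdf, mode="redo", deskew=False):
    """Build an ocrmypdf command line that never combines --redo-ocr with --deskew.

    `mode` is a hypothetical setting: "skip", "redo", or "force".
    """
    mode_flags = {
        "skip": "--skip-text",
        "redo": "--redo-ocr",
        "force": "--force-ocr",
    }
    if mode not in mode_flags:
        raise ValueError(f"unknown mode: {mode!r}")

    # Assumption: when deskewing is requested, switch from --redo-ocr to
    # --force-ocr, which rasterizes the pages (lossy) instead of editing
    # the existing content stream.
    if deskew and mode == "redo":
        mode = "force"

    args = ["ocrmypdf", mode_flags[mode]]
    if deskew:
        args.append("--deskew")
    args += [input_pdf, output_pdf]
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_ocrmypdf_args("scan.pdf", "out.pdf", mode="redo", deskew=True)))
```

The resulting list could be handed to `subprocess.run` by the caller. Whether a silent fallback or an explicit error is the better behaviour is a design choice for paperless-ng.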
I'll close the issue now. If you have further related questions, feel free to reopen it.
Describe the scenario
paperless-ng uses OCRmyPDF to perform text recognition. By default, no lossy transformations are applied. Also by default, some documents are parsed using --redo-ocr. I would like to use --deskew since the ADF on my printer generally does not properly align the documents. In paperless-ng, using lossy transformations is not a big concern since the original documents are stored alongside the output of OCRmyPDF. Using both options results in a validation error.
To Reproduce
Expected behavior
If the current validation error happens purely because the lossy transformation is considered unsafe, an option to explicitly allow it would suffice. If this behaviour is caused by something else, let's discuss.