Closed by ghost 8 years ago
1) PyOCR doesn't support PDF as input (libpoppler?). You must run a conversion process first. Are you sure the output of that process is OK?
2) The Tesseract output is given in the exception: 'No script found in image (Too few characters. Skipping this page)'. Did this exception happen on an empty page?
1) I'm using https://github.com/danielquinn/paperless 2) I just checked the PDF and there is a page with just a small barcode and no text.
Does this mean that empty pages have to be exorcised from PDFs? That's a problem with legal documents.
Many thanks for the great project!
1) Then please open a ticket at danielquinn/paperless first. They will open a ticket here if required. 2) Ok
Does this mean that empty pages have to be exorcised from PDFs? That's a problem with legal documents.
No, it means the exception has to be caught and handled correctly by the calling program. I can make it a more specific exception if that would help, @danielquinn.
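To illustrate what "caught and handled by the calling program" could look like, here is a minimal sketch in Python. It does not use the real PyOCR API: `OcrError`, `detect_orientation`, and `process_pages` are hypothetical stand-ins invented for this example, and pages are modelled as plain strings. The idea is only that a page with too little text is recorded and skipped instead of aborting the whole document.

```python
class OcrError(Exception):
    """Hypothetical stand-in for the exception raised by the OCR library."""


def detect_orientation(page_text):
    """Hypothetical stand-in for an OCR call; fails on (nearly) empty pages."""
    if not page_text.strip():
        raise OcrError(
            "No script found in image "
            "(Too few characters. Skipping this page)"
        )
    return {"angle": 0}


def process_pages(pages, detect=detect_orientation):
    """Run `detect` on every page, keeping empty pages instead of failing.

    Returns a list of (page_index, result) tuples; result is None for
    pages on which the OCR step raised OcrError.
    """
    results = []
    for index, page in enumerate(pages):
        try:
            results.append((index, detect(page)))
        except OcrError:
            # The page stays in the document; only its OCR result is missing.
            results.append((index, None))
    return results
```

With this approach no page has to be removed from the PDF: a blank page or a page with only a barcode simply ends up with no OCR result, and the caller decides what to do with it.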
Okay, thanks - I'll close this.
Hm actually, I'll keep this ticket open for now, because there are two things I must do:
Doc updated. I actually think the specific exception is not required.
Hi!
I'm encountering this error with some of my PDFs: