the-paperless-project / paperless

Scan, index, and archive all of your paper documents
GNU General Public License v3.0

QNAP Docker Setting to detect only eng? #692

Open tanderson1992 opened 3 years ago

tanderson1992 commented 3 years ago

I set up paperless using the Docker instructions. After installation it worked fine on a few PDFs until I got to my vehicle registration. The document is entirely in English, but it seems to be detected as cat/ca, which is not installed. Is there a setting to force the software to use only English, or to skip OCR instead of failing to process the document? The 0.3.3 changelog says "Timezone, items per page, and default language are now all configurable...", but I don't see where to set the default language. I have "PAPERLESS_OCR_LANGUAGES=" [set to blank] in the yml file used to install paperless.

Here's a snippet of the error. I can provide full logs if that would help, but I think the issue is that it's somehow detecting another language and trying to OCR in that language, even though I've specified not to OCR in any language other than English.

Processing sheet #1: /tmp/paperless/paperless-1kv2atz2/convert-0000.pnm -> /tmp/paperless/paperless-1kv2atz2/convert-0000.unpaper.pnm
[pgm_pipe @ 0x558c05dd90c0] Stream #0: not enough frames to estimate rate; consider increasing probesize
[image2 @ 0x558c05ddac40] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x558c05ddac40] Encoder did not produce proper pts, making some up.
OCRing the document
Parsing for eng
Parsing for cat
Processing sheet #1: /tmp/paperless/paperless-1kv2atz2/convert-0000.unpaper.pnm -> /tmp/paperless/paperless-1kv2atz2/convert-0000.unpaper.unpaper.pnm
Processing sheet #1: /tmp/paperless/paperless-1kv2atz2/convert-0000.pnm -> /tmp/paperless/paperless-1kv2atz2/convert-0000.unpaper.pnm
[pgm_pipe @ 0x55dd25c170c0] [pgm_pipe @ 0x55ccf30aa0c0] Stream #0: not enough frames to estimate rate; consider increasing probesize
Stream #0: not enough frames to estimate rate; consider increasing probesize
[image2 @ 0x55dd25c18c40] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x55dd25c18c40] Encoder did not produce proper pts, making some up.
[image2 @ 0x55ccf30abc40] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x55ccf30abc40] Encoder did not produce proper pts, making some up.
OCRing the document
Parsing for eng
Parsing for cat
PARSE FAILURE for /consume/Registration.pdf: The guessed language (ca) is not available in this instance of Tesseract.
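
To make the behaviour I'm asking for concrete, here is a rough sketch of the fallback I'd expect: if the guessed language has no installed Tesseract pack, use the default language instead of failing. This is purely illustrative and is not Paperless's actual code. The function name, the `forgiving` flag, and the small code table are all hypothetical; only the ISO 639-1 to Tesseract code pairs (en/eng, ca/cat, de/deu) are real.

```python
# Hypothetical sketch, NOT taken from the Paperless source.
# Maps 2-letter ISO 639-1 codes (as a language guesser might return them)
# to the 3-letter codes Tesseract uses for its language packs.
ISO639_1_TO_TESSERACT = {"en": "eng", "ca": "cat", "de": "deu"}


def choose_ocr_language(guessed, installed, default="eng", forgiving=True):
    """Pick the Tesseract language code to OCR with.

    guessed   -- 2-letter code from language detection, or None
    installed -- set of 3-letter Tesseract codes available locally
    default   -- language to fall back to
    forgiving -- if True, fall back to `default` instead of raising
    """
    if guessed is None:
        return default

    code = ISO639_1_TO_TESSERACT.get(guessed)
    if code is not None and code in installed:
        return code

    if forgiving:
        # The guessed language pack is missing: use the default instead.
        return default

    raise ValueError(
        f"The guessed language ({guessed}) is not available"
    )
```

With only `eng` installed, `choose_ocr_language("ca", {"eng"})` would return `"eng"` rather than aborting the document, which is the outcome I'm hoping a setting can give me.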