Open trailingslash opened 2 years ago
Output of tesseract --list-langs
List of available languages (11):
chi_sim
chi_sim_vert
chi_tra
chi_tra_vert
deu
eng
fra
ita
nor
osd
spa
Update: I've included more languages in the "PAPERLESS_OCR_LANGUAGE=nor+eng" value in docker-compose.env.
It seems to recognize both Norwegian and Chinese now. However, the OCR quality for the Chinese books is unfathomably bad: it injects capitalized Latin letters where there should be Chinese characters.
My question now is: is there any way to crank up the OCR quality? It doesn't really matter to me if it takes a day to scan a single book, as long as the OCR is reasonably on point.
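For anyone debugging a similar setup, one quick sanity check is to confirm that every code in the `+`-separated language value actually appears in the `tesseract --list-langs` output. Below is a minimal Python sketch of that check. The helper names `parse_list_langs` and `missing_languages` are made up for illustration and are not part of paperless-ng or Tesseract, and the values `nor+eng+chi_sim` and `nor+eng+jpn` are hypothetical examples, not settings taken from this thread.

```python
from __future__ import annotations


def parse_list_langs(output: str) -> set[str]:
    """Parse the text printed by `tesseract --list-langs` into a set of codes."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages (N):" header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.add(line)
    return langs


def missing_languages(ocr_language: str, installed: set[str]) -> list[str]:
    """Return the codes in a `+`-separated value that are not installed."""
    requested = [code for code in ocr_language.split("+") if code]
    return [code for code in requested if code not in installed]


# Output copied from the `tesseract --list-langs` listing in this thread.
LIST_LANGS_OUTPUT = """\
List of available languages (11):
chi_sim
chi_sim_vert
chi_tra
chi_tra_vert
deu
eng
fra
ita
nor
osd
spa
"""

installed = parse_list_langs(LIST_LANGS_OUTPUT)

# "nor+eng" is the value quoted above; the other two are hypothetical.
for value in ("nor+eng", "nor+eng+chi_sim", "nor+eng+jpn"):
    print(value, "-> missing:", missing_languages(value, installed))
```

An empty "missing" list only shows that the language data is installed. It says nothing about whether the setting is actually picked up by the container or about the resulting OCR quality.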
I'm hitting the same issue: the Chinese text is barely recognized at all.
Hi,
I'm running paperless-ng in Docker on an amd64 Ubuntu server. When I add a document through the WebUI, it is processed for some time without any errors in the logs and reported as ready, but it has been OCR'd in the wrong language. The first document I tried was in Norwegian, the second was in Chinese and English.
Paperless-ng only OCR'd in English in both cases: any Norwegian and Chinese letters/characters came out as English OCR output.
Logs at the bottom.
# This is my docker-compose.env
# And this is my docker-compose.yml
# Logs