openpaperwork / paperwork

Personal document manager (Linux/Windows). Moved to GNOME's GitLab:
https://gitlab.gnome.org/World/OpenPaperwork/paperwork

Cannot select OCR language other than English #735

Open Nadrazhul opened 6 years ago

Nadrazhul commented 6 years ago

I had an issue similar to #107, but with a twist: I'm running Ubuntu 16.04 LTS and, as recommended, I installed Paperwork via Flatpak, so it now lives in ~/.local/share/flatpak/app/work.openpaper.Paperwork. I also installed an additional OCR language (German) with sudo apt-get install tesseract-ocr tesseract-ocr-deu and confirmed that deu.traineddata was installed to /usr/share/tesseract-ocr/tessdata/.

However, when I open Paperwork's Settings, English is the only OCR language I can select.

What is the correct way to install additional OCR languages in this scenario? I tried copying deu.traineddata from the system share directory to ~/.local/share/flatpak/app/work.openpaper.Paperwork/current/active/files/share/tessdata/, and now I can select German as the OCR language, and German recognition seems to work.
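A minimal sketch of the copy workaround described above, as a POSIX shell function. The name copy_traineddata is hypothetical, and the paths in the usage comment are simply the ones reported in this thread; they may differ on other systems. This copies into the app's deployed files rather than using a supported install method, so the copy presumably has to be repeated if a Flatpak update replaces that directory.

```shell
# Hypothetical helper: copy one Tesseract language file into an existing tessdata directory.
# Usage: copy_traineddata LANG SRC_DIR DEST_DIR
copy_traineddata() {
    lang="$1"
    src="$2/$lang.traineddata"
    dest_dir="$3"
    # Refuse to continue if the language file is not where we expect it.
    if [ ! -f "$src" ]; then
        echo "missing: $src" >&2
        return 1
    fi
    # Do not create the destination: a missing directory usually means a wrong path.
    if [ ! -d "$dest_dir" ]; then
        echo "no such directory: $dest_dir" >&2
        return 1
    fi
    cp "$src" "$dest_dir/" || return 1
    echo "copied $lang.traineddata to $dest_dir"
}

# Paths as reported in this thread (Ubuntu 16.04, per-user Flatpak install); adjust as needed:
# copy_traineddata deu /usr/share/tesseract-ocr/tessdata \
#     "$HOME/.local/share/flatpak/app/work.openpaper.Paperwork/current/active/files/share/tessdata"
```

Checking that the destination already exists, instead of creating it, makes a mistyped or outdated Flatpak path fail loudly rather than silently dropping the file somewhere Paperwork never looks.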

Is this all there is to it? If so, could you update the OCR language entry in the FAQ accordingly, to inform other Flatpak users?

jflesch commented 6 years ago

sudo apt-get install tesseract-ocr tesseract-ocr-deu

If you installed Paperwork using Flatpak, apt is of no use: Paperwork runs in its own container. Flatpak should have automatically installed the Tesseract data files for your language (based on your system locale).

jflesch commented 6 years ago

What you can try:

flatpak run --command=bash work.openpaper.Paperwork
find /app/share/runtime/locale -type f

It should list a deu.traineddata file.
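The check above can be wrapped in a small helper that prints only the Tesseract language codes it finds, searching recursively like the find command. This is a sketch: the function name list_ocr_langs is hypothetical, and it assumes language files follow the usual LANG.traineddata naming. It returns non-zero when nothing is found, for example when the directory holds only a .ref file.

```shell
# Hypothetical helper: print the language code of every *.traineddata file under DIR.
# Returns non-zero when no language file is found.
list_ocr_langs() {
    dir="$1"
    found=$(find "$dir" -type f -name '*.traineddata' 2>/dev/null)
    if [ -z "$found" ]; then
        echo "no .traineddata files under $dir" >&2
        return 1
    fi
    # Strip the directory and the .traineddata suffix, then de-duplicate.
    echo "$found" | while IFS= read -r f; do
        basename "$f" .traineddata
    done | sort -u
}

# Inside the sandbox (flatpak run --command=bash work.openpaper.Paperwork):
# list_ocr_langs /app/share/runtime/locale
```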

The diagnostic output could also help again.

jflesch commented 6 years ago

Ping?

Nadrazhul commented 6 years ago

Apologies: I decided to configure ecryptfs to protect the Paperwork documents and the rest of my home folder. This led me down another rabbit hole of issues, unrelated to Paperwork, which I haven't fully resolved yet.

Of course, you're right: it makes perfect sense for Paperwork to install the default OCR language based on the system locale. My default locale is en_US.UTF-8, by the way, so defaulting to eng.traineddata is exactly what I'd expect. I would therefore treat adding OCR languages to a Flatpak installation as a documentation addition to the FAQ rather than a program issue.

Anyway, running the commands above does not list any languages, only: /app/share/runtime/locale/.ref

And here is the diagnostic output: issue735diag.log

jflesch commented 6 years ago

OK. Just beware: because of #744, the tessdata path will change at some point (it will probably be placed somewhere in /home, I guess).

kafran commented 6 years ago

If you run $ flatpak list --all, you'll see that the work.openpaper.Paperwork.Locale/x86_64/master package is only partially installed.

To fix this, reinstall the locale package. This will download the complete language support:

$ flatpak --user install --reinstall work.openpaper.Paperwork-origin work.openpaper.Paperwork.Locale//master