Closed: peix2 closed this issue 6 years ago
Just install the language pack you need:
apt-cache search tesseract
apt-get install tesseract-ocr-deu
Then you can select the language before OCRing the file.
Hi Sanookmakmak,
I have no such menu. My guess is that it comes with the OCR app for Nextcloud, which I haven't installed. I'm using only the Nextant app, with indexing of images enabled. The Nextant admin settings offer no language option, and in the server's process list I see that tesseract is run with "-l eng" while indexing. How can I change this parameter when there is no setting for it? Do you know where in the code (if that is possible) I could look to change it?
Finally, could you check and confirm which app the menu you showed comes with? Does it affect the way tesseract is run for Nextant as well?
Cheers
Px2
Of course you are right, this menu belongs to the OCR app ;-)
I did a
grep -Ri tesseract /opt/solr
and it found
tika-parsers-1.13.jar
Inside the jar file is the file TesseractOCRConfig.properties
with the content
tesseractPath=
language=eng
pageSegMode=1
maxFileSizeToOcr=2147483647
minFileSizeToOcr=0
timeout=120
I reckon this is what you are looking for.
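For anyone following along, here is a minimal sketch of changing the language= line once TesseractOCRConfig.properties has been extracted from the jar. The helper name set_ocr_language is made up for illustration, and the demo runs on a scratch copy of the content quoted above rather than on the real file. The path of the file inside the jar is not given in this thread, so extracting it and putting the edited copy back (for example with unzip and zip) is left out.

```shell
#!/bin/sh
# set_ocr_language is a hypothetical helper, not part of Tika or Nextant.
# It rewrites only the "language=" line of a properties file.
set_ocr_language() {
    props_file=$1
    lang=$2
    # Assumes the language code contains no "/" or "&" (sed metacharacters).
    sed "s/^language=.*/language=${lang}/" "$props_file" > "${props_file}.tmp" &&
        mv "${props_file}.tmp" "$props_file"
}

# Demo on a scratch file holding part of the content quoted in this thread.
demo=$(mktemp)
printf '%s\n' 'tesseractPath=' 'language=eng' 'pageSegMode=1' > "$demo"
set_ocr_language "$demo" deu
grep '^language=' "$demo"   # prints: language=deu
```

The other keys are left untouched, so the edited file can replace the original one as-is. Solr most likely has to be restarted before the change is picked up, but that is an assumption and not something confirmed in this thread.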
You're the man.
Thanks!
Add this to the wiki, or find a way to integrate the language selection into Nextant
Add this to the wiki
https://github.com/nextcloud/nextant/wiki/And-some-more-...#change-ocr-language
Hi!
In my file, tesseractPath= is empty. Should I set it to
tesseractPath=/usr/bin/tesseract
The TikaOCR documentation doesn't say much about it.
From what I can tell, Tika will find tesseract on its own if it is installed in its usual location.
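A quick way to check that is to ask the shell where it finds the binary. This is only a sketch: resolve_tool is a made-up helper name, and it assumes that Solr/Tika runs with the same PATH as the shell you test in. As far as I know, tesseractPath is meant to name the directory that contains the binary rather than the binary itself, but treat that as an assumption and check the Tika documentation for your version.

```shell
#!/bin/sh
# resolve_tool is a hypothetical helper: print the full path of a command
# found on PATH, or return non-zero with a message on stderr.
resolve_tool() {
    if tool_path=$(command -v "$1"); then
        printf '%s\n' "$tool_path"
    else
        printf '%s not found on PATH\n' "$1" >&2
        return 1
    fi
}

if bin=$(resolve_tool tesseract); then
    echo "tesseract found at: $bin"
    echo "containing directory: $(dirname "$bin")"
else
    echo "tesseract is not on PATH; tesseractPath may need to be set"
fi
```

If the binary is found, leaving tesseractPath= empty should be enough.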
Please use Full text search instead of Nextant.
Each time I see tesseract running, it is called with -l eng only. Is there an easy way to change that, or to use all available languages? It would be good to have a setting for it.
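On the "use all available" part: tesseract's -l option accepts several languages joined with "+", for example -l eng+deu. Below is a small sketch that builds such a value from a list of language codes. The helper name join_langs is made up, and whether Tika passes a combined value from the language= property through to tesseract unchanged is an assumption to verify on your installation.

```shell
#!/bin/sh
# join_langs is a hypothetical helper: join its arguments with "+",
# the separator tesseract expects between languages after -l.
join_langs() {
    # "$*" joins the arguments with the first character of IFS;
    # the subshell keeps the IFS change local.
    ( IFS=+; printf '%s\n' "$*" )
}

join_langs eng deu pol   # prints: eng+deu+pol
```

The list of installed languages can be obtained with tesseract --list-langs. Its first line is a header, so it has to be skipped before the codes are passed to the helper.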