janis91 / ocr

Nextcloud OCR (optical character recognition) processing for images with tesseract-js
GNU Affero General Public License v3.0
Relax test for allowed languages #138

Closed by stweil 6 years ago

stweil commented 6 years ago

The app is currently very restrictive about the names it allows for installed languages on the settings page. As a result, administrators cannot install their own German.traineddata or use Tesseract 4 with the provided Latin.traineddata.

It should relax the test and allow any "reasonable" name.

janis91 commented 6 years ago

Do you mean that an administrator should be able to enter language names other than "deu, fra, deu-frak" in the settings input field, for example? @stweil

janis91 commented 6 years ago

Is "tessdata" the same as "traineddata" ?

stweil commented 6 years ago

Yes and yes. The settings field should allow entries like "Fraktur", "German", "Latin" or even "Canadian_Aboriginal", too (see list at https://github.com/tesseract-ocr/tessdata_fast).

And of course I should have written German.traineddata. That was a mix-up on my side, caused by the name of the directory that contains those traineddata files. I have fixed my previous comment now.

janis91 commented 6 years ago

Ok. That sounds very reasonable to me. I will relax the regex check.

stweil commented 6 years ago

Thanks!

janis91 commented 6 years ago

fixed in #139

stweil commented 6 years ago

It's still not possible to enter Latin without getting an error message. In addition, I changed tessdata_fast today. Script data now uses names like script/Latin, so slashes are also possible and should be supported.
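
To make the request concrete, here is a minimal sketch of a relaxed check that would accept the names mentioned in this thread. It is not the app's actual code: the pattern, the names `LANGUAGE_NAME`, `is_valid_language` and `invalid_languages`, and the choice of allowed characters are all assumptions for illustration.

```python
import re

# Hypothetical pattern (an assumption, not the app's real regex):
# one or more segments separated by "/", each consisting of
# ASCII letters, digits, "_" or "-".
LANGUAGE_NAME = re.compile(r"[A-Za-z0-9_-]+(?:/[A-Za-z0-9_-]+)*")


def is_valid_language(name: str) -> bool:
    """Return True if the whole name matches the relaxed pattern."""
    return LANGUAGE_NAME.fullmatch(name) is not None


def invalid_languages(names: list[str]) -> list[str]:
    """Return the entries that fail the check, in their original order."""
    return [name for name in names if not is_valid_language(name)]


if __name__ == "__main__":
    for candidate in ["deu", "deu-frak", "Latin", "Canadian_Aboriginal", "script/Latin"]:
        print(candidate, is_valid_language(candidate))
```

Because dots are not in the allowed set, an entry such as ../etc/passwd is rejected, while script/Latin passes. Whatever pattern is chosen, the same check would have to be applied on both the client and the server side.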

janis91 commented 6 years ago

Now it should be fixed. I forgot the server-side part :-D

stweil commented 6 years ago

It works now, thank you. (I had first reported here that the new language and script names still didn't work in my installation.)