manisandro / gImageReader

A Gtk/Qt front-end to tesseract-ocr.
GNU General Public License v3.0

spellcheck dictionary for Deutsch (Fraktur) #328

Closed Scannen100 closed 6 years ago

Scannen100 commented 6 years ago

First of all a big thank you to you and the guys of Tesseract!

It seems to work well even without the spellcheck dictionary. But it will probably work even better with the dictionary installed.

I have gImageReader 3.2.99 on Windows 7 (64-bit). English and Deutsch [not Fraktur] work without any problems; gImageReader detects the proper dictionary files. I have put the dictionary files for Deutsch (Fraktur) into the same folder, but gImageReader keeps telling me: "The spellcheck dictionary for Deutsch (Fraktur) is not installed".

But I think I have the correct files: de_DE_OLDSPELL.dic and de_DE_OLDSPELL.aff. It seems to have worked in the past; you can look here (scroll down to the last picture, please!): https://www.lwl.org/waa-download/pdf/Installation%20OCR%20Software.pdf

What can I do? Thanks in advance! :)

manisandro commented 6 years ago

I'll need to investigate, it could be due to some change in how enchant-2.x (the underlying spellchecking library) parses the dictionary file names (and detects the language codes from those).

Scannen100 commented 6 years ago

Thanks! I’m looking forward to hearing from you again :)

manisandro commented 6 years ago

@Scannen100 Does one of the following builds help? The second one uses tesseract 4.x beta; depending on which version you are currently using, you might need to re-install all traineddata files. Note also that tesseract-4.0 no longer has deu_frak, but rather a generic Fraktur script traineddata.

https://smani.fedorapeople.org/tmp/gImageReader_3.2.99_qt5_x86_64.exe

https://smani.fedorapeople.org/tmp/gImageReader_3.2.99_qt5_x86_64_tesseract4.0.0beta1.exe

Scannen100 commented 6 years ago

Thank you for your posting! I have found a workaround.

I tried the two builds above. The one I had before was gImageReader_3.2.99_qt5_x86_64_tesseract4_gitbb89dc3.exe. None worked. But when I put the de_DE_OLDSPELL files into another folder, all was suddenly OK.

Original folder: C:\Users\Scannen 100\AppData\Local\enchant\myspell

New folder: C:\Users\Scannen 100\AppData\Local\gImageReader\share\myspell

Note: I had to search for (and download) the de_DE_OLDSPELL files on the net and install them manually (i.e. put them in the folder).
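For anyone wanting to check the same thing on their own machine, here is a minimal Python sketch. It only reports which candidate folders hold both the .dic and the .aff file for a given dictionary name. The function name `folders_with_dictionary` is made up for illustration, the two folder paths are simply the ones mentioned above, and the script does not tell you which folder gImageReader or enchant actually searches.

```python
from pathlib import Path


def folders_with_dictionary(base_name, folders):
    """Return the folders that contain both <base_name>.dic and <base_name>.aff."""
    found = []
    for folder in folders:
        folder = Path(folder)
        dic_file = folder / f"{base_name}.dic"
        aff_file = folder / f"{base_name}.aff"
        # A spellcheck dictionary of this kind needs both files side by side.
        if dic_file.is_file() and aff_file.is_file():
            found.append(folder)
    return found


if __name__ == "__main__":
    # Assumption: these are the two per-user folders from the post above;
    # adjust them if your installation differs.
    local = Path.home() / "AppData" / "Local"
    candidates = [
        local / "enchant" / "myspell",
        local / "gImageReader" / "share" / "myspell",
    ]
    for folder in folders_with_dictionary("de_DE_OLDSPELL", candidates):
        print(folder)
```

If the script prints nothing, neither folder contains the complete pair of files.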

I hope that I haven’t made any mistakes and that my description is correct and easy to understand. I’m no expert.

Thanks again :)

manisandro commented 6 years ago

Ah, so it could be that enchant2 is looking for dictionaries in a separate folder. I'll need to check.

manisandro commented 6 years ago

So handling should finally be correct now, please try:

https://smani.fedorapeople.org/tmp/gImageReader_3.2.99_qt5_i686_tesseract4.git87635c1.exe

https://smani.fedorapeople.org/tmp/gImageReader_3.2.99_qt5_x86_64_tesseract4.git87635c1.exe