Open Stond0cyborg opened 8 months ago
The corresponding training data is available at https://github.com/tesseract-ocr/langdata_lstm/tree/main/deu_latf. For the basic meaning of the files, see for example https://groups.google.com/g/tesseract-ocr/c/U9mysQuhRpU/m/7aNrZACXBQAJ.
Don't use deu_latf for Fraktur. Try https://zenodo.org/records/10125246 instead.
More models here: https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/. My latest models for historic texts are called "german_print*".
See also https://ocr-bw.bib.uni-mannheim.de/faq/ (German).
Then I thank you most sincerely, Mr. Weil, and wish you continued success with your program! ;)
I tried to recognize some old Fraktur texts with deu_latf, but many words are not recognized correctly, so I extracted the word list from deu_latf. The file seems to store the words in an encoded form rather than as plain text. Example:
A-{d}-{cd°s}%- A-{d}-{cd°a}% A-{d}-{c-% A-{d}s§gi
I then extracted the readable version and realized that many more words could be added. I would also like to try to improve the recognition of "ich", "schon", "noch", etc., because there is not much you can do with results like "bat ned)" (hat noch) or "bod)" (doch).
Is there a README for this file, or some other explanation of how to extend it?
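For the step of adding more words to the readable list, a minimal Python sketch could look like the following. It assumes the readable word list is a plain UTF-8 text file with one word per line; the function names and the file names in the usage note are hypothetical and not part of Tesseract. Turning the merged list back into the model's dictionary is a separate step with Tesseract's training tools and is not shown here.

```python
import unicodedata
from pathlib import Path


def merge_wordlists(existing_words, new_words):
    """Return a sorted, de-duplicated list built from both inputs.

    Words are stripped of surrounding whitespace and normalized to
    Unicode NFC so that composed and decomposed umlauts compare equal.
    """
    seen = set()
    for word in list(existing_words) + list(new_words):
        cleaned = unicodedata.normalize("NFC", word.strip())
        if cleaned:  # skip empty lines
            seen.add(cleaned)
    return sorted(seen)


def merge_wordlist_files(existing_path, additions_path, output_path):
    """Merge two one-word-per-line files (assumed UTF-8) into a third."""
    existing = Path(existing_path).read_text(encoding="utf-8").splitlines()
    additions = Path(additions_path).read_text(encoding="utf-8").splitlines()
    merged = merge_wordlists(existing, additions)
    Path(output_path).write_text("\n".join(merged) + "\n", encoding="utf-8")
    return len(merged)
```

With hypothetical file names, `merge_wordlist_files("deu_latf.wordlist.txt", "extra_words.txt", "merged.wordlist.txt")` would write the combined list and return the number of entries. Sorting is by code point, which is enough for de-duplication but is not a German dictionary order.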