Open Furtifk opened 1 year ago
Are there any plans to add it?
The best/fast models were uploaded 5 years ago. AFAIK, no one is working on updating them.
Thanks for the information and the fast reply. Do you know of any workaround I could use to OCR this character?
Many thanks ahead of time ^^
The official script/Latin model includes ±. You could also try any of my models from https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/, for example https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/frak2021_09/tessdata_fast/frak2021-09.traineddata.
Thanks a lot. I will try this and let you know here if it does indeed work for us going forward.
After further testing, it appears that both lat.traineddata (https://tesseract-ocr.github.io/tessdoc/Data-Files) and your own model struggle to recognize this character in my example. Is the Latin data file I linked above the one you meant? If not, where can I find and download the right one to try?
Many thanks!
lat.traineddata is a different model. script/Latin is in https://github.com/tesseract-ocr/tessdata_fast/tree/main/script. Or simply re-run the installer and select it there for installation.
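For anyone following along, here is a minimal sketch of how the script/Latin model could be selected once it is downloaded. It assumes the file is saved as script/Latin.traineddata inside the tessdata directory and that the tesseract executable is on the PATH. The function names, the image name and the directory used here are hypothetical; only the -l and --tessdata-dir options are Tesseract's own.

```python
from pathlib import Path


def model_path(tessdata_dir, model="script/Latin"):
    """Return where Tesseract is expected to look for the given model.

    A model name such as "script/Latin" maps to the file
    <tessdata_dir>/script/Latin.traineddata.
    """
    return Path(tessdata_dir) / (model + ".traineddata")


def build_tesseract_command(image, output_base, tessdata_dir, model="script/Latin"):
    """Build the argument list for a Tesseract run with a specific model.

    The list can be passed to subprocess.run(). Nothing is executed here.
    """
    return [
        "tesseract",
        str(image),          # input image (hypothetical file name in the example)
        str(output_base),    # output base name; Tesseract appends ".txt"
        "--tessdata-dir",
        str(tessdata_dir),   # directory that contains script/Latin.traineddata
        "-l",
        model,               # model to use for recognition
    ]


# Example (hypothetical paths):
# import subprocess
# cmd = build_tesseract_command("symbols.png", "symbols", "tessdata")
# if model_path("tessdata").exists():
#     subprocess.run(cmd, check=True)
```

Checking model_path first helps to tell a missing or misplaced model file apart from a genuine recognition failure.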
Thanks for the link. I have tried this with the Latin.traineddata model, but I'm still not having much luck getting this character from either the test file or our internal files. I'm guessing there's not much else that can be done here? Thanks for the help and suggestions nonetheless.
English traineddata file does not contain the '±' character?
Environment
Tesseract Version: 5.00
Downloaded from: https://github.com/UB-Mannheim/tesseract/wiki
Platform: Windows 10 64bit
I am trying to OCR using the English dictionary file found at https://tesseract-ocr.github.io/tessdoc/Data-Files. I notice the character is not included in the unicharset either: https://github.com/tesseract-ocr/langdata_lstm/blob/main/eng/eng.unicharset
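The check described above can be sketched in a few lines. This assumes the plain-text unicharset layout used by files such as eng.unicharset: the first line holds the number of entries, and every following line starts with the symbol, followed by a space and its properties. The function names and the sample data are made up for illustration and are not part of Tesseract.

```python
def unicharset_symbols(text):
    """Return the set of symbols listed in a plain-text unicharset.

    The first line (the entry count) is skipped. For every other
    non-empty line, the symbol is the text before the first space.
    """
    symbols = set()
    for line in text.splitlines()[1:]:
        if not line.strip():
            continue
        symbols.add(line.split(" ", 1)[0])
    return symbols


def contains_symbol(text, symbol):
    """Report whether the given symbol has an entry in the unicharset text."""
    return symbol in unicharset_symbols(text)


# Made-up sample in the assumed layout (property columns shortened).
SAMPLE = "3\nNULL 0 Common 0\na 3 Latin 1\n+ 0 Common 2\n"

# Example with a real file (path is hypothetical):
# with open("eng.unicharset", encoding="utf-8") as handle:
#     print(contains_symbol(handle.read(), "±"))
```

A symbol that is missing from the unicharset cannot be output by that model, so this is a quick way to rule a model out before testing it on images.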
Are there any plans to add it? Are there any language files that contain this character and can OCR it successfully?
Many thanks to whoever can assist here. I am attaching the file I used to test this behavior for this character here: (https://github.com/tesseract-ocr/langdata_lstm/files/9870674/Special.Symbols.pdf)