Closed amitdo closed 6 years ago
Maybe a repository for uploading fonts would be useful. You could add some well-known fonts for each language to Tesseract...
Duplicate #98
It's vice versa... :-)
Shree, It's OK! :-)
Now both issues are closed, but the initial request is still open. I suggest to re-open #86 and assign it to @theraysmith.
Now both issues are closed,
Someone is confused...
zdenop reopened this a day ago
:smile:
Ray provided the font list for Hebrew in https://github.com/tesseract-ocr/langdata/issues/82#issuecomment-320100717 https://github.com/tesseract-ocr/langdata/files/1198659/hebrewfonts.txt
Another list of fonts, without a per-language breakdown, can be seen in
https://github.com/tesseract-ocr/langdata/blob/master/font_properties
The training scripts used by tesstrain.sh also contain a list of fonts, sorted by script/language.
https://github.com/tesseract-ocr/tesseract/blob/master/training/language-specific.sh
However, neither of these links necessarily reflects the exact set of fonts used for LSTM training.
Large parts of language-specific.sh are still from 3.05; I see no new fonts for LSTM there. For font_properties the situation is similar. So both files might include fonts used for LSTM, but we can only guess.
@stweil You are correct. I just wanted to link all the available font lists in one place. We will not know which fonts Ray used for LSTM until he provides new versions of these.
@jbreiden,
Do you have access to the font lists? If you do, any chance you can upload them to this repo?
Maybe I'm missing something. I'll try to hunt down Ray and find out.
Ray provided the font list for Hebrew in #82 (comment) https://github.com/tesseract-ocr/langdata/files/1198659/hebrewfonts.txt
Try to find it in the Google (langdata?) repo if you still have access to it.
I talked to Ray, tracked them down, and uploaded them to GitHub. Sorry for not having these earlier.
https://github.com/tesseract-ocr/langdata_lstm/blob/master/heb/okfonts.txt
Thank you!
@theraysmith, please add the names of the fonts you trained for each language/script to the langdata repo.