tesseract-ocr / langdata

Source training data for Tesseract for lots of languages
Apache License 2.0

[Feature request] font list for LSTM #86

Closed amitdo closed 6 years ago

amitdo commented 7 years ago

@theraysmith, please add the names of the fonts you trained for each language/script to the langdata repo.

roozgar commented 7 years ago

Maybe a repository for uploading fonts would be useful. You could add some well-known fonts for each language to Tesseract...

Shreeshrii commented 6 years ago

Duplicate https://github.com/tesseract-ocr/langdata/issues/98

amitdo commented 6 years ago

Duplicate #98

It's vice versa... :-)

amitdo commented 6 years ago

Shree, It's OK! :-)

stweil commented 6 years ago

Now both issues are closed, but the initial request is still open. I suggest re-opening #86 and assigning it to @theraysmith.

amitdo commented 6 years ago

Now both issues are closed,

Someone is confused...

zdenop reopened this a day ago

:smile:

amitdo commented 6 years ago

Ray provided the font list for Hebrew in https://github.com/tesseract-ocr/langdata/issues/82#issuecomment-320100717 https://github.com/tesseract-ocr/langdata/files/1198659/hebrewfonts.txt

Shreeshrii commented 6 years ago

Another list of fonts, without a per-language breakdown, can be seen in

https://github.com/tesseract-ocr/langdata/blob/master/font_properties
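For anyone who wants to pull just the font names out of a file like this, here is a minimal sketch. It assumes the usual `font_properties` layout from the Tesseract 3.x training docs, one font per line as `<fontname> <italic> <bold> <fixed> <serif> <fraktur>` with 0/1 flags. The `FontProps` type, the `parse_font_properties` helper, and the sample entries are made up for illustration and are not taken from the repository.

```python
from typing import List, NamedTuple


class FontProps(NamedTuple):
    """One font_properties entry (hypothetical helper type)."""
    name: str
    italic: bool
    bold: bool
    fixed: bool
    serif: bool
    fraktur: bool


def parse_font_properties(text: str) -> List[FontProps]:
    """Parse lines of '<fontname> <italic> <bold> <fixed> <serif> <fraktur>'."""
    fonts = []
    for line in text.splitlines():
        # Split from the right so a font name containing spaces stays intact.
        parts = line.rsplit(None, 5)
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        name, *flags = parts
        if any(flag not in ("0", "1") for flag in flags):
            continue  # skip lines whose flags are not 0/1
        fonts.append(FontProps(name, *(flag == "1" for flag in flags)))
    return fonts


# Made-up sample entries, only to show the assumed format.
sample = """\
Arial 0 0 0 0 0
Arial_Bold 0 1 0 0 0
Times_New_Roman_Italic 1 0 0 1 0
"""

fonts = parse_font_properties(sample)
names = [font.name for font in fonts]
print(names)
```

This only lists what the file contains; it says nothing about which of those fonts were actually used for LSTM training.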

Shreeshrii commented 6 years ago

The training scripts used by tesstrain.sh also have a list of fonts, sorted by script/language.

https://github.com/tesseract-ocr/tesseract/blob/master/training/language-specific.sh

However, neither of these links necessarily reflects the exact set of fonts used for LSTM training.


stweil commented 6 years ago

Large parts of language-specific.sh are still from 3.05 – I see no new fonts for LSTM there. For font_properties the situation is similar. So both files might include fonts used for LSTM, but we can only guess.

Shreeshrii commented 6 years ago

@stweil You are correct. I just wanted to link all the available font lists in one place. We will not know which fonts Ray used for LSTM until he provides new versions of these files.

amitdo commented 6 years ago

@jbreiden,

Do you have access to the font lists? If you do, any chance you can upload them to this repo?

jbreiden commented 6 years ago

https://github.com/tesseract-ocr/langdata_lstm/blob/master/font_properties

jbreiden commented 6 years ago

Maybe I'm missing something. I'll try to hunt down Ray and find out.

amitdo commented 6 years ago

https://github.com/tesseract-ocr/langdata/issues/86#issuecomment-368961594

Ray provided the font list for Hebrew in #82 (comment) https://github.com/tesseract-ocr/langdata/files/1198659/hebrewfonts.txt

Try to find it in the Google (langdata?) repo if you still have access to it.

jbreiden commented 6 years ago

I talked to Ray, tracked them down, and uploaded them to GitHub. Sorry for not having these earlier.

https://github.com/tesseract-ocr/langdata_lstm/blob/master/heb/okfonts.txt

amitdo commented 6 years ago

Thank you!