tesseract-ocr / test

Repository for tesseract testing
Apache License 2.0

Upload training data and font needed by various unittests #14

Closed: Shreeshrii closed this issue 5 years ago

Shreeshrii commented 5 years ago

Attempting to recreate the training data for the LSTM tests using the following commands:

src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only --noextract_font_properties --langdata_dir ../langdata_lstm --tessdata_dir ../tessdata --output_dir ~/test/testdata --fontlist "Arial" --maxpages 10

src/training/tesstrain.sh --fonts_dir ../.fonts --lang kor --linedata_only --noextract_font_properties --langdata_dir ../langdata_lstm --tessdata_dir ../tessdata --output_dir ~/test/testdata --fontlist "Arial Unicode MS" --maxpages 10
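
The two commands differ only in the fonts directory, the language code, and the font name, so they could be generated from one template. The following is a minimal dry-run sketch, not part of the tesseract repository: the function name `tesstrain_cmd` is hypothetical, and it only prints the command line (using the same flags as above) instead of running the training.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the tesseract repo):
# prints the tesstrain.sh command line for one language/font pair.
# It only echoes the command; nothing is executed.
tesstrain_cmd() {
    fonts_dir=$1   # directory containing the font files
    lang=$2        # language code, e.g. eng or kor
    font=$3        # font name passed to --fontlist
    printf '%s\n' "src/training/tesstrain.sh --fonts_dir $fonts_dir --lang $lang --linedata_only --noextract_font_properties --langdata_dir ../langdata_lstm --tessdata_dir ../tessdata --output_dir ~/test/testdata --fontlist \"$font\" --maxpages 10"
}

# The two invocations from this comment, printed for review.
tesstrain_cmd /usr/share/fonts eng "Arial"
tesstrain_cmd ../.fonts kor "Arial Unicode MS"
```

Printing the commands first makes it easy to check the paths and font names before starting a training run.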

stweil commented 5 years ago

I think it would help to have some documentation on how the files were generated or where they come from, maybe in testdata/README.md.

Shreeshrii commented 5 years ago

Thanks for the suggestion. I have added the info and also deleted the files that were generated by training but are not needed by the unittests.

Shreeshrii commented 5 years ago

@stweil I have uploaded the font in response to https://github.com/tesseract-ocr/tesseract/pull/2184