tesseract-ocr / langdata_lstm

Data used for LSTM model training
Apache License 2.0

Missing support for Coptic script #36

Open stweil opened 4 years ago

stweil commented 4 years ago

Training Tesseract with tesstrain on a text containing ϯ (U+03EF, COPTIC SMALL LETTER DEI) produces a unicharset file that includes this line:

ϯ 3 0,255,0,255,0,0,0,0,0,0 Coptic 273 0 273 ϯ  # ϯ [3ef ]a

lstmtrain complains about a missing file:

Failed to load script unicharset from:data/Coptic.unicharset
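
To see in advance which script files a training run would look for, a small check along the following lines may help. It is only a sketch: it assumes unicharset entries have the layout of the line quoted above (character, properties, comma-separated metrics, script name, ...) and that each script is expected as `<Script>.unicharset` in a data directory, as the path in the error message suggests. The function names `scripts_in_unicharset` and `missing_script_files` are made up for this example and are not part of Tesseract.

```python
from pathlib import Path


def scripts_in_unicharset(text: str) -> set[str]:
    """Collect the script names used by unicharset entries.

    Assumes the entry layout quoted in this issue:
    <char> <properties> <comma-separated metrics> <script> ...
    Lines that do not match that layout (for example the leading
    entry count) are skipped.
    """
    scripts = set()
    for line in text.splitlines():
        fields = line.split()
        # Only accept lines whose third field looks like the metrics list.
        if len(fields) >= 4 and "," in fields[2]:
            scripts.add(fields[3])
    return scripts


def missing_script_files(scripts: set[str], data_dir: str) -> list[str]:
    """Return the scripts that have no <Script>.unicharset in data_dir."""
    base = Path(data_dir)
    return sorted(s for s in scripts if not (base / f"{s}.unicharset").is_file())


if __name__ == "__main__":
    # Hypothetical paths; adjust to the actual training setup.
    unicharset_text = Path("data/my_model/unicharset").read_text(encoding="utf-8")
    for script in missing_script_files(scripts_in_unicharset(unicharset_text), "data"):
        print(f"missing: data/{script}.unicharset")
```

For the entry quoted above, this would report Coptic as missing until a `Coptic.unicharset` file is present in the data directory.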

stweil commented 4 years ago

See also https://github.com/Shreeshrii/tessdata_coptic.