tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

Word started with a combiner:0x200c, Normalization failed for string #158

Closed: sam-kurdi closed this issue 4 years ago

sam-kurdi commented 4 years ago

I followed the training workflow and configuration described in the issue.

I modified the generate_wordstr_box.py file as suggested in that issue.

However, the message still appears in the log:

Bad box coordinates in boxfile string!
Extracting unicharset from plain text file data/krd/all-gt
Word started with a combiner:0x200c
Normalization failed for string 'له دادگای باریی که‌سی () ده‌رچووه له‌سه‌ر ماره‌یی پێشه‌کی ( ٤٥چل و پێنج مسقال زێر) وه‌ پاشه‌کیداوالێکراوی سه‌ره‌وه مێردی بریکارده‌رم

Any suggestions on how to solve this error?
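One way to narrow this down, assuming the message means that some whitespace-separated word in the ground-truth text begins with U+200C (ZERO WIDTH NON-JOINER, the 0x200c in the log), is to scan the .gt.txt files for such words. The Python sketch below prints every offending word with its file and line number. The --fix flag is a hypothetical option of this script only (not part of tesstrain). It removes only ZWNJs that start a word and keeps the ones inside words such as که‌سی, which carry meaning in this orthography. This is an unverified workaround, not an official tesstrain fix. Because data/krd/all-gt appears to be assembled from the ground-truth files, the cleanup presumably has to happen in the source .gt.txt files before rerunning make.

```python
import re
import sys
from pathlib import Path

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER, the 0x200c in the log message

# One or more ZWNJs at the very start of a line or right after whitespace,
# i.e. at the start of a whitespace-separated word.
LEADING_ZWNJ = re.compile(r"(^|\s)\u200c+")


def bad_words(line: str) -> list[str]:
    """Return the whitespace-separated words that begin with ZWNJ."""
    return [word for word in line.split() if word.startswith(ZWNJ)]


def strip_leading_zwnj(line: str) -> str:
    """Remove ZWNJs that begin a word; ZWNJs inside a word are kept."""
    return LEADING_ZWNJ.sub(r"\1", line)


def scan(path: Path, fix: bool) -> int:
    """Print every offending word in `path`; rewrite the file if `fix`."""
    lines = path.read_text(encoding="utf-8").splitlines()
    count = 0
    for lineno, line in enumerate(lines, start=1):
        for word in bad_words(line):
            print(f"{path}:{lineno}: word starts with U+200C: {word!r}")
            count += 1
    if fix and count:
        # Hypothetical --fix behaviour: overwrite the file in place.
        cleaned = [strip_leading_zwnj(line) for line in lines]
        path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return count


if __name__ == "__main__":
    args = sys.argv[1:]
    fix = "--fix" in args
    paths = [Path(arg) for arg in args if arg != "--fix"]
    total = sum(scan(path, fix) for path in paths)
    print(f"{total} word(s) starting with ZWNJ found")
```

Example usage (the script name is illustrative): run python find_leading_zwnj.py on the ground-truth .gt.txt files first to see which lines are affected, then add --fix after backing them up. A ZWNJ that directly follows a space should have no visible effect, so removing it is not expected to make the text disagree with the line image, but that is an assumption worth checking on a few samples.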

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.