tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

What if my ground truth includes characters not found in a *.unicharset? #371

Closed yaofuzhou closed 3 months ago

yaofuzhou commented 4 months ago

Question - Before running make training, what if my ground truth includes characters not found in any of the most relevant *.unicharset files? Are the new characters automatically added to a *.unicharset? If so, how do I specify which *.unicharset? If not, what information do I need to add manually to a *.unicharset, and how do I know that the training process uses the correct *.unicharset? Thanks!
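
To see which characters are actually affected, a rough check along these lines may help. It is only a sketch and makes several assumptions: that the first line of a unicharset file is the entry count, that each following line starts with the unichar followed by a space, that a space in the text is stored as the special NULL entry, and that the ground truth lives in *.gt.txt files. The function names are made up for illustration and are not part of tesstrain. It compares single code points, so scripts whose unichars span several code points would need grapheme-aware splitting instead.

```python
from pathlib import Path


def read_unicharset(path):
    """Return the set of unichar strings listed in a unicharset file.

    Assumption: line 1 holds the entry count and every later line
    begins with the unichar, followed by a space and its properties.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.split(" ", 1)[0] for line in lines[1:] if line.strip()}


def ground_truth_chars(gt_dir, pattern="*.gt.txt"):
    """Collect every distinct character used in the ground-truth files."""
    chars = set()
    for gt_file in Path(gt_dir).glob(pattern):
        chars.update(gt_file.read_text(encoding="utf-8"))
    # Line breaks separate transcriptions and are not trained symbols.
    return chars - {"\n", "\r"}


def missing_chars(gt_dir, unicharset_path):
    """List ground-truth characters that have no unicharset entry.

    Assumption: the space character is represented by the NULL entry,
    so it is skipped here rather than reported as missing.
    """
    known = read_unicharset(unicharset_path)
    return sorted(
        c for c in ground_truth_chars(gt_dir) if c != " " and c not in known
    )
```

A call such as `missing_chars("data/foo-ground-truth", "data/foo/foo.unicharset")` (both paths are placeholders) returns the characters that would need attention; an empty list means every code point in the ground truth already has an entry.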