tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

tesstrain.py cleanup #237

Closed Shreeshrii closed 2 years ago

Shreeshrii commented 3 years ago

Remove commented-out code for the legacy engine, as well as code not used for LSTM training.

lgtm-com[bot] commented 3 years ago

This pull request introduces 1 alert when merging 7a49e3e4e748ffb89b66b9c5f3b6734f27f866bc into 0d972f86f4aaf88fde77e3445ff607e68866c882 - view on LGTM.com

lgtm-com[bot] commented 3 years ago

This pull request introduces 1 alert when merging 08b7031daf58b49f94af7cc0f93f5720ef59dcce into 0e8151472ca034ee3366682d6829802ee1d9455e - view on LGTM.com

bertsky commented 3 years ago

I am not sure whether removing the legacy training code is a good idea.

Maybe it would be better to maintain that part as long as the legacy recognition is useful.

Good point! Keeping this alive does not cost much, but offers many options that other engines don't. One scenario is retraining osd.traineddata to include more scripts and improve its discriminative ability (especially for Greek, where I find it very weak).