tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0
625 stars 180 forks

Image pre-processing for Neural Network #152

mittpy closed this issue 4 years ago

mittpy commented 4 years ago

Hi, folks,

Your opinion on the following matters would be of great help to me:

  1. Tesseract 4.00 comes with a neural network that recognizes single-line images. Should all line images in the training and testing datasets have the same width and height?
  2. When fine-tuning, it is widely recommended that the pre-processing match what was used to train the existing model chosen as the starting point, for example eng.traineddata. Should I adhere to this recommendation when training Tesseract? If so, what is the shape of the original image arrays used to train eng.traineddata, and are they grayscale or binary?

Thank you in advance! Best regards!

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.