tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

Box files contain the same box for all characters #25

Closed by RanAR90 6 years ago

RanAR90 commented 6 years ago

Hello guys

Thanks for providing this tool. It makes life easier when training from images instead of generating synthetic data for specific fonts. I have been using it and thought I had got it right, but when I checked the box files manually I found that the box is the same for every character in the line segment, and the final line segment is only 2*2 in width and height, which is not the case in the image.

Can you please tell me what I have done wrong?

I only cloned the repo (Tesseract and Leptonica were already built and installed) and ran: make training MODEL_NAME=mine

Is this the correct box format to expect, or am I missing something here?

P.S. The model is training fine and I am getting a final .traineddata file. Everything looked fine until I checked the box files, which I did because I was not very satisfied with the results. Regards

wrznr commented 6 years ago
  1. It is okay and intended that all characters have the same box. The coordinates for every single character correspond to those of the whole line.
  2. It is also okay and intended that the final line segment is only 2*2. Consider it a dummy end-of-line element which does not correspond to an existing char.
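To check a generated box file against the two points above, a small script along the following lines may help. It is only a sketch: it assumes the usual Tesseract box line layout of `<symbol> <left> <bottom> <right> <top> <page>`, it assumes the end-of-line element is written as a tab symbol, and the sample coordinates are invented for illustration. The names `BoxEntry`, `parse_box_text`, and `characters_share_line_box` are hypothetical helpers, not part of tesstrain.

```python
from typing import List, NamedTuple


class BoxEntry(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_text(text: str) -> List[BoxEntry]:
    """Parse lines of the form '<symbol> <left> <bottom> <right> <top> <page>'."""
    entries = []
    for raw in text.split("\n"):
        if not raw:
            continue
        # Split from the right so that a space or tab symbol is preserved.
        symbol, *numbers = raw.rsplit(" ", 5)
        entries.append(BoxEntry(symbol, *map(int, numbers)))
    return entries


def characters_share_line_box(entries: List[BoxEntry]) -> bool:
    """True if every entry except the final one has identical coordinates."""
    boxes = {entry[1:5] for entry in entries[:-1]}
    return len(boxes) == 1


# Invented box file content for a 120x40 line image containing "Hi!".
# The last line stands in for the dummy end-of-line element (assumed to be a tab).
SAMPLE = "H 0 0 120 40 0\ni 0 0 120 40 0\n! 0 0 120 40 0\n\t 120 40 122 42 0\n"

entries = parse_box_text(SAMPLE)
end_marker = entries[-1]

# All real characters carry the coordinates of the whole line.
print(characters_share_line_box(entries))
# Width and height of the dummy end-of-line element.
print(end_marker.right - end_marker.left, end_marker.top - end_marker.bottom)
```

If the first check holds and the last entry is a tiny box, the file matches the format described above and the box files are unlikely to be the cause of poor results.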

It is also not very surprising that you are not satisfied with the results (given our small training sample). What is your evaluation scenario?

wrznr commented 6 years ago

@RaniemAR Can we consider your problem solved?

RanAR90 commented 6 years ago

@wrznr Thanks for your answers. I also thought it was expected behaviour but just wanted to confirm.

Thanks a bunch for the answer and the support, and sorry for the delayed reply. Yes, the problem is solved. Thanks for sharing your brilliant work; it makes people's lives easier.

Regards