AyushP123 closed this issue 5 years ago
See https://github.com/tesseract-ocr/tesstrain/wiki/GT4HistOCR#tesseract-fails-to-create-lstm-files for a possible reason and solution. That article also lists other potential problems.
Thanks for your response, @stweil. It really helped. Using psm 7 was the issue; with psm 13, all of the empty-string outputs were eliminated. I think that's a bug in Tesseract.
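A rough way to compare the two page segmentation modes on the same line images is to count how many images yield an empty string under each. The sketch below is only an illustration: it assumes the `tesseract` binary is on the PATH, uses plain recognition output as a proxy for what happens during `.lstmf` generation, and the helper names and the `eng` default are hypothetical.

```python
import subprocess


def tesseract_command(image_path, psm, lang="eng"):
    """Build a Tesseract CLI call that prints recognized text to stdout."""
    # "stdout" as the output base makes Tesseract write the text to stdout.
    return ["tesseract", str(image_path), "stdout", "-l", lang, "--psm", str(psm)]


def count_empty_outputs(image_paths, psm, lang="eng"):
    """Count images for which Tesseract returns only whitespace (assumed proxy)."""
    empty = 0
    for image_path in image_paths:
        result = subprocess.run(
            tesseract_command(image_path, psm, lang),
            capture_output=True,
            text=True,
            check=False,  # a failed run is simply counted as empty output
        )
        if not result.stdout.strip():
            empty += 1
    return empty
```

Calling `count_empty_outputs(images, 7)` and `count_empty_outputs(images, 13)` on the same list of `.tif` paths would show whether the empty results are specific to psm 7 on a given dataset.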
We still need to test how using psm 13 instead of 7 affects the training results. If there are no negative effects, the default psm should be changed to 13.
Tesseract Version: 4.1.0
I am trying to fine-tune Tesseract on a custom dataset with the following Makefile:
The number of .lstmf files being generated is significantly lower than the number of .box files. For example: 10k .tif files, 10k .gt.txt files, 10k .box files, but only 8k .lstmf files.
Could anyone point out possible reasons for this issue?
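To see which line images are affected, it can help to list the base names that have a `.box` file but no matching `.lstmf` file. This is a minimal sketch, assuming all generated files sit side by side in one ground-truth directory and share a base name; the function name `find_missing_lstmf` is hypothetical, and the directory must be supplied by the caller.

```python
from pathlib import Path


def find_missing_lstmf(ground_truth_dir):
    """Return sorted base names that have a .box file but no .lstmf file."""
    root = Path(ground_truth_dir)
    # Cut off only the final extension so names containing dots stay intact.
    box_stems = {p.name[: -len(".box")] for p in root.glob("*.box")}
    lstmf_stems = {p.name[: -len(".lstmf")] for p in root.glob("*.lstmf")}
    return sorted(box_stems - lstmf_stems)


if __name__ == "__main__":
    import sys

    # Directory defaults to the current one; pass the ground-truth path instead.
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for stem in find_missing_lstmf(directory):
        print(stem)
```

Inspecting a few of the listed images and their `.gt.txt` transcriptions usually shows whether the failures share a pattern, such as very short lines or unusual image sizes.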