I'm trying to train Tesseract to handle a custom font/color scheme. eng.traineddata works pretty well for me out of the box but misses a few crucial cases that I need to parse correctly, so I prepared my training data as png and txt files. When I run "make training MODEL_NAME=light-model START_MODEL=eng TESSDATA=eng_tessdata_best PSM=7 MAX_ITERATIONS=20000", I see it start with a surprisingly high error rate (100%), considering that eng.traineddata works very well for me.
I think I've pinpointed the problem to a discrepancy between the tesseract output and the lstmeval output. When I go into my tesseract repo and run "tesseract test.png test_output -psm 7", it parses the image properly, but when I run lstmeval with eng.traineddata on that same image, I get a nonsensical output string. What am I doing wrong? It seems like LSTM training has no correlation with the output I'm seeing from tesseract.
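To put a number on the discrepancy outside of training, one option is to score both outputs against the ground-truth text yourself. The sketch below computes a character error rate as Levenshtein edit distance divided by the ground-truth length. This is an assumed, common definition; lstmeval's internal formula may differ in details. Under this definition, a 100% error rate roughly means the decoded string shares almost nothing with the ground truth. The strings are hypothetical placeholders, not real outputs.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def char_error_rate(truth: str, ocr: str) -> float:
    """Edit distance normalised by ground-truth length, as a percentage.
    Assumed definition; it can exceed 100% if the OCR output is much longer."""
    if not truth:
        return 0.0 if not ocr else 100.0
    return 100.0 * levenshtein(truth, ocr) / len(truth)


if __name__ == "__main__":
    truth = "ABC-123"  # hypothetical contents of the ground-truth .txt file
    good = "ABC-123"   # hypothetical tesseract CLI result
    bad = "~,;.."      # hypothetical nonsensical lstmeval decode
    print(f"tesseract CLI: {char_error_rate(truth, good):.1f}%")
    print(f"lstmeval:      {char_error_rate(truth, bad):.1f}%")
```

If the CLI output scores near 0% and the lstmeval decode of the same image scores near 100%, the gap lies in how the two paths load or preprocess the image and model, not in the training data itself.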