tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

box.train process hangs silently and does not exit #2688

Open MORzyuan opened 5 years ago

MORzyuan commented 5 years ago

Environment 1

Environment 2

The case reported below has been tested under both of these environments.

Current Behavior:

The training process hangs silently and never exits when running:

    tesseract allhz.NewspaperSung.exp0.tif allhz.NewspaperSung.exp0 box.train

The image allhz.NewspaperSung.exp0.tif was generated with the following command:

    text2image --text=training_all.txt --outputbase=allhz.NewspaperSung.exp0 --fonts_dir=../font/ --font='Old Newspapers Sung' --writing_mode vertical --xsize=1600 --ysize=2000 --resolution=300

Here training_all.txt contains a random combination of Han characters (attached to this issue as training_all.txt), and the font file can be downloaded from http://js.xiazaicc.com/down1/mgbzzt_downcc.zip
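The attached training_all.txt is not reproduced here. For anyone trying to reproduce the hang without it, a hypothetical stand-in can be produced with the bash sketch below. It assumes the CJK Unified Ideographs block U+4E00 to U+9FFF and arbitrary line counts and widths; the function names are illustrative only, and the output will not match the original file.

```bash
#!/usr/bin/env bash
# Hypothetical generator for a training_all.txt-style file of random Han
# characters. Assumption: characters are drawn from U+4E00..U+9FFF
# (20992 code points); the original file's exact contents are unknown.

# Print one random Han character as UTF-8. The three bytes are encoded by
# hand as octal escapes so the result does not depend on the current locale.
random_han_char() {
  local cp=$((0x4E00 + RANDOM % 20992))
  local b1=$((0xE0 | (cp >> 12)))
  local b2=$((0x80 | ((cp >> 6) & 0x3F)))
  local b3=$((0x80 | (cp & 0x3F)))
  printf "\\$(printf '%03o' "$b1")\\$(printf '%03o' "$b2")\\$(printf '%03o' "$b3")"
}

# Print one line of N random Han characters followed by a newline.
random_han_line() {
  local n=$1 i
  for ((i = 0; i < n; i++)); do
    random_han_char
  done
  printf '\n'
}

# Print LINES lines of WIDTH characters each (defaults are arbitrary).
make_training_text() {
  local lines=${1:-100} width=${2:-20} j
  for ((j = 0; j < lines; j++)); do
    random_han_line "$width"
  done
}
```

For example, `make_training_text 500 40 > training_all.txt` writes a file that can then be passed to the text2image command above.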

The hang happened 3 times, each time at the same page, page 404 (404 is not a lovely code!! TAT), so it does not seem to be a coincidence.
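To tell a genuine hang apart from a very slow page, the command can be run under a wall-clock limit. A minimal sketch, assuming GNU coreutils `timeout` is installed; the function name `run_with_limit` and the 600-second limit in the usage comment are hypothetical choices, not part of Tesseract.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command under a wall-clock limit so that a
# silent hang is reported instead of blocking forever.
# Assumption: GNU coreutils `timeout` is available on PATH.
run_with_limit() {
  local limit=$1
  shift
  timeout "$limit" "$@"
  local status=$?
  # GNU timeout exits with status 124 when the time limit is reached.
  if [ "$status" -eq 124 ]; then
    echo "TIMEOUT after ${limit}s: $*" >&2
  fi
  return "$status"
}

# Usage (the 600 s limit is an arbitrary assumption):
# run_with_limit 600 tesseract allhz.NewspaperSung.exp0.tif allhz.NewspaperSung.exp0 box.train
```

A return status of 124 means the limit was hit; any other status is the command's own exit code.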

Best!

amitdo commented 5 years ago

If you train for the lstm engine, you should use lstm.train.

MORzyuan commented 5 years ago

> If you train for the lstm engine, you should use lstm.train.

I am not trying to train the LSTM engine. Also, as listed under Environment 2, the problem still exists with version 3.05.