honeytidy opened this issue 7 years ago (status: Open)
Please see this note in the wiki: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#error-messages-from-training
No block overlapping textline: occurs when layout analysis fails to correctly segment the image that was given as training data. The textline is dropped. Not much problem if there aren't many, but if there are a lot, there is probably something wrong with the training text or rendering process.
I am still getting some of these errors for Devanagari, with tif/box pairs generated by text2image. They seem to occur around the ---------०--------- line in the training text.
No block overlapping textline: ---------०---------
No block overlapping textline: वित्त्येवहि अचूर्यामहि कृतघ्नं शत्रून्द्रुहे शुष्कीकरोति
No block overlapping textline: अर्कैः
No block overlapping textline: ह्यन्बन्त्यांञ्जगृहीतवती शक्तिपीठं ग्न्य छन्दष्ट्य झ
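To judge whether these messages are "not many" or "a lot", as the wiki note puts it, it can help to count them rather than eyeball the output. The following is only a rough sketch, not part of Tesseract: it assumes the training output has been captured as text, and `dropped_textlines`, `summarize`, and the `total_lines` parameter (the number of lines in your training text) are hypothetical names introduced here for illustration.

```python
# Prefix printed for each dropped line, copied from the log output above.
PREFIX = "No block overlapping textline:"


def dropped_textlines(log_text):
    """Return the text of every line the log reports as dropped."""
    dropped = []
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            dropped.append(line[len(PREFIX):].strip())
    return dropped


def summarize(log_text, total_lines):
    """Return (number of dropped lines, fraction of the training text dropped).

    total_lines is an assumption: the caller supplies the number of lines
    in the training text, since the log itself does not report it.
    """
    dropped = dropped_textlines(log_text)
    fraction = len(dropped) / total_lines if total_lines else 0.0
    return len(dropped), fraction


if __name__ == "__main__":
    # Two of the messages from this thread, used as sample input.
    sample = (
        "No block overlapping textline: ---------०---------\n"
        "No block overlapping textline: अर्कैः\n"
    )
    count, fraction = summarize(sample, total_lines=100)
    print(f"{count} dropped lines ({fraction:.1%} of the training text)")
    for text in dropped_textlines(sample):
        print("  dropped:", text)
```

Listing the dropped strings also makes it easier to see whether they cluster around particular lines of the training text, such as the separator line mentioned above.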
I want to generate some training data for tesseract:
tesseract tiff/data.tif testdata/data lstm.train langdata/chi_sim/chi_sim.config
But I get a lot of the following message, on almost every page that is loaded:

My tesseract version: tesseract 4.00.00alpha, leptonica-1.74.4

Any suggestions?