stweil closed this issue 2 months ago.
@mittagessen, thank you for the hint to this script. It works pretty well and is really fast.
Is it also possible to extract the line images without a black background? I am not sure whether line images which only contain the original image inside of the polygon are good for Tesseract training.
I tried --legacy-polygons, but it looks like that code no longer works (it aborts with an exception).
> Is it also possible to extract the line images without a black background? I am not sure whether line images which only contain the original image inside of the polygon are good for Tesseract training.
Hmm, not really. The extract_polygons() function the script calls just masks out everything outside the polygon, so you'd need to change it to not apply the mask. But I'm not sure how useful the extracted lines are for Tesseract training anyway, as the baseline projection the line extractor does is obviously not available in Tesseract's bbox data model.
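To illustrate what "not applying the mask" means, here is a minimal pure-Python sketch of a polygon line crop with an optional mask. It is not kraken's extract_polygons(), which also does the baseline projection mentioned above; crop_line and apply_mask are hypothetical names, and images are assumed to be lists of grayscale rows (a real implementation would use Pillow or NumPy).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Image = List[List[int]]  # grayscale rows, image[y][x]; 0 = black


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd (ray casting) test: is (x, y) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray going right from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def crop_line(image: Image, polygon: Sequence[Point],
              apply_mask: bool = True, fill: int = 0) -> Image:
    """Hypothetical helper: crop the polygon's axis-aligned bounding box.

    With apply_mask=True, pixels whose centre lies outside the polygon are
    set to `fill` (black by default), giving the black background discussed
    above. With apply_mask=False, the plain bounding-box crop is returned.
    No baseline projection/dewarping is done here.
    """
    height, width = len(image), len(image[0])
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    x0, x1 = max(0, math.floor(min(xs))), min(width, math.ceil(max(xs)))
    y0, y1 = max(0, math.floor(min(ys))), min(height, math.ceil(max(ys)))
    out: Image = []
    for y in range(y0, y1):
        row = []
        for x in range(x0, x1):
            # Sample the polygon test at the pixel centre.
            if apply_mask and not point_in_polygon(x + 0.5, y + 0.5, polygon):
                row.append(fill)
            else:
                row.append(image[y][x])
        out.append(row)
    return out
```

With apply_mask=False the result is an ordinary rectangular crop, which is the kind of region Tesseract's bbox model can describe; with masking on, everything outside the polygon is black.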
Instead of
it produces this log output: