Closed: bertsky closed this issue 4 years ago
@bertsky We can set tighter textlines, but this also has its own disadvantages. By the way, we will publish a new tool which outputs contours for textlines instead of rectangles. However, that method costs us more processing time!
@vahidrezanezhad
> We can set tighter textlines, but this also has its own disadvantages. However, that method costs us more processing time!
Then why not make that behaviour optional (with an ocrd-tool.json parameter), so the user can decide what is needed (precision or performance) for her workflow?
> By the way, we will publish a new tool which outputs contours for textlines instead of rectangles.
Where?
And why did you close the issue already?
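To illustrate the suggestion of making tight textlines optional, here is a minimal sketch of how such a switch could look. The parameter name `tight_textlines`, its description, and the helpers `resolve_parameters` and `textline_mode` are hypothetical and not part of any existing tool. Only the general shape of a parameter entry (`type`, `default`, `description`) follows the ocrd-tool.json convention.

```python
import json

# Hypothetical parameter entry, in the style of an ocrd-tool.json "parameters" block.
# The name "tight_textlines" and its description are assumptions for illustration.
PARAMETERS = {
    "tight_textlines": {
        "type": "boolean",
        "default": False,
        "description": "Output tight polygonal textline contours "
                       "instead of bounding rectangles (slower)",
    }
}


def resolve_parameters(user_params, spec=PARAMETERS):
    """Fill in defaults for parameters the user did not set; reject unknown names."""
    unknown = set(user_params) - set(spec)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    resolved = {name: entry["default"] for name, entry in spec.items()}
    resolved.update(user_params)
    return resolved


def textline_mode(params):
    """Pick the (hypothetical) output mode: precision versus performance."""
    return "contour" if params["tight_textlines"] else "rectangle"


if __name__ == "__main__":
    # Show the parameter block as it might appear in the JSON file.
    print(json.dumps({"parameters": PARAMETERS}, indent=2))
    print(textline_mode(resolve_parameters({})))
    print(textline_mode(resolve_parameters({"tight_textlines": True})))
```

With a default of `False`, existing workflows would keep the faster rectangle output, and users who need precision could opt in explicitly.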
Dear @bertsky, first of all, the tool which outputs textlines as contours is available here (https://github.com/vahidrezanezhad/newspapers_regions_and_reading_order_curved_lines). The reason it is not integrated as an option into the current model is that it will be a separate tool which can also provide the reading order of text regions. The other reason is that it is still under development. If you use this tool (of course I can share the models with you :) ), you will see that it writes textline contours on the deskewed image and not on the original image, but based on internal decisions at SBB we decided to write the results on the original image again.
@vahidrezanezhad understood – I'll try to follow. Thanks for clarifying!
Would it be possible to get good polygonal outlines from the text line segmentation instead of coarse bounding boxes?
There is a stark contrast between the precise contours of the text regions (which never overlap) and the coarse rectangles of the text lines inside them (which often protrude beyond their parent and overlap with adjacent lines).
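The overlap problem can be reproduced with a small, self-contained sketch. The two polygons below are made-up coordinates for slightly skewed adjacent lines, not data from the tool. Their outlines do not touch, yet their bounding rectangles overlap, so a pixel belonging to the second line ends up inside the first line's rectangle.

```python
def bounding_box(polygon):
    """Axis-aligned bounding rectangle (x0, y0, x1, y1) of a list of (x, y) points."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a, b):
    """True if two (x0, y0, x1, y1) rectangles share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def point_in_box(point, box):
    x, y = point
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]


def point_in_polygon(point, polygon):
    """Ray-casting test: count edge crossings of a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge spans the ray's y coordinate
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


# Two illustrative skewed textlines, separated by a 2-pixel gap.
line_1 = [(0, 0), (100, 10), (100, 30), (0, 20)]
line_2 = [(0, 22), (100, 32), (100, 52), (0, 42)]

box_1, box_2 = bounding_box(line_1), bounding_box(line_2)
pixel = (10, 25)  # lies inside line_2's outline

print(boxes_overlap(box_1, box_2))      # True: the rectangles overlap
print(point_in_polygon(pixel, line_1))  # False: not part of line_1's outline
print(point_in_box(pixel, box_1))       # True: but it intrudes into line_1's rectangle
```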
This makes it risky to apply line-level dewarping afterwards, and requires an OCR engine that can cope with intruders in the line image. In the example given in #29, I get these line images from ocrd-cis-ocropy-dewarp: