qurator-spk / sbb_textline_detection

Detect textlines in document images
Apache License 2.0

Consider restricting detected lines to detected regions #19

Closed wrznr closed 4 years ago

wrznr commented 4 years ago

The SBB page segmentation creates two levels of representation: 1. a segmentation of the page into regions and 2. a segmentation of the page into lines. Currently, those two levels are apparently independent of each other. That is, a region may be smaller than the union of the lines it incorporates: (image)
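The mismatch described above can be checked mechanically. The following is a minimal sketch, assuming regions and lines are simplified to axis-aligned boxes `(x_min, y_min, x_max, y_max)`; the real output uses polygons, and the names `Box`, `contains`, and `lines_outside_region` are hypothetical helpers, not part of sbb_textline_detection.

```python
from typing import List, Tuple

# Assumption: boxes are (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[int, int, int, int]


def contains(region: Box, line: Box) -> bool:
    """Return True if the line box lies completely inside the region box."""
    rx0, ry0, rx1, ry1 = region
    lx0, ly0, lx1, ly1 = line
    return rx0 <= lx0 and ry0 <= ly0 and lx1 <= rx1 and ly1 <= ry1


def lines_outside_region(region: Box, lines: List[Box]) -> List[Box]:
    """Return the lines that stick out of the region they belong to."""
    return [line for line in lines if not contains(region, line)]


# Made-up example values: the second line is wider than its region.
region = (100, 100, 500, 300)
lines = [(100, 110, 500, 150), (90, 160, 520, 200)]
print(lines_outside_region(region, lines))  # [(90, 160, 520, 200)]
```

Any non-empty result means the region is smaller than the union of its lines.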

vahidrezanezhad commented 4 years ago

Dear @wrznr, I am not sure whether that is a problem or not. The logic behind this is that sometimes my layout detection is not that good (especially for text regions), whereas I can implement a robust textline detection. If I restrict my textline rectangles to the text region (masking by text region), it can worsen the result of the textline detection.
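To illustrate the concern about masking, here is a minimal sketch, assuming both detections are available as binary pixel masks of the same size (nested lists of 0/1 here, to stay dependency-free). The function names and the toy data are hypothetical and do not reflect the tool's actual code.

```python
from typing import List

Mask = List[List[int]]  # assumption: 1 = foreground pixel, 0 = background


def mask_by_region(textline_mask: Mask, region_mask: Mask) -> Mask:
    """Keep only textline pixels that also fall inside the region mask."""
    return [
        [t & r for t, r in zip(t_row, r_row)]
        for t_row, r_row in zip(textline_mask, region_mask)
    ]


def count_pixels(mask: Mask) -> int:
    """Count foreground pixels in a binary mask."""
    return sum(sum(row) for row in mask)


# Toy data: the textline spans the full width, but the region is too narrow.
textline_mask = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
region_mask = [
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
]

masked = mask_by_region(textline_mask, region_mask)
lost = count_pixels(textline_mask) - count_pixels(masked)
print(masked[1], lost)  # [0, 1, 1, 1, 0, 0] 3
```

In this toy case half of the textline pixels are cut away because the region was detected too small, which is the kind of degradation described above.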

mikegerber commented 4 years ago

(@vahidrezanezhad: Please correct me if I'm not 100% correct with this)

As I understand it, the regions you see here are not much more than the raw pixelwise segmentation results, while the textlines are the result of pixelwise textline segmentation + heuristics + taking text regions into account + deskewing. The regions still contain (in the logical, XML sense) their textlines because we need that for ordering, but no effort has been made to process their bounding polygons. The textlines are processed to give a useful and clean bounding polygon/rectangle because we need this for OCR.