mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

How do overlapping regions affect region segmentation training? #551

Closed rohanchn closed 10 months ago

rohanchn commented 11 months ago

I have several pages where illustrations disrupt the continuity of the main/primary text.

While annotating such pages, suppose I create overlapping regions: one large region for the main text and a second region for the illustration that overlaps the MainText region (as in the example below). How will this affect segmentation training?

[Image: illustration]

dstoekl commented 11 months ago

Regions of different types can overlap with no problem in my experience. You could also use the API to diminish the main region by the illustration as long as you do not create a hole inside a polygon.
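A minimal sketch of the subtraction idea, under loud assumptions: it does not use the kraken or eScriptorium API, it treats both regions as axis-aligned boxes `(x0, y0, x1, y1)` in image coordinates, and `subtract_box` is a hypothetical helper name. It returns the outline of the main region with the illustration cut out, and refuses when the cut would leave a hole or split the region in two. Real regions are arbitrary polygons, for which a geometry library with a polygon difference operation would probably be the better tool.

```python
from itertools import product


def subtract_box(main, illus):
    """Outline of `main` minus `illus`; both are (x0, y0, x1, y1) boxes."""
    x0, y0, x1, y1 = main
    # Clip the illustration to the main region.
    a0, b0 = max(illus[0], x0), max(illus[1], y0)
    a1, b1 = min(illus[2], x1), min(illus[3], y1)
    if a0 >= a1 or b0 >= b1:  # no overlap: the region is unchanged
        return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]

    # Split the main box into grid cells along the illustration's edges.
    xs, ys = sorted({x0, a0, a1, x1}), sorted({y0, b0, b1, y1})
    edges = set()
    for i, j in product(range(len(xs) - 1), range(len(ys) - 1)):
        left, right, top, bottom = xs[i], xs[i + 1], ys[j], ys[j + 1]
        if (left, top) == (a0, b0):
            continue  # this cell is the illustration itself
        corners = [(left, top), (right, top), (right, bottom), (left, bottom)]
        for k in range(4):
            edge = (corners[k], corners[(k + 1) % 4])
            # An edge shared by two kept cells is interior: cancel it.
            if edge[::-1] in edges:
                edges.remove(edge[::-1])
            else:
                edges.add(edge)
    if not edges:
        raise ValueError("the illustration covers the whole region")

    # Chain the remaining boundary edges into one loop.
    follow = dict(edges)
    start = min(follow)
    loop, point = [start], follow[start]
    while point != start:
        loop.append(point)
        point = follow[point]
    if len(loop) != len(follow):
        raise ValueError("subtraction would leave a hole or split the region")

    # Drop vertices that lie in the middle of a straight segment.
    n = len(loop)
    return [p for k, p in enumerate(loop)
            if not any(loop[k - 1][c] == p[c] == loop[(k + 1) % n][c]
                       for c in (0, 1))]


main_text = (0, 0, 100, 100)
illustration = (60, 20, 120, 50)  # pokes in from the right edge
print(subtract_box(main_text, illustration))
# [(0, 0), (100, 0), (100, 20), (60, 20), (60, 50), (100, 50), (100, 100), (0, 100)]
```

An illustration that sits entirely inside the main text, such as `(30, 30, 60, 60)` here, raises `ValueError`, which is the "hole inside a polygon" case; in that situation leaving the two regions overlapping is the simpler option.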

rohanchn commented 11 months ago

> You could also use the API to diminish the main region by the illustration as long as you do not create a hole inside a polygon.

Ahh! Sounds like something I should try. Thank you!

I remember seeing documentation for the API some time ago. Is this [0] how we do it? Are there any other references?

[0] https://pypi.org/project/escriptorium-connector/