ulb-sachsen-anhalt / digital-eval

Evaluate data from mass digitalization workflows
MIT License

Extract groundtruth frame #8

M3ssman closed this issue 1 year ago

M3ssman commented 1 year ago

Description

Add the ability to extract specific areas from a given ground-truth data set (formerly part of the OCR functionality).

Decide what belongs in the area of interest based on the finest-grained OCR structures. Usually, as with newspapers, the area of interest is a specific column or article on a whole page.
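A minimal sketch of that selection step, assuming words are the finest-grained structures and that a word belongs to the frame when its center point lies inside it. The names `Box`, `center_inside` and `words_in_frame` are hypothetical and not part of digital-eval; the center-point rule is only one possible criterion.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x0: int
    y0: int
    x1: int
    y1: int


def center_inside(word: Box, frame: Box) -> bool:
    """Return True if the center point of `word` lies within `frame`."""
    cx = (word.x0 + word.x1) / 2
    cy = (word.y0 + word.y1) / 2
    return frame.x0 <= cx <= frame.x1 and frame.y0 <= cy <= frame.y1


def words_in_frame(words: List[Box], frame: Box) -> List[Box]:
    """Keep only the words that belong to the area of interest."""
    return [w for w in words if center_inside(w, frame)]


# Example: a frame covering one newspaper column on the left of the page.
column = Box(0, 0, 100, 200)
page_words = [
    Box(10, 10, 40, 20),    # fully inside the column
    Box(90, 10, 130, 20),   # straddles the column edge, center outside
    Box(150, 10, 180, 20),  # in the neighbouring column
]
selected = words_in_frame(page_words, column)
```

Deciding on the word level rather than on lines or regions means that a line crossing the frame border is split cleanly, at the cost of leaving line and region boxes that may no longer fit their content.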

Use the new pieces API, and include the bottom-up structure repairs: resize each line so that it fits only the words it encloses, then resize each region so that it fits its possibly changed lines.
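The bottom-up repair could be sketched as follows. This does not use the pieces API; `Line`, `Region`, `enclose` and `repair_bottom_up` are hypothetical names for illustration, and dropping lines that have lost all their words is an assumption rather than something stated in this issue.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (x0, y0, x1, y1) in pixel coordinates; hypothetical representation
Box = Tuple[int, int, int, int]


def enclose(boxes: List[Box]) -> Box:
    """Smallest box that contains all given boxes (list must be non-empty)."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


@dataclass
class Line:
    box: Box
    words: List[Box] = field(default_factory=list)


@dataclass
class Region:
    box: Box
    lines: List[Line] = field(default_factory=list)


def repair_bottom_up(region: Region) -> Region:
    """Refit lines to their words first, then the region to its lines."""
    # Assumption: a line without any remaining words is dropped.
    region.lines = [ln for ln in region.lines if ln.words]
    for ln in region.lines:
        ln.box = enclose(ln.words)
    # An entirely empty region keeps its old box; a caller may discard it.
    if region.lines:
        region.box = enclose([ln.box for ln in region.lines])
    return region


region = Region(
    box=(0, 0, 500, 100),
    lines=[
        Line(box=(0, 0, 500, 25), words=[(10, 5, 40, 20), (50, 6, 90, 22)]),
        Line(box=(0, 25, 500, 50), words=[(10, 30, 60, 45)]),
        Line(box=(0, 50, 500, 75), words=[]),  # all words were outside the frame
    ],
)
repaired = repair_bottom_up(region)
```

Working from words upward keeps the order of repairs unambiguous: a region is only refitted once all of its lines have their final extent.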

Integrate this into the model module.