ulb-sachsen-anhalt / ocrd-odem

OCR Workflows based on OCR-D
MIT License

Improve image statistics: total images vs. used images #1

Closed: M3ssman closed this issue 10 months ago

M3ssman commented 10 months ago

Description

Currently, the filter logic in filter_images doesn't handle the case where a print contains no structures of interest for OCR.
For example, the blacklisted structures and labels may exclude every page, so that the print turns out to be only a set of images with no suitable pages to pass to the OCR-D workflow. Instead of continuing, the implementation must signal this situation with a dedicated exception and stop processing immediately.
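A minimal sketch of the intended behaviour follows. It assumes each page carries a set of structure types and a label, and that filtering means dropping pages whose structures or label are blacklisted. The names `PageInfo` and `NoPagesForOCRError`, and this signature of `filter_images`, are hypothetical illustrations, not the actual ocrd-odem API. The exception records the total image count so the "total images vs. used images" statistic is still available when nothing is left to OCR.

```python
from dataclasses import dataclass, field


class NoPagesForOCRError(Exception):
    """Hypothetical dedicated exception: no page survived the filter."""

    def __init__(self, total: int):
        super().__init__(f"0 of {total} images left for OCR")
        self.total = total  # total images before filtering


@dataclass
class PageInfo:
    """Hypothetical per-page record: image file, structure types, label."""
    image: str
    structures: set = field(default_factory=set)
    label: str = ""


def filter_images(pages, blacklist_structures, blacklist_labels):
    """Keep only pages with no blacklisted structure and no blacklisted label.

    Raises NoPagesForOCRError instead of returning an empty list, so the
    caller stops processing right away rather than starting a pointless
    OCR-D run.
    """
    used = [
        p for p in pages
        if not (p.structures & blacklist_structures)
        and p.label not in blacklist_labels
    ]
    if not used:
        raise NoPagesForOCRError(total=len(pages))
    return used
```

A caller would catch `NoPagesForOCRError` at the point where the workflow for a record is started and mark the record as skipped, instead of letting an empty page list flow into OCR-D.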

This came up with the record "Historia longa veritatis, mille annorum...", which is literally a rather small collection of illustrations bound between book covers.