Open bertsky opened 4 years ago
Sorry for not responding to this. This came up again because @stweil fixed Python 3.10 compatibility in #13.
@bertsky Do you use this, or is this "just" an interest in the overall quality of OCR-D tools?
The latter. I never found a use-case for myself. Messy line orderings are not rare, but they do not seem to come with correct region text.
Also, I have since written nmalign (https://github.com/bertsky/nmalign), a general-purpose tool for (purely textual) forced alignment.
I imagine this can fail in many ways. Do you have good example data? Or should we rather create some artificially, by re-ordering segments in good GT ad hoc?
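To make the artificial variant concrete, here is a minimal sketch of how such test data could be generated. It assumes the GT line texts have already been extracted into a plain Python list; it does not touch PAGE-XML or any OCR-D API. The names `shuffle_segments` and `restore_order` are hypothetical helpers, not part of any existing tool.

```python
import random


def shuffle_segments(lines, seed=0):
    """Shuffle GT lines reproducibly.

    Returns (shuffled, permutation), where
    shuffled[i] == lines[permutation[i]].
    """
    rng = random.Random(seed)  # seeded, so test data is reproducible
    permutation = list(range(len(lines)))
    rng.shuffle(permutation)
    shuffled = [lines[i] for i in permutation]
    return shuffled, permutation


def restore_order(shuffled, permutation):
    """Invert the permutation to recover the original line order."""
    restored = [None] * len(shuffled)
    for position, original_index in enumerate(permutation):
        restored[original_index] = shuffled[position]
    return restored


# Hypothetical stand-in for line texts taken from good GT
gt_lines = ["first line", "second line", "third line", "fourth line"]
shuffled, permutation = shuffle_segments(gt_lines, seed=42)
```

A test could then feed `shuffled` as the messy line order, keep `gt_lines` as the expected result, and use `permutation` to check which lines were reassigned correctly.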
As for negative tests, we could probably use `kant_aufklaerung_1784` from OCR-D/assets because of its bad tokenization, plus some bags/filegrps without text or with missing text.