qurator-spk / ocrd_repair_inconsistencies

Automatically re-order lines, words and glyphs to become textually consistent with their parents.
Apache License 2.0

add tests and CI #7

Open bertsky opened 4 years ago

bertsky commented 4 years ago

I imagine this can fail in many ways. Do you have good example data? Or should we rather create it artificially, by shuffling the segments of good GT ad hoc?

As for negative tests, we could probably use kant_aufklaerung_1784 from OCR-D/assets because of its bad tokenization, plus some bags/filegrps without text or with missing text.
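The test plan above can be sketched in Python. This sketch treats "consistent" as an exact string match: the parent's text must equal the children's texts joined by a fixed separator (newline for lines in a region, space for words in a line). `find_consistent_order` and `make_shuffled_case` are hypothetical helpers, not the ocrd_repair_inconsistencies API. Real tests would read and write PAGE-XML, for example from OCR-D/assets, rather than plain strings.

```python
import random
from typing import List, Optional, Sequence, Tuple


def find_consistent_order(parent_text: str, child_texts: Sequence[str],
                          joiner: str = "\n") -> Optional[List[int]]:
    """Return child indices whose joined texts equal parent_text, else None.

    Hypothetical helper: backtracks over prefixes of the parent text.
    """
    # Negative cases: missing parent text or missing child text.
    if not parent_text or not child_texts or any(not t for t in child_texts):
        return None
    n = len(child_texts)
    used = [False] * n
    order: List[int] = []

    def extend(pos: int) -> bool:
        if len(order) == n:
            # All children placed: the parent text must be fully consumed.
            return pos == len(parent_text)
        for i in range(n):
            if used[i]:
                continue
            # Only children after the first are preceded by the joiner.
            piece = child_texts[i] if not order else joiner + child_texts[i]
            if parent_text.startswith(piece, pos):
                used[i] = True
                order.append(i)
                if extend(pos + len(piece)):
                    return True
                used[i] = False
                order.pop()
        return False

    return order if extend(0) else None


def make_shuffled_case(gt_lines: Sequence[str],
                       seed: int) -> Tuple[str, List[str]]:
    """Build an artificial positive case: GT parent text plus shuffled children."""
    rng = random.Random(seed)
    shuffled = list(gt_lines)
    rng.shuffle(shuffled)
    return "\n".join(gt_lines), shuffled


# Positive test: the original line order is recovered from a shuffled copy.
lines = ["first line", "second line", "third line", "fourth line"]
parent, shuffled = make_shuffled_case(lines, seed=3)
order = find_consistent_order(parent, shuffled)
assert order is not None and [shuffled[i] for i in order] == lines

# Negative test: a made-up bad word split has no consistent order.
assert find_consistent_order("Sapere aude", ["Sap", "ere", "aude"], " ") is None
```

Backtracking keeps this exact for duplicate or overlapping line texts. It is only practical for the small segment counts typical of unit-test fixtures.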

mikegerber commented 1 year ago

Sorry for not responding to this. This came up again because @stweil fixed Python 3.10 compatibility in #13.

@bertsky Do you use this or is this "just" interest in overall quality of OCR-D tools?

bertsky commented 1 year ago

The latter. I never found a use case for myself. Messy line orderings are not rare, but they do not seem to come with correct region text, which is what the re-ordering would have to rely on.

Also, I have since written a general-purpose tool for (purely textual) forced alignment: https://github.com/bertsky/nmalign