qurator-spk / dinglehopper

An OCR evaluation tool
Apache License 2.0

Improve visual alignment for longer documents #63

Status: Open. mikegerber opened this issue 2 years ago.

mikegerber commented 2 years ago

@stweil asked in #62:

Unrelated: in the result the lines from GT and OCR result are side by side at the beginning, but that synchronization gets lost later. Why?

mikegerber commented 2 years ago

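The honest answer is that the lines align nicely in shorter documents only by accident. The text on the left is simply the GT text and the text on the right is simply the OCR text. Nothing pairs up corresponding lines, so once one side has missing or extra text, the two columns drift apart.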
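To see why the synchronization gets lost, here is a simplified model (not dinglehopper's actual report code) in which row *i* of the side-by-side view just shows the *i*-th line of each text. The function name `naive_side_by_side` is hypothetical. A single line dropped by the OCR shifts every later row:

```python
from itertools import zip_longest
from typing import List, Optional, Tuple


def naive_side_by_side(gt: List[str], ocr: List[str]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Pair the i-th GT line with the i-th OCR line, without any alignment."""
    # zip_longest pads the shorter list with None
    return list(zip_longest(gt, ocr))


gt = ["line one", "line two", "line three"]
ocr = ["line one", "line three"]  # the OCR dropped "line two"

for left, right in naive_side_by_side(gt, ocr):
    print(f"{left!s:<12} | {right}")
```

Only the first row matches; from the dropped line onward, every GT line is shown next to the wrong OCR line.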

For longer documents, or texts with larger gaps, we would need to make an explicit effort to align the lines.
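One possible approach, sketched here as an assumption rather than anything dinglehopper implements, is a global alignment of lines in the style of Needleman-Wunsch. Each cell of a dynamic-programming table holds the cheapest way to align the first *i* GT lines with the first *j* OCR lines. Pairing two lines costs `1 - SequenceMatcher.ratio()`, and leaving a line unpaired costs a hypothetical `gap` penalty (0.5 here, so pairing is always at least as cheap as two gaps). The names `line_cost` and `align_lines` are illustrative:

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# One row of the side-by-side view: (GT line, OCR line); None marks a gap.
Row = Tuple[Optional[str], Optional[str]]


def line_cost(a: str, b: str) -> float:
    """Cost of showing two lines side by side: 0 if identical, up to 1 if unrelated."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def align_lines(gt: List[str], ocr: List[str], gap: float = 0.5) -> List[Row]:
    """Globally align GT lines with OCR lines (Needleman-Wunsch style DP)."""
    n, m = len(gt), len(ocr)
    # cost[i][j]: minimal cost of aligning gt[:i] with ocr[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    # back[i][j]: which move produced cost[i][j]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], back[i][0] = cost[i - 1][0] + gap, "up"
    for j in range(1, m + 1):
        cost[0][j], back[0][j] = cost[0][j - 1] + gap, "left"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j], back[i][j] = min(
                (cost[i - 1][j - 1] + line_cost(gt[i - 1], ocr[j - 1]), "diag"),  # pair the lines
                (cost[i - 1][j] + gap, "up"),  # GT line without OCR counterpart
                (cost[i][j - 1] + gap, "left"),  # OCR line without GT counterpart
            )

    # Trace back from the bottom-right corner to recover the rows.
    rows: List[Row] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            rows.append((gt[i - 1], ocr[j - 1]))
            i, j = i - 1, j - 1
        elif move == "up":
            rows.append((gt[i - 1], None))
            i -= 1
        else:
            rows.append((None, ocr[j - 1]))
            j -= 1
    rows.reverse()
    return rows


gt = ["Lorem ipsum dolor", "sit amet consectetur", "adipiscing elit"]
ocr = ["Lorem ipsum dolor", "adipiscing elit"]  # the OCR missed the middle line
for left, right in align_lines(gt, ocr):
    print(f"{left!s:<22} | {right}")
```

The missing line becomes an explicit empty cell instead of shifting the rest of the document. The table is O(n·m) in the number of lines, so for very long documents one would probably want to restrict the search, for example to a band around the diagonal.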