qurator-spk / sbb_ocr_postcorrection

Two-Step Approach to OCR Post-Correction
Apache License 2.0

Missing words #5

Open mikegerber opened 2 years ago

mikegerber commented 2 years ago

In my test text, some words are missing from the result. Suspiciously, they always(?) are at the end of split lines:

[sbb_ocr_postcorrection]mike@leguin sbb_ocr_postcorrection % grep articuli actevedef_718448162_00000024.txt.json
                    "wie \u017fon\u017ften hier gew\u00f6hnlich, articuli"
[sbb_ocr_postcorrection]mike@leguin sbb_ocr_postcorrection % grep Wirth actevedef_718448162_00000024.txt.json
                    "Der Schnlthei\u00df zu Oberrod, der Wirth"
[sbb_ocr_postcorrection]mike@leguin sbb_ocr_postcorrection % grep accus actevedef_718448162_00000024.txt.json
                    "\u017fec. cap. accedens 23. X. de accus."
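
To check the "always at the end of split lines" suspicion more systematically than with individual grep calls, a rough sketch like the following could help. It is not part of sbb_ocr_postcorrection: the function name `missing_line_final_words` is hypothetical, the result is assumed to be available as one plain string, and words are compared by exact whitespace-separated match. The `result_text` value is a made-up placeholder that only illustrates the reported symptom; it is not actual tool output.

```python
from collections import Counter


def missing_line_final_words(source_lines, result_text):
    """Return the last word of each source line that is absent from result_text.

    Hypothetical helper: words are whitespace-separated tokens and are
    compared by exact match, so a word that was legitimately corrected
    would also be reported here.
    """
    result_words = Counter(result_text.split())
    missing = []
    for line in source_lines:
        words = line.split()
        if not words:
            continue  # skip empty lines
        last = words[-1]
        if result_words[last] > 0:
            result_words[last] -= 1  # consume one occurrence in the result
        else:
            missing.append(last)
    return missing


# The three split lines quoted from the JSON file above.
source_lines = [
    "wie \u017fon\u017ften hier gew\u00f6hnlich, articuli",
    "Der Schnlthei\u00df zu Oberrod, der Wirth",
    "\u017fec. cap. accedens 23. X. de accus.",
]

# Placeholder result text (an assumption, NOT real output): the same lines
# with each final word dropped, to mimic the behaviour described above.
result_text = (
    "wie \u017fon\u017ften hier gew\u00f6hnlich, "
    "Der Schnlthei\u00df zu Oberrod, der "
    "\u017fec. cap. accedens 23. X. de"
)

print(missing_line_final_words(source_lines, result_text))
```

Because the comparison is an exact match, a line-final word that the post-correction changed (rather than dropped) would also show up in the list, so any hit still needs a manual look.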