ulb-sachsen-anhalt / ulb-groundtruth-eval-odem-ger

OCR Groundtruth ULB VD18 German Fraktur - OCR-D Phase III
https://ulb-sachsen-anhalt.github.io/ulb-groundtruth-eval-odem-ger/
Creative Commons Attribution Share Alike 4.0 International

Remove encoded CR from region texts #3

Closed stweil closed 2 months ago

stweil commented 2 months ago

Most PAGE XML files had those unnecessary CR codes in their region texts, but 92 files were fine and did not need a fix.

Did you use different processes to produce the PAGE XML files? Which process adds the CR codes?
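
For illustration, here is a minimal sketch of how such CR codes could be stripped from region texts. It is not necessarily how this pull request was produced. It assumes the CR is stored as the character reference `&#13;` inside the region-level `TextEquiv/Unicode` element, and that line-level texts should stay untouched. The function name `strip_cr_from_region_texts`, the sample document, and the 2019-07-15 PAGE namespace are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def strip_cr_from_region_texts(root):
    """Remove CR characters from region-level TextEquiv/Unicode texts.

    Hypothetical helper: returns the number of Unicode elements changed.
    Line-level texts (TextLine/TextEquiv) are deliberately not touched.
    """
    changed = 0
    # "{*}" matches any namespace (Python 3.8+), so the PAGE version does not matter.
    for region in root.findall(".//{*}TextRegion"):
        # Only the TextEquiv that is a direct child of the region.
        for unicode_el in region.findall("{*}TextEquiv/{*}Unicode"):
            if unicode_el.text and "\r" in unicode_el.text:
                unicode_el.text = unicode_el.text.replace("\r", "")
                changed += 1
    return changed


# Made-up sample; the namespace is assumed, real files may use another PAGE version.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page>
    <TextRegion id="r1">
      <TextLine id="l1"><TextEquiv><Unicode>erste Zeile</Unicode></TextEquiv></TextLine>
      <TextLine id="l2"><TextEquiv><Unicode>zweite Zeile</Unicode></TextEquiv></TextLine>
      <TextEquiv><Unicode>erste Zeile&#13;
zweite Zeile</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""

root = ET.fromstring(SAMPLE)
n_changed = strip_cr_from_region_texts(root)
region_text = root.find(".//{*}TextRegion/{*}TextEquiv/{*}Unicode").text
print(n_changed, repr(region_text))
```

The parser resolves `&#13;` to a literal CR character, so the helper only has to look for `"\r"` in the parsed text. Writing the cleaned tree back to disk is left out of the sketch.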

M3ssman commented 2 months ago

Thanks very much for the effort!

The CR codes appear because of corrections made with https://github.com/ulb-sachsen-anhalt/transkribus-swt-gui. For training, only lines are of interest, so I haven't paid attention to regions until now.