Closed: tallemeersch closed this 3 months ago
Thanks @tallemeersch! Could you upload the two files? I suspect there is an issue with the line endings (and I'm always interested in real user data).
Never mind, I believe this always happens. I'll look into it.
The files are attached. The command to produce the report was: dinglehopper --textequiv-level line 02_GT.txt 02.xml 02.zip
Fix is in git master and will be in the next release!
Thanks!
In ocr_files.py, line 170, the file is read with readlines. This method keeps the trailing newline on each line, which leads to an incorrect CER score. Below is the current report, given the ground truth as txt and the OCR as XML, vs. the report when strip() is added to line 170, i.e. make_segment(no, line.strip())
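To illustrate the effect described above, here is a minimal sketch in plain Python. It is not dinglehopper's code: the levenshtein helper, the sample text, and the variable names are all made up for illustration, and the real tool computes CER differently in detail. It only shows that readlines keeps the newline and that each kept newline counts as one extra character error against OCR text that has none.

```python
import io


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance (illustrative helper only)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


# Hypothetical ground-truth text file with two lines.
gt_file = io.StringIO("first line\nsecond line\n")

# readlines() keeps the trailing "\n" on every line.
raw_lines = gt_file.readlines()

# Stripping each line, as proposed for make_segment(no, line.strip()).
stripped_lines = [line.strip() for line in raw_lines]

# Hypothetical OCR output that matches the ground truth exactly.
ocr_lines = ["first line", "second line"]

raw_errors = sum(levenshtein(g, o) for g, o in zip(raw_lines, ocr_lines))
stripped_errors = sum(levenshtein(g, o) for g, o in zip(stripped_lines, ocr_lines))

print(raw_lines)        # newlines are still attached
print(raw_errors)       # one spurious error per line
print(stripped_errors)  # no errors once the newlines are removed
```

Note that strip() also removes leading and trailing spaces, not only the newline. If that whitespace should count towards the score, rstrip("\n") would be a narrower alternative; which one is appropriate here is a design choice for the maintainers.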