Closed by mikegerber 2 years ago
Same problem with the non-OCR-D-CLI:
sbb_textline_detector -i OCR-D-IMG_00000024.tif -o test-out -m /home/mike/devel/qurator-data/textline_detection
Text regions look ok at https://github.com/qurator-spk/sbb_textline_detection/blob/master/qurator/sbb_textline_detector/main.py#L2077 but they get reset in https://github.com/qurator-spk/sbb_textline_detection/blob/master/qurator/sbb_textline_detector/main.py#L2089-L2091 - so I'm guessing the contour detection throws an exception.
The error "module 'cv2' has no attribute 'cv2'" is caught here:
I think the exception handling here is too broad, which is bad practice. If there is a specific exception to catch, it should be named. That would have made this kind of bug much easier to track down, because we would have seen a proper error message instead of the error being silently ignored.
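To illustrate the point, here is a minimal sketch of how a bare except hides this kind of failure. It does not reproduce the actual code in main.py: fake_cv2, find_regions_silently and find_regions_loudly are hypothetical names, fake_cv2 is only a stand-in for an OpenCV build without a cv2.cv2 attribute, and ValueError is just a placeholder for whatever exception the handler was really meant for.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("textline_sketch")

# Hypothetical stand-in for a cv2 module that has no `cv2` attribute.
fake_cv2 = SimpleNamespace(RETR_TREE=3)


def find_regions_silently(image):
    """Too broad: every failure, including a plain bug, becomes 'no regions'."""
    try:
        mode = fake_cv2.cv2.RETR_TREE  # raises AttributeError
        return [("region", mode)]
    except:  # noqa: E722 (deliberately bad, for illustration)
        return []


def find_regions_loudly(image):
    """Narrow: only the expected error is handled, anything else propagates."""
    try:
        mode = fake_cv2.cv2.RETR_TREE  # raises AttributeError
        return [("region", mode)]
    except ValueError as exc:  # placeholder for the exception actually expected
        logger.warning("no regions found: %s", exc)
        return []
```

With the first variant the caller just sees an empty list. With the second, the AttributeError reaches the user with a traceback that points at the offending line.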
This is fixed by downgrading opencv-python-headless - version 4.6.x from June 2022 seems to break contour detection here, so sbb_textline_detector does not give any text regions and thus no text lines either.
I'm preparing a PR to work around the issue by requiring opencv-python-headless < 4.6.
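Besides the pin in the requirements, a runtime check could fail fast with a clear message. This is only a sketch and not part of the PR: version_tuple, is_supported and installed_opencv_version are hypothetical helpers, and the version strings in the usage note are sample inputs.

```python
from importlib import metadata


def version_tuple(version):
    """Return (major, minor) as integers from a string like '4.5.1.48'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported(version):
    """True if the version satisfies the '< 4.6' constraint."""
    return version_tuple(version) < (4, 6)


def installed_opencv_version():
    """Return the installed opencv-python-headless version, or None if absent."""
    try:
        return metadata.version("opencv-python-headless")
    except metadata.PackageNotFoundError:
        return None
```

For example, is_supported("4.5.1.48") is True and is_supported("4.6.0.66") is False. A caller could raise a descriptive error when the installed version is not supported.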
👀 @kba This - the broad exception catching and the attribute error with the newest OpenCV version - might come up in eynollah too.
PEP8 (https://peps.python.org/pep-0008/) also has an opinion about this:
When catching exceptions, mention specific exceptions whenever possible instead of using a bare except: clause:
try:
    import platform_specific_module
except ImportError:
    platform_specific_module = None
A bare except: clause will catch SystemExit and KeyboardInterrupt exceptions, making it harder to interrupt a program with Control-C, and can disguise other problems. If you want to catch all exceptions that signal program errors, use except Exception: (bare except is equivalent to except BaseException:).
A good rule of thumb is to limit use of bare ‘except’ clauses to two cases:
1. If the exception handler will be printing out or logging the traceback; at least the user will be aware that an error has occurred.
2. If the code needs to do some cleanup work, but then lets the exception propagate upwards with raise. try...finally can be a better way to handle this case.
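The two cases could look roughly like this. It is a sketch with hypothetical names (run_logged, run_with_cleanup), and it uses except Exception rather than a bare except so that Control-C keeps working.

```python
import logging

logger = logging.getLogger("pep8_sketch")


def run_logged(task):
    """Case 1: catch broadly, but log the traceback so the failure is visible."""
    try:
        return task()
    except Exception:
        # logger.exception records the message together with the traceback.
        logger.exception("task failed")
        return None


def run_with_cleanup(task, cleanup):
    """Case 2: always run the cleanup, then let any exception propagate."""
    try:
        return task()
    finally:
        cleanup()
```

In neither case does the error disappear: it either ends up in the log with its traceback or reaches the caller.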
A bare except: clause will catch SystemExit and KeyboardInterrupt exceptions, making it harder to interrupt a program with Control-C, and can disguise other problems.
Ah that's why I always had problems interrupting the run of this program!
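A quick way to see why Control-C gets swallowed is to check the class hierarchy. This is a small sketch and caught_by_except_exception is a hypothetical helper name.

```python
def caught_by_except_exception(exc_type):
    """True if `except Exception:` would catch an exception of this type."""
    return issubclass(exc_type, Exception)


for exc_type in (KeyboardInterrupt, SystemExit, AttributeError):
    print(exc_type.__name__, caught_by_except_exception(exc_type))
# KeyboardInterrupt False
# SystemExit False
# AttributeError True
```

KeyboardInterrupt and SystemExit derive from BaseException only, so a bare except catches them while except Exception lets them through.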
There is still something broken, with https://qurator-data.de/examples/actevedef_718448162.first-page+binarization+segmentation.zip and
ocrd-sbb-textline-detector --overwrite -I OCR-D-IMG-BIN -O OCR-D-SEG-LINE-SBB-TLD -P model "/home/mike/devel/qurator-data/textline_detection/"
I get text regions, but there aren't any useful text lines (green) detected:
Thanks @vahidrezanezhad, I'll test it!
With opencv-python-headless == 4.5.1.48 (c4df3d6), it looks fine:
Using https://qurator-data.de/examples/actevedef_718448162.first-page.zip,
ocrd-sbb-textline-detector --overwrite -I OCR-D-IMG -O OCR-D-SEG-LINE-SBB-TLD -P model "/var/lib/textline_detection"
only gives:
I'm investigating.