whisere closed this issue 2 years ago.
Also tried on Python 3.10.0, 3.8.9 and 3.6.15; the result is the same on all of them.
I can't reproduce this; I tested a fresh install on Python 3.9. Could you please provide the full output of your pytest call? It would include more useful information, e.g. the platform.
There is another problem with rapidfuzz that makes the tests hang on qurator/dinglehopper/tests/test_integ_ocrd_cli.py (with the pytest process at 100% CPU). Downgrading with pip install rapidfuzz==1.9.1 fixes it.
@maxbachmann Any idea how to debug this properly? To reproduce: use Python 3.9, install dinglehopper with rapidfuzz 2.0.4 (including both requirements*.txt), and run:
% pytest -k test_integ_ocrd_cli.py
==================================================================== test session starts ====================================================================
platform linux -- Python 3.9.10, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/mike/devel/dinglehopper-github, configfile: pytest.ini
plugins: flake8-1.0.7, cov-3.0.0, mypy-0.9.1
collected 62 items / 61 deselected / 1 selected
qurator/dinglehopper/tests/test_integ_ocrd_cli.py . [100%]
============================================================= 1 passed, 61 deselected in 1.14s ==============================================================
% pip install -U rapidfuzz
Requirement already satisfied: rapidfuzz in /home/mike/.virtualenvs/dinglehopper-github/lib64/python3.9/site-packages (1.9.1)
Collecting rapidfuzz
Using cached rapidfuzz-2.0.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
Installing collected packages: rapidfuzz
Attempting uninstall: rapidfuzz
Found existing installation: rapidfuzz 1.9.1
Uninstalling rapidfuzz-1.9.1:
Successfully uninstalled rapidfuzz-1.9.1
Successfully installed rapidfuzz-2.0.4
% pytest -k test_integ_ocrd_cli.py
==================================================================== test session starts ====================================================================
platform linux -- Python 3.9.10, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/mike/devel/dinglehopper-github, configfile: pytest.ini
plugins: flake8-1.0.7, cov-3.0.0, mypy-0.9.1
collected 62 items / 61 deselected / 1 selected
qurator/dinglehopper/tests/test_integ_ocrd_cli.py ^Z
[1] + 521125 suspended pytest -k test_integ_ocrd_cli.py
% kill %1
[1] + 521125 terminated pytest -k test_integ_ocrd_cli.py
(The first run, with rapidfuzz 1.9.1, passes; the second, with 2.0.4, hangs.)
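Instead of suspending pytest with ^Z and killing it by hand, a hang like this can be turned into a pass/fail signal by running the command in a child process with a timeout. Below is a minimal standard-library sketch; the helper name finishes_within and the 60-second budget are assumptions for illustration, not part of dinglehopper or rapidfuzz.

```python
import subprocess
import sys


def finishes_within(cmd, seconds):
    """Return True if `cmd` exits within `seconds`, False if it times out.

    subprocess.run() kills the child once the timeout expires, so a hung
    process is not left spinning at 100% CPU afterwards.
    """
    try:
        subprocess.run(
            cmd,
            timeout=seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    return True


# Hypothetical usage from the dinglehopper checkout (60 s is an arbitrary budget):
# finishes_within([sys.executable, "-m", "pytest", "-k", "test_integ_ocrd_cli.py"], 60)
```

This only reports whether the process terminated, not whether the tests passed. The third-party pytest-timeout plugin offers a similar per-test limit via its --timeout option.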
rapidfuzz had a new release 19 hours ago with a bugfix for the relevant code; make sure you have rapidfuzz 2.0.4 or later!
% pip list | grep rapidfuzz
rapidfuzz 2.0.4
Sorry, I meant downgrade: pip install rapidfuzz==1.9.1
@mikegerber I can reproduce the issue and will look into it.
I tracked down a small reproducer:
from rapidfuzz import string_metric

# Hangs with rapidfuzz 2.0.4 (fixed in 2.0.5)
a = [2425437992138244740]
b = [-4086774168534702970]
string_metric.levenshtein_editops(a, b)
Apparently I replaced uint64_t with int64_t in one too many places, which led to signed integer overflows inside the hashmap implementation. This is fixed by https://github.com/maxbachmann/rapidfuzz-cpp/commit/fadfb752d5f90e35e48d20ceabdde44b52c81c9e, and the fix is released in v2.0.5.
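To illustrate how key signedness can turn a hashmap lookup into an endless loop, here is a simplified, hypothetical model of a CPython-style open-addressing probe over 128 slots (i = (5*i + perturb + 1) % 128, then perturb >>= 5). The table size and probe formula are assumptions for illustration only; this is not the rapidfuzz-cpp code, and it shows a sign-extension effect rather than reproducing the exact overflow fixed in the linked commit. With an unsigned key, perturb shifts down to 0 and the recurrence eventually visits every slot. With a negative signed key, the arithmetic shift leaves perturb stuck at -1, the recurrence degenerates to i = 5*i % 128, and the probe cycles through a small subset of slots, so a lookup whose cycle contains no free slot never terminates.

```python
MASK64 = (1 << 64) - 1
SLOTS = 128  # table size assumed for this model


def as_int64(key):
    """Reinterpret the low 64 bits of `key` as a signed two's-complement int64."""
    key &= MASK64
    return key - (1 << 64) if key >= 1 << 63 else key


def probe_slots(key, signed, steps):
    """Return the slots a perturbation probe visits for `key` in this model.

    Python's >> on a negative int is an arithmetic shift, as int64_t >> is
    on mainstream C++ compilers. Python's % always yields 0..SLOTS-1, so the
    model ignores the negative remainders C++ would produce.
    """
    k = as_int64(key) if signed else key & MASK64
    i = k % SLOTS
    perturb = k
    visited = [i]
    for _ in range(steps):
        i = (i * 5 + perturb + 1) % SLOTS
        perturb >>= 5
        visited.append(i)
    return visited


key = -4086774168534702970  # value of b in the reproducer above
# From the 14th step on, perturb is constant (0 unsigned, -1 signed),
# so the next 128 probes show the steady-state cycle.
unsigned_tail = probe_slots(key, signed=False, steps=141)[14:]
signed_tail = probe_slots(key, signed=True, steps=141)[14:]
print(len(set(unsigned_tail)), len(set(signed_tail)))
```

The unsigned model prints 128 as the first number (the recurrence 5*i + 1 mod 128 has full period), while the signed one prints at most 32, since multiplying by 5 modulo 128 has order 32.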
dinglehopper gt ocr no longer hangs after running pip install rapidfuzz==2.0.5. Thanks!
pytest reported:
E ModuleNotFoundError: No module named 'qurator.dinglehopper.tests'
Hint: make sure your test modules/packages have valid Python names.
===================================== short test summary info ======================================
ERROR qurator/dinglehopper/tests/extracted_text_test.py
ERROR qurator/dinglehopper/tests/test_align.py
ERROR qurator/dinglehopper/tests/test_character_error_rate.py
ERROR qurator/dinglehopper/tests/test_edit_distance.py
ERROR qurator/dinglehopper/tests/test_editops.py
ERROR qurator/dinglehopper/tests/test_integ_align.py
ERROR qurator/dinglehopper/tests/test_integ_character_error_rate_ocr.py
ERROR qurator/dinglehopper/tests/test_integ_cli_valid_json.py
ERROR qurator/dinglehopper/tests/test_integ_edit_distance_ocr.py
ERROR qurator/dinglehopper/tests/test_integ_ocrd_cli.py
ERROR qurator/dinglehopper/tests/test_integ_table_extraction.py
ERROR qurator/dinglehopper/tests/test_integ_word_error_rate_ocr.py
ERROR qurator/dinglehopper/tests/test_ocr_files.py
ERROR qurator/dinglehopper/tests/test_word_error_rate.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 14 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!
This is under Python 3.9.0. I guess it doesn't matter, since dinglehopper itself runs fine? Thanks.
> dinglehopper gt ocr no longer hangs after running pip install rapidfuzz==2.0.5. Thanks!
Great! I'm bumping the dependency to >= 2.0.5.
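In the requirements file the new minimum would read something like rapidfuzz >= 2.0.5. To confirm which version an environment actually picks up (the same check as pip list | grep rapidfuzz, but from Python), here is a small standard-library sketch. The helper names parse_release and installed_at_least are hypothetical, and the parsing only handles plain numeric X.Y.Z versions, not pre-releases or local tags.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text):
    """Turn '2.0.5' into (2, 0, 5); only plain numeric releases are supported."""
    return tuple(int(part) for part in text.split("."))


def installed_at_least(package, minimum):
    """True if `package` is installed with a release >= `minimum` (e.g. '2.0.5')."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    print(installed_at_least("rapidfuzz", "2.0.5"))
```

Comparing integer tuples rather than strings matters here: as strings, "2.0.10" would sort before "2.0.5".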
> pytest reported: E ModuleNotFoundError: No module named 'qurator.dinglehopper.tests'
That's a different problem. Did you follow the instructions in README-DEV.txt?
> Apparently I replaced uint64_t with int64_t in one too many places, which led to signed integer overflows inside the hashmap implementation. This is fixed by maxbachmann/rapidfuzz-cpp@fadfb75, and the fix is released in v2.0.5.
This update also fixes my tests, great!
Running dinglehopper gt txt and dinglehopper-line-dirs keeps hanging without any message, and pytest returns errors:
It also gets stuck at:
qurator/dinglehopper/tests/test_integ_table_extraction.py ..... [ 83%]
qurator/dinglehopper/tests/test_integ_word_error_rate_ocr.py ..
Python version: 3.9.0. Thanks.