peterjc / thapbi-pict

Tree Health and Plant Biosecurity Initiative - Phytophthora ITS1 Classifier Tool
https://thapbi-pict.readthedocs.io/
MIT License

Segmentation fault from rapidfuzz-2.11.0 #517

Closed by peterjc 2 years ago

peterjc commented 2 years ago

New failure on CircleCI (Linux) and AppVeyor (Windows) noted on #516, most likely due to a dependency change.

Worked:

Successfully installed biopython-1.79 contourpy-1.0.5 cutadapt-4.1 cycler-0.11.0 dnaio-0.9.1 fonttools-4.37.3 greenlet-1.1.3 isal-1.0.1 jarowinkler-1.2.3 kiwisolver-1.4.4 matplotlib-3.6.0 networkx-2.8.6 numpy-1.23.3 packaging-21.3 pillow-9.2.0 pydot-1.4.2 pyparsing-3.0.9 python-dateutil-2.8.2 rapidfuzz-2.10.2 six-1.16.0 sqlalchemy-1.4.41 xlsxwriter-3.0.3 xopen-1.6.0

Failed:

Successfully installed biopython-1.79 contourpy-1.0.5 cutadapt-4.1 cycler-0.11.0 dnaio-0.9.1 fonttools-4.37.4 greenlet-1.1.3 isal-1.0.1 kiwisolver-1.4.4 matplotlib-3.6.0 networkx-2.8.7 numpy-1.23.3 packaging-21.3 pillow-9.2.0 pydot-1.4.2 pyparsing-3.0.9 python-dateutil-2.8.2 rapidfuzz-2.11.0 six-1.16.0 sqlalchemy-1.4.41 xlsxwriter-3.0.3 xopen-1.6.0

Several packages changed, including networkx-2.8.6 to 2.8.7 (unlikely to matter, given where the tests failed), and rapidfuzz-2.10.2 to 2.11.0 (which incorporated jarowinkler, so jarowinkler is no longer installed separately).
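Spotting which versions moved between two CI runs is easy to script. This is a minimal sketch that assumes the input is the single `Successfully installed ...` line pip prints, as quoted above. The helper names `parse_installed` and `diff_installed` are hypothetical and are not part of pip or thapbi-pict.

```python
def parse_installed(line):
    """Parse a pip 'Successfully installed name-ver ...' line into {name: version}."""
    prefix = "Successfully installed "
    if not line.startswith(prefix):
        raise ValueError("not a pip 'Successfully installed' line")
    packages = {}
    for token in line[len(prefix):].split():
        # Split on the LAST hyphen: project names such as python-dateutil
        # may contain hyphens, but normalized (PEP 440) versions do not.
        name, _, version = token.rpartition("-")
        packages[name] = version
    return packages


def diff_installed(before_line, after_line):
    """Return {name: (old, new)} for added, removed or changed packages.

    None stands for "not installed" on that side.
    """
    before = parse_installed(before_line)
    after = parse_installed(after_line)
    changes = {}
    for name in sorted(before.keys() | after.keys()):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes
```

Applied to the "Worked" and "Failed" lines above, it reports only four differences: fonttools, jarowinkler (no longer installed), networkx and rapidfuzz.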

Confirmed locally that tests/test_multi_marker.sh was passing on macOS, then upgraded rapidfuzz:

$ pip install -U rapidfuzz
Requirement already satisfied: rapidfuzz in /Users/xxx/opt/miniconda3/lib/python3.9/site-packages (2.10.0)
Collecting rapidfuzz
  Downloading rapidfuzz-2.11.0-cp39-cp39-macosx_10_9_x86_64.whl (1.8 MB)
     |████████████████████████████████| 1.8 MB 2.6 MB/s 
Installing collected packages: rapidfuzz
  Attempting uninstall: rapidfuzz
    Found existing installation: rapidfuzz 2.10.0
    Uninstalling rapidfuzz-2.10.0:
      Successfully uninstalled rapidfuzz-2.10.0
Successfully installed rapidfuzz-2.11.0

And the failure was reproduced on macOS:

$ tests/test_multi_marker.sh
...
================
Running pipeline
================
Looking for 9 markers in 1 samples
Skipped 1 previously prepared trnL-UAA samples
Spent 0.0s running flash and making NR, 0.0s on cutadapt, and 0.0s applying abundance thresholds

Processesing 16S

WARNING: Loaded zero sequences within length range
Saved 0 unique sequences
Running onebp classifier on /tmp/thapbi_pict/multi_marker/summary/16S.all_reads.fasta
tests/test_multi_marker.sh: line 125: 74981 Segmentation fault: 11  thapbi_pict pipeline -i tests/multi_marker/raw_data/ -s $TMP/intermediate -o $TMP/summary/ -d $DB -a 10 --synthetic ''

This is likely to be from the performance work; quoting the change log:

add SIMD implementation for fuzz.ratio/fuzz.QRatio/Levenshtein/Indel/LCSseq/OSA to improve performance for short strings in cdist

Might be specific to the wheel on PyPI?

peterjc commented 2 years ago

Confirmed with the conda-forge packages: rapidfuzz 2.11.0 gives a segmentation fault, while 2.10.0 works.

peterjc commented 2 years ago

It looks like an empty query set triggers this, so it should be easy to work around: https://github.com/maxbachmann/RapidFuzz/issues/277
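This is a minimal sketch of that kind of workaround. It assumes the crash is in rapidfuzz's `process.cdist`, the function named in the change log's SIMD entry, and that the caller wants query sequences within one edit of a reference, as the log line "Running onebp classifier" suggests. The function name `onebp_matches` is hypothetical and this is not the actual thapbi-pict code. It simply returns early, without calling rapidfuzz at all, whenever there is nothing to compare. In the log above, that situation arises after "Saved 0 unique sequences".

```python
def onebp_matches(queries, references):
    """Map each query to the references within one edit (hypothetical helper)."""
    queries = list(queries)
    references = list(references)
    # Workaround: an empty query set could segfault in rapidfuzz 2.11.0,
    # so never hand rapidfuzz an empty input; there is nothing to match anyway.
    if not queries or not references:
        return {q: [] for q in queries}
    # Imported lazily so the empty case does not touch rapidfuzz at all.
    from rapidfuzz import process
    from rapidfuzz.distance import Levenshtein

    # Matrix of edit distances, one row per query. With score_cutoff=1,
    # any distance above 1 is reported as 2 (score_cutoff + 1).
    dist = process.cdist(
        queries, references, scorer=Levenshtein.distance, score_cutoff=1
    )
    return {
        q: [ref for ref, d in zip(references, row) if d <= 1]
        for q, row in zip(queries, dist)
    }
```

Guarding both sides is deliberately conservative. The linked issue is about an empty query set, but an empty reference list has nothing to match either, so skipping the call costs nothing.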