kmer_finder test failing in architecture i386

linqigang888 commented 9 months ago

While preparing to bring cutadapt 4.4 with Python 3.11 into Debian, the tests for kmer_finder worked as expected in amd64, but for i386, test_kmer_finder_initialize_total_greater_than_max resulted in the following error:

    def test_kmer_finder_initialize_total_greater_than_max():
>       kmer_finder = KmerFinder([(0, None, ["A" * 31, "B" * 31, "C" * 31, "D" * 43])])

tests/test_kmer_finder.py:85: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   ValueError: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD of length 43 is longer than the maximum of 31.

src/cutadapt/_kmer_finder.pyx:141: ValueError
============================================================================================= short test summary info =============================================================================================
FAILED tests/test_kmer_finder.py::test_kmer_finder_initialize_total_greater_than_max - ValueError: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD of length 43 is longer than the maximum of 31.
=================================================================================== 1 failed, 621 passed, 1 deselected in 5.87s ===================================================================================

Here is the current patch prepared for Debian to accept differing test results:

--- a/tests/test_kmer_finder.py
+++ b/tests/test_kmer_finder.py
@@ -82,12 +82,15 @@

 def test_kmer_finder_initialize_total_greater_than_max():
-    kmer_finder = KmerFinder([(0, None, ["A" * 31, "B" * 31, "C" * 31, "D" * 43])])
-    assert kmer_finder.kmers_present("X" * 100 + "A" * 31)
-    assert kmer_finder.kmers_present("X" * 100 + "B" * 31)
-    assert kmer_finder.kmers_present("X" * 100 + "C" * 31)
-    assert kmer_finder.kmers_present("X" * 100 + "D" * 43)
-    assert not kmer_finder.kmers_present(string.ascii_letters)
+    try:
+        kmer_finder = KmerFinder([(0, None, ["A" * 31, "B" * 31, "C" * 31, "D" * 43])])
+        assert kmer_finder.kmers_present("X" * 100 + "A" * 31)
+        assert kmer_finder.kmers_present("X" * 100 + "B" * 31)
+        assert kmer_finder.kmers_present("X" * 100 + "C" * 31)
+        assert kmer_finder.kmers_present("X" * 100 + "D" * 43)
+        assert not kmer_finder.kmers_present(string.ascii_letters)
+    except ValueError as error:
+        assert ("D" * 43) in str(error)

 def test_kmer_finder_finds_all():

I am sure this test could be handled differently, so I have left it as an issue instead of a pr.

marcelm commented 9 months ago

Thanks for the report. It’s possible that the test failure actually indicates the code won’t work on 32 bit because I see a hardcoded DEF MAX_WORD_SIZE = 64 in the code. I’ll wait for @rhpvorderman to chime in. We could in the simplest case just disable the k-mer heuristic (with identical results, it’s just somewhat slower) if Cutadapt is running on a 32-bit machine.

rhpvorderman commented 9 months ago

Isn't it more convenient to not support i386? I mean, you have done your wonderful cutting in cutadapt... And then, what? You are going to try assembly or mapping on a platform that can only address 2GiB of memory?

Anyway, I think the most convenient fix is to change this line:

https://github.com/marcelm/cutadapt/blob/main/src/cutadapt/_kmer_finder.pyx#L49C1-L49C26

And not use size_t, but use uint64_t instead. Then it should work on 32-bit as well. (Allthough it will be much slower than using a size_t)

rhpvorderman commented 9 months ago

I made a small patch https://github.com/marcelm/cutadapt/pull/731. @linqigang888 does that fix your problem?

linqigang888 commented 9 months ago

Thank you. I have replaced the patch in Debian with #731.

rhpvorderman commented 9 months ago

Awesome! Always glad to help out Debian. My favourite operating system :heart:. Thank you for maintaining cutadapt in the Debian repos!

marcelm / cutadapt

kmer_finder test failing in architecture i386 #730