Closed by frederic-mahe 8 years ago
Changing the test file to:
>a1_1
ACGT
>a2_1
ACGT
>b1_1
TGCA
>c1_1
AAAA
>c2_1
AAAA
yields 6 OTUs instead of 3 when -t is greater than 5 (on my 2-core laptop and on a 16-core node). There is no problem when using d > 1.
I am looking into the problem now. On my Mac I get exactly the same results as you do. Not good.
I have found the source of the bug and a solution. It affects only cases where the input sequences have not been properly dereplicated, i.e. when the input contains two or more identical copies of some sequences. It also appears only when the sequences (or any of their microvariants) are shorter than the number of threads minus 1. Both examples meet these conditions. I'll fix it and release a new version soon.
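Until the fix is out, dereplicating the input before clustering should avoid the trigger. Here is a minimal Python sketch of what that means, not the way swarm handles it internally. It assumes one sequence per line and headers ending in `_<abundance>`, as in the test files here. The function name `dereplicate` and the choice to keep the first label seen for each sequence are my own, purely for illustration.

```python
from collections import OrderedDict


def dereplicate(fasta_text):
    """Merge identical sequences and sum their abundances.

    Assumptions (illustrative only): each sequence sits on a single line,
    and each header ends with "_<abundance>", e.g. ">a1_1".
    """
    totals = OrderedDict()  # sequence -> [first label seen, summed abundance]
    label = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            label = line[1:]
            continue
        # Split "a1_1" into the name "a1" and the abundance "1".
        name, _, abundance = label.rpartition("_")
        seq = line.upper()
        if seq in totals:
            totals[seq][1] += int(abundance)
        else:
            totals[seq] = [name, int(abundance)]
    return [("%s_%d" % (name, count), seq)
            for seq, (name, count) in totals.items()]


EXAMPLE = ">a1_1\nACGT\n>a2_1\nACGT\n>b1_1\nTGCA\n>c1_1\nAAAA\n>c2_1\nAAAA\n"

for header, seq in dereplicate(EXAMPLE):
    print(">%s\n%s" % (header, seq))
```

On the five-record test file this leaves three unique sequences, with the duplicates merged into a single record of abundance 2.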
Pfffew! So that's a very input-specific bug. What a relief!
The bug-fix release description should be something like this: "version 2.1.8 fixes a rare bug triggered when clustering extremely short undereplicated sequences."
Fixed in 2.1.8.
Richard Christen reported a segmentation fault on a big dataset. While investigating it, I found the following bug:
Results should be 2 OTUs for any -t value, but for -t 6 and greater, swarm returns 3 OTUs.