ksahlin / strobemers

A repository for generating strobemers and evaluation

fixes a bug which introduces erroneous entries in mers_vector when computing minstrobes #12

blinard-BIOINFO closed 5 months ago

blinard-BIOINFO commented 6 months ago

@ksahlin

Incorrect if Condition in Index Generation for minstrobes

Description:
An incorrect if condition leads to errors during the index generation for minstrobes, resulting in incorrect values. @marouaneboumlik98 (our student) found that changing the comparison is enough to fix the bug.

Impact:
This caused incorrect index generation, affecting the accuracy of mapping results.

To reproduce: use a version from before the fix, the attached files, and the following command:

Command Used: ./StrobeMap -k 10 -n 2 -v 11 -w 15 -c minstrobes -o mapped.tsv ref.fa read.fa

The erroneous entries appear on lines 1, 2, and 14 of the result file allMersVector.csv, where the positions of strobes 2 and 3 are 497928:

Hash value RefID Pos strobe 1 Pos strobe 2 Pos strobe 3
9863 0 54 497928 497928
49626 0 55 497928 497928
103236 0 44 55 55
105440 0 46 59 59
106712 0 42 54 54
115443 0 5 19 19
163407 0 31 44 44
169210 0 19 31 31
169631 0 3 17 17
175292 0 13 25 25
179254 0 39 53 53
179837 0 17 31 31
184950 0 25 39 39
202716 0 56 497928 497928
204796 0 53 65 65
205123 0 30 44 44
215534 0 2 13 13
228731 0 1 13 13
232695 0 28 39 39
237749 0 29 42 42
261466 0 16 30 30
272296 0 51 63 63
278894 0 6 19 19
284876 0 11 25 25
295497 0 45 59 59
312866 0 34 46 46
361799 0 52 63 63
363599 0 0 13 13
366845 0 12 25 25
369158 0 33 44 44
370175 0 37 51 51
371847 0 40 54 54
387185 0 22 34 34
399836 0 47 59 59
411673 0 7 19 19
415217 0 8 19 19
415943 0 43 54 54
424988 0 41 54 54
451385 0 20 31 31
453038 0 14 25 25
474401 0 4 17 17
494000 0 18 31 31
494722 0 27 39 39
511251 0 50 63 63
520043 0 9 22 22
520245 0 32 44 44
522991 0 23 37 37
528256 0 48 59 59
536163 0 35 46 46
541996 0 49 60 60
546037 0 15 29 29
549875 0 38 51 51
560253 0 26 39 39
578971 0 36 47 47
579274 0 24 37 37
589182 0 10 22 22
612599 0 21 34 34

ksahlin commented 5 months ago

Thank you for the fix! Classic off-by-one error.

i + w_max was allowed to go up to string_hashes.size(), so string_hashes[i + w_max] accessed an address outside the vector.
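
For illustration, here is a minimal sketch of the kind of bounds check involved. It is not the repository's actual code: the helper name last_window_index and the clamping fallback are assumptions made for this example, and only string_hashes, i, and w_max come from the comment above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the strobemers source): returns the last
// index that may safely be read for a window that would end at i + w_max.
// Precondition (assumed for this sketch): string_hashes is non-empty.
std::size_t last_window_index(const std::vector<std::uint64_t>& string_hashes,
                              std::size_t i, std::size_t w_max) {
    // Off-by-one form: `i + w_max <= string_hashes.size()` would let
    // i + w_max equal size(), and string_hashes[size()] is one element
    // past the end of the vector.
    if (i + w_max < string_hashes.size()) {
        return i + w_max;             // the whole window fits in the vector
    }
    return string_hashes.size() - 1;  // clamp to the last valid index
}
```

With a vector of 10 hashes and w_max = 5, the strict comparison returns index 9 for i = 5, whereas a `<=` comparison would have returned 10, which is out of range.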