ksahlin / strobealign

Aligns short reads using dynamic seed size with strobemers
MIT License

Entries of the unaligned #424

Closed (by u-xixi, 1 month ago)

u-xixi commented 1 month ago

Hi Strobealign devs,

I just used Strobealign 0.13.0 and there seem to be a few problems with the unaligned reads. This was my command:

strobealign -t 10 -U -S 0.9 $assembly $r1 $r2 -o output.sam

Here are the problems:

LH00188:75:22GJ27LT3:1:1103:48673:18881 129     MLD1_08_2011_000009426167       0       60      1M150S  MLD1_08_2011_000002291692       1       0       CAACAACACCAACCACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACACCAACAACAACAACACCAACAACAACAACCACACCAACAACCACCAC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII-IIIIIIIII9III9I9IIIIIIIIII99II-IIIII-9I9IIII9II9IIIIII-IIIII-IIIIIII NM:i:1  AS:i:10
LH00188:75:22GJ27LT3:1:1103:48201:25210 69      MLD1_08_2011_000009696145       0       0       *       =       0       0       GAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII99IIIIIIIIIIII9II9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
LH00188:75:22GJ27LT3:1:1103:48201:25210 137     MLD1_08_2011_000009696145       0       60      1M150S  =       0       0       CTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTTTCTCTCTCTCTCTCCCTCTCTCTCTCTCCCTCCCTCTCTCTCCCTCTCTCTCTCTCTCTCCCTCCCTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9III9II9II-9II9II9I99I9-II99II-II99-I---9--9999-9II9-I-I---I-9-------9-999-999----9----9-9-- NM:i:1  AS:i:10
LH00188:75:22GJ27LT3:1:1104:37911:24809 177     MLD1_08_2011_000009402528       0       9       1M150S  MLD1_08_2011_000000197329       1207    0       CCTCCCCCCCCCCCTCTCCTCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 9-99-----9----9------9I--999IIIIII-9--9II9999999I99-I-I-9I9IIIIII9IIIIII9IIIII9III9I9IIIIIIII9IIIIII99II99III-9I9II9I9IIII9I9IIIIIIIIIIIIII99III99-9III NM:i:1  AS:i:10
LH00188:75:22GJ27LT3:1:1105:17144:2762  81      MLD1_08_2011_000009520166       0       1       1M150S  MLD1_08_2011_000009516899       1       0       TCTCTCTCTCTCTCTCTCTCTCTCTCTCCCCTTCTCTCCCTCTCTCCCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCCCTCCCTCCCTCCCCCCCTCCCCCCTCTCTCCCTCTCTCCCTCTCTCCCTCTCTCCCTCTCTCCCTC IIII99IIII-IIIIIIIIIII9IIIII9I9I99IIIIIIIII9IIIIIIIIIIIII-IIIIIIIIIIII-IIIIIIIIIIII9III9IIIIIIIIIIII9II9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IIIIIIIIII9 NM:i:1  AS:i:10

Is it a problem with the program or my usage?

Best, Xixi

marcelm commented 1 month ago

Is it possible for you to share the reference FASTA and some of the input reads? You can also send this to my e-mail address (shown in my profile).

marcelm commented 1 month ago

Thanks for sending the data. You’re reporting two issues:

> the -U flag doesn't seem to work. There are plenty of entries if I run samtools view -f 4 output.sam

This is not a bug, but we need to document it better. The -U flag is implemented so that both records of a paired-end read are output even if only one of the two reads is mapped. So with samtools view -f 4, you’ll still find the unmapped reads whose mate could be mapped. To check whether the option works, you would need to use samtools view -f 12 instead: flag 4 is "read unmapped" and flag 8 is "mate unmapped", so -f 12 selects only records of pairs in which neither read is mapped.
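To make the difference between -f 4 and -f 12 concrete, here is a minimal Python sketch that mimics how samtools view -f selects records (all requested bits must be set). It is applied to the five FLAG values from the records in the report above; the helper name has_all_bits is made up for this illustration.

```python
# SAM FLAG bits as defined in the SAM specification
READ_UNMAPPED = 0x4  # this read is unmapped
MATE_UNMAPPED = 0x8  # the other read of the pair is unmapped


def has_all_bits(flag: int, required: int) -> bool:
    """Mimic `samtools view -f required`: keep a record only if
    every bit in `required` is set in its FLAG (hypothetical helper)."""
    return (flag & required) == required


# FLAG values of the five records shown in the report
flags = [129, 69, 137, 177, 81]

# -f 4: the read itself is unmapped, the mate may or may not be mapped
unmapped_read = [f for f in flags if has_all_bits(f, READ_UNMAPPED)]

# -f 12: the read AND its mate are unmapped; this is what -U should suppress
both_unmapped = [f for f in flags if has_all_bits(f, READ_UNMAPPED | MATE_UNMAPPED)]

print(unmapped_read)
print(both_unmapped)
```

Among these five records, only the one with FLAG 69 has the "read unmapped" bit set, and none has both bits set, which is consistent with -U working as described.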

> some alignments (mostly pretty bad ones) got zero coordinates, [...]

That’s a bug. I found out that this happens when the alignment function we’re using doesn’t return a valid alignment, for example when the query is something like TCTCTCTCTCTCTCTCT and the reference is GAGAGAGAGAGAGAGAGAGAGA (as in your example). Because of the way seeding works in strobealign, this normally doesn’t happen, but it seems that it can happen with your reference.
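Until this is fixed, affected records can be spotted in the output with a small script. The following is only a sketch: it assumes that an affected record is one that is not flagged as unmapped (bit 0x4 unset) but has POS 0, as in the report. The function name and the three shortened records are made up for illustration and are not real strobealign output.

```python
def has_zero_pos_but_mapped(sam_line: str) -> bool:
    """Return True for a SAM record that claims to be mapped
    (FLAG bit 0x4 unset) but has POS == 0 (hypothetical check)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: FLAG
    pos = int(fields[3])   # column 4: 1-based leftmost position, 0 if unavailable
    return not (flag & 0x4) and pos == 0


# Shortened, made-up records modeled on the ones in the report
records = [
    "readA\t129\tcontig1\t0\t60\t1M3S\tcontig2\t1\t0\tCAAC\tIIII",  # mapped, POS 0
    "readB\t69\tcontig1\t0\t0\t*\t=\t0\t0\tGAGA\tIIII",             # unmapped, POS 0 is fine
    "readC\t99\tcontig1\t15\t60\t4M\t=\t40\t29\tACGT\tIIII",        # regular alignment
]

suspicious = [
    line.split("\t")[0]
    for line in records
    if not line.startswith("@") and has_zero_pos_but_mapped(line)
]
print(suspicious)
```

For a real file, the same check could be run over the lines of output.sam, skipping header lines that start with @.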

I’ll try to fix this soon.