ksahlin / strobealign

Aligns short reads using dynamic seed size with strobemers
MIT License
128 stars 16 forks

Explicit error if too many sequences are used #393

Closed luispedro closed 4 months ago

luispedro commented 4 months ago

Right now, strobealign only supports up to 2²⁴ reference sequences. If the user supplies more, it silently accepts them and then crashes later.

This was triggered when trying to map to the Greengenes database:

https://ftp.microbio.me/greengenes_release/2022.10/
https://ftp.microbio.me/greengenes_release/2022.10/2022.10.seqs.fna.gz

Even a single read like the one below triggers a crash:

@M05314:127:000000000-BWLLJ:1:1101:15267:1654 2:N:0:1
CCTGTTCGCTCCCCACGCTTTCGTCCCTCAGCGTCAATATTGTGCCAGAATGCTGCCTTCGCCATTGGTGTTCCTCCTGATATCTACGCATGTCACCGCTACACCAGGAATTCCACATTCCTCTCACATATTCTATTTTATCAGTTTTGAT
+
AAA1AF@1>AAAGG1A0EAFGGEHAAEGFCG1AAEE/F2FG2F2FF1CA0FBDED1BGFGFFE?AF1BFFCFHDGFFHB1FFGFGEEFE/?/BF2F@/EGEEB00/0//0BFG1>B1BGFEFHHGGFFD12BGH2FDFFFGG22GDD>@/F
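
The kind of early check described above can be sketched roughly as follows. This is a minimal Python illustration, not strobealign's actual C++ code: the names `MAX_REFERENCES`, `TooManyReferencesError`, `count_fasta_records`, and `check_reference_count` are hypothetical, and the limit of 2²⁴ (treated here as inclusive) is taken from the description above.

```python
import io

# Limit stated in this pull request; the constant name is an assumption.
MAX_REFERENCES = 2**24


class TooManyReferencesError(ValueError):
    """Raised when the reference has more sequences than can be indexed."""


def count_fasta_records(handle):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


def check_reference_count(n_references, limit=MAX_REFERENCES):
    """Fail early with a clear message instead of crashing during mapping."""
    if n_references > limit:
        raise TooManyReferencesError(
            f"Too many reference sequences: got {n_references}, "
            f"but at most {limit} are supported"
        )
    return n_references


# Usage: a tiny in-memory FASTA with two records passes the check.
fasta = io.StringIO(">ref1\nACGT\n>ref2\nGGCC\n")
check_reference_count(count_fasta_records(fasta))
```

Running the check right after the reference is read, before any indexing or mapping starts, means the user sees the cause of the failure immediately rather than an unrelated crash once reads are processed.
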

ksahlin commented 4 months ago

Thanks Luis! Approved on my end. For Marcel to merge.

marcelm commented 4 months ago

Thanks, good point!