mgawan / GPU-BSW


Silently produces incorrect answers by truncating sequences #12

Closed r-barnes closed 3 years ago

r-barnes commented 4 years ago

Sequence length is limited by the hardware's maximum number of threads per block (1024 or 2048 for modern hardware). If the input contains sequences longer than this, they are silently truncated and incorrect results are returned.

armintoepfer commented 3 years ago

Would be great to have no upper limit on the input read length.

mgawan commented 3 years ago

The algorithm accepts two sets of sequences. It finds the longest sequence length in each set and then takes the shorter of those two lengths. This is the number of threads per block that will be launched; if it exceeds 1024, the kernels will crash with a CUDA-generated error. The algorithm was designed and optimized for short reads, typically fewer than 300 bases in length, but there is no upper limit on the length of the reference sequence. @r-barnes I could not reproduce this truncation bug; if you have an example data set that I could use to reproduce it, please share it. I believe the limit on threads per block is 1024 even on CC 8.0 and 8.6 devices.
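
The thread-count rule described in this comment can be sketched in host-side C++ as follows. This is only an illustration of the rule as stated in this comment, not code taken from GPU-BSW: the function names and the `kMaxThreadsPerBlock` constant are hypothetical, and the 1024 limit is the figure quoted in this thread rather than a value queried from the device.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Assumption: per-block thread limit as quoted in this thread, not queried
// from the device. The name is hypothetical.
constexpr std::size_t kMaxThreadsPerBlock = 1024;

// Length of the longest sequence in a set (0 for an empty set).
std::size_t longest_length(const std::vector<std::string>& seqs) {
    std::size_t longest = 0;
    for (const auto& s : seqs) {
        longest = std::max(longest, s.size());
    }
    return longest;
}

// Threads per block as described: take the longest length in each set,
// then the shorter of those two values.
std::size_t threads_per_block(const std::vector<std::string>& set_a,
                              const std::vector<std::string>& set_b) {
    return std::min(longest_length(set_a), longest_length(set_b));
}

// Hypothetical guard: reject the input with a clear message instead of
// launching a kernel configuration the hardware cannot run.
std::size_t checked_threads_per_block(const std::vector<std::string>& set_a,
                                      const std::vector<std::string>& set_b) {
    const std::size_t n = threads_per_block(set_a, set_b);
    if (n > kMaxThreadsPerBlock) {
        throw std::length_error(
            "shorter of the two longest sequences exceeds the threads-per-block limit");
    }
    return n;
}
```

Under this rule, a batch of short reads paired with a 5000-base reference stays within the limit, because only the shorter of the two maxima counts. A batch in which both sets contain a sequence longer than 1024 bases would be rejected up front by the guard.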

armintoepfer commented 3 years ago

Is there any chance to have a different kernel for >1024 input length?

mgawan commented 3 years ago

Right now that is not a priority for our project; for long-read technologies we plan on using the X-drop algorithm instead of SW. We do have a GPU-accelerated version of X-drop (developed by a former colleague of mine) that we plan on integrating into our workflow. It's available here: https://github.com/albertozeni/LOGAN