swarris / pyPaSWAS

Program for DNA/RNA/protein sequence alignment, read mapping and trimming. Extended Python version of PaSWAS, supporting OpenCL and CUDA devices.
MIT License

Line by Line processing? #7

Open: sebastian-nehrdich opened this issue 4 years ago

sebastian-nehrdich commented 4 years ago

Hi there,

Great work! I just wonder: what would need to change in the code to make it process the lines in file1 and file2 one-to-one instead of all-against-all? That is, line 1 is compared with line 1 in the other file, line 2 with line 2 in the other file, and so on. That would be really helpful!
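To make the difference concrete, here is a minimal Python sketch of the two pairing strategies, assuming (hypothetically) plain-text inputs with one sequence per line. It does not use pyPaSWAS code; `read_sequences`, `one_to_one` and `all_vs_all` are illustrative names only. One-to-one pairing yields N pairs for N lines, while all-vs-all yields N × M.

```python
from itertools import product
from typing import Iterable, List, Tuple


def read_sequences(path: str) -> List[str]:
    """Hypothetical input format: one sequence per non-empty line."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def one_to_one(seqs1: Iterable[str], seqs2: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair line i of the first input with line i of the second (the requested mode)."""
    seqs1, seqs2 = list(seqs1), list(seqs2)
    if len(seqs1) != len(seqs2):
        raise ValueError("both inputs must contain the same number of sequences")
    return list(zip(seqs1, seqs2))


def all_vs_all(seqs1: Iterable[str], seqs2: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair every sequence of the first input with every sequence of the second."""
    return list(product(seqs1, seqs2))


if __name__ == "__main__":
    file1 = ["ACGT", "GGCA", "TTAG"]
    file2 = ["ACGA", "GGCC", "TTTG"]
    print(len(one_to_one(file1, file2)))  # 3 pairs
    print(len(all_vs_all(file1, file2)))  # 9 pairs
```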

swarris commented 4 years ago

Hi,

Thanks for the compliment! The tool Pacasus already contains about 95% of the code needed for this. In Pacasus, file1 and file2 are forced to be the same file so that each read is aligned only to itself. This does come with a performance penalty: in that implementation only one read is processed at a time. Are you processing short (<5 kb) reads? In that case it is more efficient to change the CUDA/OpenCL code. In a naive implementation the memory usage will be the same as for all-vs-all, but it should be relatively easy to implement.
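As a rough illustration of the "one pair at a time" approach described above, here is a CPU-only Python sketch that computes a Smith-Waterman local alignment score for each line pair in turn. It stands in for the CUDA/OpenCL kernel and is not pyPaSWAS or Pacasus code; the function names and the scoring scheme (match +2, mismatch -1, linear gap -2) are assumptions chosen for illustration, not the tool's defaults.

```python
from typing import List

# Assumed scoring scheme (illustrative only, not pyPaSWAS defaults).
MATCH = 2
MISMATCH = -1
GAP = -2


def smith_waterman_score(a: str, b: str) -> int:
    """Best local alignment score of a vs b with a linear gap penalty.

    Keeps only two rows of the DP matrix, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            up = prev[j] + GAP       # gap in b
            left = curr[j - 1] + GAP  # gap in a
            curr[j] = max(0, diag, up, left)  # local alignment: never below 0
            best = max(best, curr[j])
        prev = curr
    return best


def align_line_by_line(seqs1: List[str], seqs2: List[str]) -> List[int]:
    """Score only pair (i, i); hypothetical helper, one pair processed at a time."""
    if len(seqs1) != len(seqs2):
        raise ValueError("both inputs must contain the same number of sequences")
    return [smith_waterman_score(a, b) for a, b in zip(seqs1, seqs2)]


if __name__ == "__main__":
    print(align_line_by_line(["ACGT", "AAAA"], ["ACGT", "CCCC"]))  # [8, 0]
```

Looping over pairs like this mirrors the sequential, one-read-at-a-time processing mentioned in the reply. Packing many short pairs into a single device launch is presumably where the suggested CUDA/OpenCL change would gain over this.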