readwrite112 / AGAThA

PPoPP24 AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping
12 stars 1 forks source link

Incorrect alignment results #1

Open quim0 opened 3 months ago

quim0 commented 3 months ago

I'm trying to perform two alignments with AGAThA of sequences about 200bp each. Here are the input files I'm using:

query_test.fasta

>0
GGATATAGGGGTAGCACTTGATACGGTAGCGGGCTCATTGGAGCATCCGAATGTACGTAGCTCAGTATATGAGCGAGTATATAAGAAAGTCCCATCTGATCTTACGCTCGTGCTGAAGAGTCACGTGGCTCTTCTCATACGTATGCTGCGTTTGGCATCGTATGATTACGACTAGTATGGAGACCAAAAAGGTGGGCTGCTTAG
>1
TCGAACTCATGATTAATGAGTCGGATGATAAGGGTGATAAGGACGCGTAATTGGCTGTAGTAACGTACTTGTGACTGGGGTGCCCTCATCGTTGAATACTGGAGCGGAGTCAGTACAGCGGTTTGTTAGGACAGGAAGTAATGTCAAGCGACGTGCTAACTATCTCAAATACTTACGCCCACGGAATAGGCCG

target_test.fasta

>0
GATTAGGGGTGCCACTTGATAGGGTAGTGGGCTCATTGGGGCATCCGATGTACGTAGCTCAGTTATGAGCGAGTATATAAGAAAGTCCCATCTGATCTTACGCTCGTGCTGAATAGTCACGTGGCTCTTCTATAACGTTATGCTGCGCTTTGGCATCGTATGATTACGACTATATGGGACCTAAAAAGGTGGGTGCTTAA
>1
TGCGAACTCATGATTAAATGAGCGGATGATAAGGGTGATAAGGACGCGTAATTGGCTGTAGAACGTACTTGTGAAATGGGGTGCCCTCATCGTCTGAATAACTGGAGCGCACAGTCAGTACAGCGGTTTGTTATGGACAGGAAGTAATGTTAAGCGACGTACTAACTATTTCAAATACTTAGTCCCACGGCGAATAGGCG

I want the edit distance results, so I set the match and gap-open scores to 0, but I have observed the same outputs with different scores. I execute the test program from AGAThA by doing the following:

./test_prog/manual -a 0 -b 1 -q 0 -r 1 -t -p -y "global" ../query_test.fasta ../target_test.fasta

And I get the following output:

0       CIGAR=17I16X42M33I21M67I21M34I80X32I16X17I16X34I21M16X21M17I16X21M16I16X33I16X42M16X17I21M16X21M33I21M16X50I63M50I21M33I21M16X21M33I16X21M16X16I21M16I42M16I21M50I21M33I16X16I21M17I16X17I32X17I21M33I21M33I21M49I16X42M16I21M16X17I21M16I21M16X48I21M17I48X17I32X21M16X21M16X21M17I16X50I16X17I21M16X21M16X21M17I16X16I21M33I16X21M33I16X21M17I21M32X49I21M16X33I16X34I42M16X16I21M100I16X21M50I16X21M16X17I42M16I16X33I16X21M68I16X21M16X21M16X34I
0       CIGAR=83I16X21M32X50I16X81I16X42M16I16X21M48X16I21M16I21M16X21M16I32X21M33I21M33I16X50I32X16I21M17I21M32X21M17I32X34I16X16I16X34I16X42M17I63M67I16X16I16X21M17I16X16I21M17I16X67I16X34I21M16I16X21M32X17I42M33I21M16X16I21M65I21M68I21M16I16X17I21M17I42M16I16X21M33I32X21M17I16X21M17I21M50I42M32X21M66I16X34I32X21M16X17I21M51I32X21M16X17I21M16X50I21M17I16X17I21M32X42M16X17I21M16X16I21M16I32X33I21M

The score is not correct (it is always 0 even if I put more sequences in the fasta files), and the CIGAR is also not matching the input sequences, when I evaluate the scores produced by the CIGARs, they are 2442 and 2376.

Is there something I am missing? I would like to get the correct output.

Thank you

readwrite112 commented 3 months ago

While the current version does include the code for the schemes in the paper, it is only accessible by using the option "local" as shown in this file. If other options such as "global" are used, it will run the original code from GASAL2, which is not our contribution.

I am planning a large update for both the documentation and code within (at most) two weeks (I apologize for the confusion, since this update was long overdue...). I'll let you know in this thread once the update is complete.

Thank you.

quim0 commented 3 months ago

Thank you for the fast response. I'll use local for now then. In the code update, do you plan to support global pairwise alignment?

readwrite112 commented 3 months ago

The code called by "local" is actually running global alignment as described in our paper. Again, this is a byproduct from starting my kernel code in the local kernel file in GASAL2, which will be fixed in the upcoming update.

P.S. If you are also planning to look into the kernel code as well, I strongly recommend taking a look after the update. It definitely needs a clean up.

quim0 commented 3 months ago

I'm still running into issues when using the "local" option:

./test_prog/manual -a 0 -b 1 -q 0 -r 1 -t -p -y local ../query_test.fasta ../target_test.fasta

Output:

0       query_batch_start=0     target_batch_start=0    CIGAR=1M
0       query_batch_start=0     target_batch_start=0    CIGAR=1M

Let me know if you see anything incorrect that I might be doing. I'll try again with the next code update.

readwrite112 commented 2 months ago

Sorry for the late response. I was able to reproduce your issue using the same setting. However, during the code update, I've run into some other issues, and the overall update process is taking much longer than I expected. Sorry for the delay...

Regardless of this issue, it just occurred to me that if you're also interested in obtaining the CIGAR string, unlike GASAL2, our kernel currently lacks support for CIGAR string output. My team and I are working on integrating this feature into the kernel, though I can't provide a precise timeline for its implementation at this moment.