yangao07 / abPOA

abPOA: an SIMD-based C library for fast partial order alignment using adaptive band
MIT License

[simd_abpoa_align_sequence_to_subgraph1] Error in cg_backtrack. (4) #9

Open ekg opened 3 years ago

ekg commented 3 years ago

I finally found a small reproducible example of an alignment problem.

To reproduce on this input FASTA, fail_smoothxg_block_3055.fa.txt:

abpoa -s -r 3 fail_smoothxg_block_3055.fa
[simd_abpoa_align_sequence_to_subgraph1] Error in cg_backtrack. (4)

yangao07 commented 3 years ago

Thank you @ekg for providing this example. This is a bug related to the banding, so disabling banded DP (setting b to -1) is the easiest way to get rid of it. I am also trying to figure out how to fix it.

The banded DP is more fragile when the sequence lengths differ too much, as in this data: 264 vs 443. This happened previously, and I thought I had fixed it.

ekg commented 3 years ago

Any workaround (even dropping into non-banded mode when this happens) would be helpful! What do you suggest?

Running everything non-banded to avoid this issue would be expensive.
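
The fallback asked about here could be sketched as a wrapper around the command line tool: run the banded alignment first and rerun without banding only when the backtrack error shows up. This is a rough sketch, not part of abPOA. It assumes the `abpoa` binary is on PATH, that "set b as -1" corresponds to passing `-b -1`, and that the error is reported through a non-zero exit code or on stderr. The function name `align_with_fallback` and the `run` parameter are hypothetical.

```python
import subprocess

# Message fragment taken from the error reported in this thread.
BACKTRACK_ERROR = "Error in cg_backtrack"


def align_with_fallback(fasta_path, base_args=("-s", "-r", "3"), run=subprocess.run):
    """Try banded alignment first; rerun with banding disabled if backtracking fails.

    Returns (stdout, used_fallback). `run` is injectable so the logic can be
    exercised without the abpoa binary.
    """
    cmd = ["abpoa", *base_args, fasta_path]
    proc = run(cmd, capture_output=True, text=True)
    failed = proc.returncode != 0 or BACKTRACK_ERROR in proc.stderr
    if not failed:
        return proc.stdout, False

    # Assumption: "-b -1" disables banded DP, following "set b as -1" above.
    retry_cmd = cmd[:1] + ["-b", "-1"] + cmd[1:]
    proc = run(retry_cmd, capture_output=True, text=True)
    if proc.returncode != 0 or BACKTRACK_ERROR in proc.stderr:
        raise RuntimeError(f"abpoa failed even without banding: {proc.stderr.strip()}")
    return proc.stdout, True
```

With this shape, only the blocks that actually hit the error pay the cost of non-banded DP, which addresses the concern that running everything non-banded would be expensive.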

rvolden commented 3 years ago

I'm also running into this issue when using the python API. Instead of being able to handle the error, the thread that gets the error just hangs since this error kicks you out of python. Is there a way for me to be able to handle this error through the python API? Disabling adaptive banding takes too long.
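
One way to keep a hard exit inside the C library from taking the calling thread down is to run each alignment in a separate interpreter and check its exit status. The sketch below is only an illustration of that idea. It assumes pyabpoa is installed, that `msa_aligner(...).msa(...)` behaves as in the snippet posted in this thread, and that the result exposes an `msa_seq` attribute. `run_isolated` and `CHILD_CODE` are hypothetical names.

```python
import json
import subprocess
import sys

# Child script run in a fresh interpreter. Assumption: pyabpoa exposes
# msa_aligner(...).msa(...) and the result has an msa_seq attribute.
CHILD_CODE = """
import json, sys
import pyabpoa as poa
req = json.load(sys.stdin)
aligner = poa.msa_aligner(match=5, extra_b=req["extra_b"])
res = aligner.msa(req["seqs"], out_cons=False, out_msa=True)
json.dump(res.msa_seq, sys.stdout)
"""


def run_isolated(payload, child_code=CHILD_CODE, timeout=60):
    """Run child_code in a new Python process.

    The payload is sent as JSON on stdin. Returns the parsed JSON written to
    stdout, or None if the child exits abnormally, times out, or prints
    something that is not valid JSON.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", child_code],
            input=json.dumps(payload),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    try:
        return json.loads(proc.stdout)
    except ValueError:
        return None
```

A caller could then treat `None` as "alignment failed" and retry with a larger `extra_b` or skip that read set, instead of hanging. Starting a process per alignment adds overhead, so this is a stopgap rather than a fix.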

ekg commented 3 years ago

There is now a flag on the result object that indicates if the traceback was OK. It's not propagated to python.


yangao07 commented 3 years ago

@rvolden As mentioned by Erik, this new flag has not been added to pyabpoa yet; I will get it done soon. However, I haven't implemented the ambiguous strand mode in python yet, so I guess what you ran into is different from what Erik posted here. Can you share the sequences that cause the error? That would be very helpful. Thanks.

yangao07 commented 3 years ago

Anyway, the ultimate goal is to fix this bug rather than just break out of the loop without providing any alignment result. I am working on that.

rvolden commented 3 years ago

You're right, the traceback error is 2, not 4. It's for a pairwise alignment where one sequence has a long polyA but the other doesn't. I'm including the initialization as well as the sequences here:

poa_aligner = poa.msa_aligner(match=5, extra_b=16) # anything lower for extra_b throws the traceback error
res = poa_aligner.msa(subreads, out_cons=False, out_msa=True)
# errors out here
>0
CTGACATTTCGGTGGAGAATTTTTTTATATTTGTATTCTCAGCGTAAAGTCTCCCCTGGATATATTTGTGTTTATGCTGATATTGGCATCCATGTTTGACGGAGGATTATCAGGTAGGTAAATTACTTCATTTGGAGATGAGGTGGTTGTACATTAACTTCCCTCCTCC
TATATTGACTAGCCTTCAACTGGTTCTAAGCAGTGGTATCAACGCAGAGTACATGGGGATTCCTGAAGCTGACAGCATTCGGGCCGAATGTCTCGCTCCGTGGCCTAGCTGTGCTCGCGCTTCTCTCTCTTTCTGGCCTGGAGGCTATCAGCGTACTCCAAAGATTCAGGT
TTACTCACGTCATCACAGAGAATGGAAAGTCAAATTTCCTGAATTGCTATGTGTCAGGTTTTCATCCATCCGACATTGAAGTTGACTTACTGAAGAATGGAGAAGAATTGAAAAGTGGAGCATTCAGACTTGTCTTTCAGCAAGGACTGGTCTTTCTATCTCTTGTACTAC
ACTGAATTCACCCCCACTGAAAAGATAGGTATACTGCCATGTAGAACCATGTGACTTTGTCACAGCCCAAGATAGTTAAGTGGGATCCGAGACATGTAAGCAGCATCATGGAGGTTTGAAGATGCCGCATTTGGATTGGATGAATTCAAATTCTGCTTGCTTGCTTTTTAA
TATTGATATGCTTATACACTTACACTTTATGCACAAAATGTAGGGTTATAATAATGTTAACATGGACATGATCTTCTTTATAATTCTACTTTGAGTGCTGTCTCCATGTTTGATGTATCTGAGCAGGTTGCTCCACAGGTAGCTCTAGGAGGGCTGGCAACAGAGGTGGGA
GCAGAGATTCTCTTATCCAACATCAACATCTTGGTCAGATTTGAACTCTTCAATCTCTTGCACTCAAAGCTTGTTAAGATAGTTAAGCGTGCATAAGTTAACTTCCAATTTACATACTCTGCTTAGAATTTGGGGGAAAATTTAAATATAGTTGAACCCAGGATTATTGGA
AATTTGTTATAATGAATGAAACATTTTGTCATATAAGATTCATATTTACTTCTTATACATTTGATAAAGTAAGGCATGGTTGTGGTTAATCTGGTTTTATTTTTGTTCCACAAGTTAAATAAATCATAAAACTTGAAAAAAAAAAAAAAAAAAAAAAAAAAAATAAAAAAA
AAAAAAAAAAAAAGTATTCCATAAGACTCTGCGTTGATACCACTGCTT
>1
CTGACATTTCGGTGGAGAATCTTATTATATCGTGCTTCTCAACTGTAAAGTCTCCCTGGATATATTTGTGTTTATGCTGATATTGGCATCCATGTTTGACAGAGGATTATCAGGTAGGT
AAATTACTTCATTTGGAGATGAGGTGGTTGTACATTAACTTCCCTAAATATATATCTTCAAGCCTTCAACTGAAAGTTCTAAGCAGTGGTATCAACGCGAGTCTTTTTATGGAATACTTATTGAACAGGTAATTCACTGTAATATTTATTAAGTGATGACTAGAGGGATAT
TGATAGATGTAAAAATTTTCACTCACAGTGAACATGAAACCTTTACACATGTAAGGTTTAGATTCTTTTTTTTTAATCTGCCCCTTTCAGATTATATCATGGTATATGAAGCACTGGTGAGGTCTATGTCACCAGAAATTCCCCAGTTTGCTGATTTGTTAGGTTTTTTAA
CCCGATGATTGTACTGCAACAAGTGAGCATCATTCACTGCAACCTTGAAGTGGTCAGGTTCAACCAGTACTTGTATTTTGAATGGTTTCCCACTTTCAAATGGGAAAACCGACTGTCTTTCTTCCCTTCCCCAGTTATTATCCAGCTTTGTATTGCCAAACAATGACTCTC
CTGTTGTTCTCATTGAAGCGTGGGTTAAAGTGGAAGGCAACATCATTCCCTCTTTGGAAATCTAAAGCAATTCTGTTTGCATTGGGCTTCACCGTGCCCAGAATTGTTATCAGCATGCGAGGCACCACTCCCCGGTAAAGAGAGCAGGTTATAAGGCACAATCAGTGGCCC
AGCAGGGGCGCCATAGGGGCCAGTGGCGGGAGTAGGCTCCGGTGGCACTTGGCTGTCCAGAAGATGGGTAGGCCCCAGGGCCGCTGGGTGGCCCTGGTGGGCTCCAGGTGCAGGTGCCGGGATAAGCTCCAGGTGCTCCAGGGTAGGCGCCTGGAGGTGCCTGGTCAGGAT
AGCCCCCTGGGGTGCCTGCCCGGGGTAGGCCCAGGATGGGGCCCTGGGTGGCCCCTGCCCCAGCAGGCTGGTTCCCCCATGCGCCAGGCTCGCCAGGGTTTGGGTTTCCAGACCCAGATAACGCATCATGGAGCGCTCGTTGGCTGGCTCCGGACGGCTGCTGGCGAGGAG
GTGCTGCGGGCCCCCCATGTACTCTGCGTTGATACCACTGCTTCT

yangao07 commented 3 years ago

I modified the traceback code in the latest commit. Hopefully, this resolves these bugs. I also removed the trackback_ok flag, since it is not needed if the traceback step can always finish.

@ekg @rvolden This works on the two sequence sets you provided here; please try it out on some other data.

Yan

ekg commented 3 years ago

Unfortunately, I still find cases that cause this error.

fail_smoothxg_block_9338.fa.txt

-> % abpoa -s -r 3 fail_smoothxg_block_9338.fa.txt
[simd_abpoa_align_sequence_to_subgraph1] Error in cg_backtrack. (4)

rvolden commented 3 years ago

I don't get the error for python2, but I get it in python3. The only modification I made to the makefile for python3 was to change the command to python3 instead of python:

install_py: python/cabpoa.pxd python/pyabpoa.pyx python/README.md
	${py_SIMD_FLAG} python3 setup.py install

sdist: install_py
	${py_SIMD_FLAG} python3 setup.py sdist #bdist_wheel

To clarify, this is python 3.6.9, and the error I get with the same sequences I provided is [simd_abpoa_align_sequence_to_subgraph1] Error in cg_backtrack. (2)

yangao07 commented 3 years ago

Unfortunately, I still find cases that cause this error.

@ekg This is a different case and different type of bug. Working on that. Before I fix it, you probably want to roll back to the version where you added the traceback flag.

yangao07 commented 3 years ago

I don't get the error for python2, but I get it in python3.

Nothing was changed related to the python side. Did you re-install pyabpoa in python3?

rvolden commented 3 years ago

Yeah, I reinstalled using pip3 for python3

yangao07 commented 3 years ago

Yeah, I reinstalled using pip3 for python3

These changes haven't been pushed to PyPI yet, so pip3 install will give you the old version. To install locally from source, try make install_py or python3 setup.py install.

rvolden commented 3 years ago

I should've been a bit clearer. I ran make install_py after modifying the makefile. When that didn't work, I tried reinstalling using pip, and I also tried python3 setup.py install, which also throws the traceback error.

yangao07 commented 3 years ago

This sounds really weird to me. It works on my PC when I install with python3. Maybe you can try removing everything and reinstalling it.
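
When several installs (pip, make install_py, setup.py) may be shadowing each other, it can help to check which copy the interpreter actually imports before reinstalling. The helper below is a generic sketch using only the standard library (Python 3.8+ for `importlib.metadata`). `describe_install` is a hypothetical name, and whether the reported version string distinguishes a source build from the PyPI release depends on how the package sets its version.

```python
import importlib.metadata
import importlib.util


def describe_install(name):
    """Report where an importable module lives and its installed version.

    Returns None if the module cannot be found at all; otherwise a dict with
    "location" (file path or None) and "version" (string or None when no
    distribution metadata is registered under that name).
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return None
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return {"location": spec.origin, "version": version}


if __name__ == "__main__":
    # Run with the same interpreter that shows the error, e.g. python3.
    print(describe_install("pyabpoa"))
```

If the location points at an old site-packages copy, removing that copy before reinstalling from source is the clean-up suggested above.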

rvolden commented 3 years ago

Removed everything previously installed. It's working now. Thank you!