Open rlorigro opened 1 year ago
Just reposting the details here for better visibility. This seems like a pretty serious bug to me.
Here is the GFA:
S 1 tcatggtacttgtcaccttctgtcctatttgagtcatgatgcatttgtctatcatctatctccctgcaatgaacatccctggggtggggccagggtctgttttgcttggtgctgcattcccattgccaagaagacctcacaggcatatcaagcacacagcaggtgcataacacttgctgagtttgctgaatgaatCCCTCCTCTGTATCCTACACATCCAACTGGTCACCACGTCCTGTGCGGTAGATGTCCTTTTTAGCAGGTCCCCTGCACTTCACTGTGTTTGTTAAGCCTTGGCTTGCTCTGtcatttcattttgcttaggTGGCGGCAGCAGCCCCTGACAGCCCCTGGCCTCACAGCCCCTGGCCTCAACCCATATCCTCTCCACTCTGCACACAGCTGCTGGGAGACTTTTCCTTAAAACTtggtttctctcctttcccttctgaAAACTGTCAGTGGCTTCCTGTGGCTCAGAGGAATCAATCGAAACATGGTGCTGCTCACAAGGAAACACAACGTTTCTAGGGCCAGTGCTGTCTTTGTCCTGCCTTCCTCCAAATTCAGCCACGTTAAGCCTCAAGCAGATCCTCCCACACAGCTGGGCTTCTTCTGCCTTTCTGCCAGGACTTGCTGCTCCCAGGAGCCTCCTCCCCCAGTCCCTGCAGCCCCAGGTCAGGTGTCTCCCCACCCAGCCTGGCTCAGCTGTCCTTTCCCGTGCAACTACAGGTTCATACAATTTGCTATTGCTCCATCTACTCTATGTGACTCTTGGTTTCTTGAAGGCAGGTATGggactctttctttccttctttgctttcttttctctttctctttctttctttttctttctttctttctttctttc
S 2 tttctttctttctttctttctttctttctttctttctt
S 3 tctctctctctctctctcttttcttttctttctttctctctttctgtttttagagactgggtcttgctgtcacccaggttggagtgcagtggtgcaatcatggctcaccgaactcctgggctcaagtgagcctcttgcctcagcctcccaattagttgggactacaggcatgtgccactacacctggctaattgttattattattattattattattattatttttgtaaagacagggtcttgctctgtttcctaggctggtcttgaacccctggcctcaaatgatcctcctgcctcagcctcccaaagtgctgggattccaggagtaagataccatgtctggccTGAGAAATTTTCTAAAAGGCATGTTTTTGCACTGtccaatatgtttttctttttatcccagcctctagcacaagtgcctgactcgaacgaagcagggactctgaaatatcttgaatgaacaagtAAATGGCACTTGAATAGGTGGCCTCTAATGtgcagaggaaggaaaagggagctgacgctttccgagtgttcagtaccttctaggcaccatgctgggccttttatgtagttctcgatgaatcctcatggcatcccGCTTTGCAGAATGATTCTAGTTACCCAGAGAGCAAGTAGCCCTTAGGGTCAGGACATGCATCTGAGTTGGTCTAGTCCCAAAGCTCCTGGTCCTCTCCTGACATCActtaaacagagaaacagaactccCCTTGGGCTTCTAGGGGCGCTGGGTTCAGGAGGCACAGCCACTCCCTTTGTTCTTCCTGGCAGCTGCCCCACCAGCAGTGAGCCCATCCCacctctgggttttcttttt
L 1 + 2 + 0M
L 1 + 3 + 0M
L 2 + 3 + 0M
When I concatenate 1,2,3 or 1,3 and align them to the GFA, I never get an end-to-end alignment, and node 2 is always missing:
1 871 1 871 + >1 871 1 871 870 870 60 tp:A:P NM:i:0 cm:i:51 s1:i:714 s2:i:0 dv:f:0.0756 cg:Z:870=
3 841 5 836 + >3 841 5 836 831 831 60 tp:A:P NM:i:0 cm:i:48 s1:i:672 s2:i:0 dv:f:0.0790 cg:Z:831=
1_2_3 1750 1 857 + >1 871 1 857 856 856 60 tp:A:P NM:i:0 cm:i:50 s1:i:714 s2:i:0 dv:f:0.0751 cg:Z:856=
1_2_3 1750 930 1745 + >3 841 21 836 815 815 60 tp:A:P NM:i:0 cm:i:47 s1:i:671 s2:i:0 dv:f:0.0795 cg:Z:815=
1_3 1712 1 857 + >1 871 1 857 856 856 60 tp:A:P NM:i:0 cm:i:50 s1:i:700 s2:i:0 dv:f:0.0751 cg:Z:856=
1_3 1712 892 1707 + >3 841 21 836 815 815 60 tp:A:P NM:i:0 cm:i:47 s1:i:672 s2:i:0 dv:f:0.0795 cg:Z:815=
Here is a visualization showing how neither the graph nor the linear query are fully covered:
I have tried multiple different combinations of parameters to attempt to find seeds/minimizers in node 2:
Defaults:
PanAligner -c -x lr -o /tmp/tmpn2oq6iyx/reads_vs_graph.gaf bad_tandem_graph.gfa bad_tandem_nodes.fasta
/tmp/tmpn2oq6iyx/reads_vs_graph.gaf
Permissive:
PanAligner -c -k 14 -n 1,1 -m 1,1 -x lr -o /tmp/tmpyid87tic/reads_vs_graph.gaf bad_tandem_graph.gfa bad_tandem_nodes.fasta
/tmp/tmpyid87tic/reads_vs_graph.gaf
I've also tried increasing the -g
and -r
parameters to allow the extension step to find more alignments, but that does not seem to have any effect. Alternatively, extreme 0/1 values of -f
also have no effect.
Could you help me with this case? I am having difficulty understanding why the base level alignment step doesn't bridge the chains here. Ideally, I would use base level alignment for a majority of the work, and only rely on minimizers for anchoring the flanks of the tandem repeat.
Thanks
Hi,
I am cross-posting this issue which I have submitted to the PanAligner repository, since it has been found by the authors to be a result of the base-level alignment step taken from minigraph/gwfa.
In summary, if I take a simple 3 node graph in which the middle node could be viewed as an insertion, and I concatenate the three nodes as a query, minigraph fails to find an end-to-end alignment. The shortest node is 38bp.
Thanks
PS: to clarify I've run the exact same test with minigraph and found the same results