maickrau / GraphAligner

MIT License
256 stars 30 forks source link

ambiguous alignments #23

Closed jts closed 2 years ago

jts commented 3 years ago

Hi,

I'm testing GraphAligner for a project where I need all equivalent alignments that have the maximum score. I've tried the --all-alignments option but it does not return all the alignments that I would expect. Here's a minimal toy example to demonstrate.

example.gfa:

H       VN:Z:a.0
S       A       GTCAGTAGCATGCATGCTATATGCTAGCATGCTAGATATATCTCGATACGACTAGCATCGACGTCAGCGATCAGCGTAGCGAGC
S       B       TTAGTACATATAGTACCCATAATATGTATAAACCCCGATTATTGAGATATACACCACTAGATATATACCCCAAATGATATTATT
S       C       GTCTAGCTAGCTAGCTAGCTAGATCGATCGATCACGATCAGCTCAGCTAGCTACGACTATGACTACGACTAGCACTACGACTAG
S       D       GTCTAGCTAGCTAGCTAGCGTCAGCTAGTAGCATCGACTAGCTACTACGATCAGCTAGCTAGCATCGACTACGATCAGCTACGA
L       A       +       B       +       0M
L       B       +       C       +       0M
L       B       +       D       +       0M

Two strings are represented by this simple graph - ABC and ABD. Notably C and D share a common prefix.

I have generated three reads:

reads.fasta:

>read_ABC
GTCAGTAGCATGCATGCTATATGCTAGCATGCTAGATATATCTCGATACGACTAGCATCGACGTCAGCGATCAGCGTAGCGAGCTTAGTACATATAGTACCCATAATATGTATAAACCCCGATTATTGAGATATACACCACTAGATATATACCCCAAATGATATTATTGTCTAGCTAGCTAGCTAGCTAGATCGATCGATCACGATCAGCTCAGCTAGCTACGACTATGACTACGACTAGCACTACGACTAG
>read_ABD
GTCAGTAGCATGCATGCTATATGCTAGCATGCTAGATATATCTCGATACGACTAGCATCGACGTCAGCGATCAGCGTAGCGAGCTTAGTACATATAGTACCCATAATATGTATAAACCCCGATTATTGAGATATACACCACTAGATATATACCCCAAATGATATTATTGTCTAGCTAGCTAGCTAGCGTCAGCTAGTAGCATCGACTAGCTACTACGATCAGCTAGCTAGCATCGACTACGATCAGCTACGA
>read_partial
GTCAGTAGCATGCATGCTATATGCTAGCATGCTAGATATATCTCGATACGACTAGCATCGACGTCAGCGATCAGCGTAGCGAGCTTAGTACATATAGTACCCATAATATGTATAAACCCCGATTATTGAGATATACACCACTAGATATATACCCCAAATGATATTATTGTCT

Note that read_partial is a substing of both ABC and ABD.

When I run

GraphAligner -g example.gfa -f reads.fasta -a example.json --all-alignments -x vg

I get the following output (reformatted for clarity):

read_ABC        A,B,C   1.00
read_ABC        C,B     0.52
read_ABD        A,B,D   1.00
read_ABD        A,B,C   0.91
read_ABD        C,B     0.52
read_partial    A,B,C   1.00

read_partial only has an alignment to ABC and the equivalent alignment to ABD is missed. I suspect I am missing a parameter or the issue is the common prefix that B and C share (I thought this would be OK) but any help would be appreciated.

Jared

maickrau commented 2 years ago

Most recent version should now return all equivalent alignments with appropriate mapping qualities. Please open a new issue if something doesn't work with equivalent alignments.

jts commented 2 years ago

Thanks!