maickrau / GraphAligner

Split alignments due to sequence orientation #22

Open adcosta17 opened 4 years ago

adcosta17 commented 4 years ago

Hi, I've noticed that GraphAligner splits an alignment along a path of nodes in a graph into multiple segments wherever the orientation of a node changes along the path.

I'm trying to align long reads to an assembly graph where every node has a fixed length of 1000 bp. The sequence of each node may be in the forward direction or reverse complemented. For example, suppose the graph has 9 nodes, N1 to N9, with edges between them so that there is a single path from N1 to N9. N1, N2 and N3 have sequence in the same orientation as the read I am trying to align; N4, N5 and N6 are reverse complemented relative to the read; and N7, N8 and N9 are in the same orientation as the read. When I align my read, which is 9000 bases long and should be identical in sequence to the nodes apart from orientation, I get 3 alignment records in my output. These records start and end where the orientation changes. The entire read sequence is aligned across the 3 alignments, but rather than a single record whose path contains nodes in varying orientations, i.e. >N1>N2>N3<N4<N5<N6>N7>N8>N9, I get: >N1>N2>N3, <N6<N5<N4 and >N7>N8>N9. Is there a way for GraphAligner to align these as a single record that reflects the path in the graph?

I've attached the input and output files in a zip folder (test_data.zip). The command I am using is: GraphAligner -g test.gfa -f test.fa -a test.gaf -x dbg
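
For reference, path strings in GAF output prefix each node with > (traversed forward) or < (traversed as its reverse complement). The Python sketch below shows how such a string breaks into oriented steps, and how a forward chain reads when it is walked from the other end. The helper names parse_gaf_path and reverse_path are hypothetical and are not part of GraphAligner.

```python
import re

def parse_gaf_path(path):
    """Split an oriented path such as '>N1<N2' into (node, orientation) steps."""
    steps = re.findall(r"([<>])([^<>]+)", path)
    return [(name, "+" if sign == ">" else "-") for sign, name in steps]

def reverse_path(steps):
    """Return the same walk traversed from the opposite end (reverse complement)."""
    flip = {"+": "-", "-": "+"}
    return [(name, flip[orient]) for name, orient in reversed(steps)]

# The single record the question is hoping for.
expected = parse_gaf_path(">N1>N2>N3<N4<N5<N6>N7>N8>N9")
print(expected[2:5])

# A forward chain N4 -> N5 -> N6 read from the other end.
print(reverse_path(parse_gaf_path(">N4>N5>N6")))
```

The second call shows that <N6<N5<N4 is the same walk as >N4>N5>N6 seen from the opposite strand, which is not the same as <N4<N5<N6.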

maickrau commented 4 years ago

Hi, the test graph doesn't have a path for >N1>N2>N3<N4<N5<N6>N7>N8>N9. The Bandage screenshot below shows the topology of the graph. In this case GraphAligner does not merge the three alignments because the edges in the graph do not allow it.

bandage_graph
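
One way to check this kind of thing without a viewer is to test each consecutive pair of steps in the desired path against the L (link) lines of the GFA. The sketch below is a minimal illustration and not GraphAligner's own logic. The function names and the two toy link sets are assumptions made up for the example; they are not the contents of the attached test.gfa. It relies on the GFA1 rule that a link "A o1 B o2" also implies the reverse-complement link "B flip(o2) A flip(o1)".

```python
FLIP = {"+": "-", "-": "+"}

def read_links(gfa_text):
    """Collect every oriented step permitted by the L lines, in both strands."""
    links = set()
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "L":
            a, a_orient, b, b_orient = fields[1:5]
            links.add((a, a_orient, b, b_orient))
            # The same link, traversed on the opposite strand.
            links.add((b, FLIP[b_orient], a, FLIP[a_orient]))
    return links

def first_missing_step(steps, links):
    """Return the first step of the path with no supporting link, or None."""
    for (a, a_orient), (b, b_orient) in zip(steps, steps[1:]):
        if (a, a_orient, b, b_orient) not in links:
            return (a, a_orient, b, b_orient)
    return None

# Hypothetical toy graphs (0M overlaps are an assumption).
all_forward = "L\tN3\t+\tN4\t+\t0M\nL\tN4\t+\tN5\t+\t0M"
with_flip = "L\tN3\t+\tN4\t-\t0M\nL\tN5\t+\tN4\t+\t0M"

wanted = [("N3", "+"), ("N4", "-"), ("N5", "-")]  # i.e. >N3<N4<N5
print(first_missing_step(wanted, read_links(all_forward)))
print(first_missing_step(wanted, read_links(with_flip)))
```

In the first toy graph the step from N3 forward into N4 reversed has no link, so no single walk can spell >N3<N4<N5. In the second it does, and the check returns None.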

adcosta17 commented 4 years ago

Thanks, I fixed my graph structure so that it correctly reflects the path I am looking for: >N1>N2>N3<N4<N5<N6>N7>N8>N9, but I am still getting the same issue. I would assume that since N4, N5 and N6 are now always taken in the reverse complement, based on the GFA, there should be a single alignment through this path of nodes?

I've attached an updated gfa file. test_data_2.zip
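
To make the intended structure concrete, here is a small Python sketch that writes out the GFA1 link lines a walk such as >N3<N4<N5<N6>N7 would need. It is only an illustration: the function name links_for_path is hypothetical, the 0M overlap is an assumption, and the output is not taken from the attached files.

```python
def links_for_path(steps, overlap="0M"):
    """Emit one GFA1 L line per consecutive pair of oriented steps."""
    return [
        "\t".join(["L", a, a_orient, b, b_orient, overlap])
        for (a, a_orient), (b, b_orient) in zip(steps, steps[1:])
    ]

# The part of the desired path around both orientation changes.
steps = [("N3", "+"), ("N4", "-"), ("N5", "-"), ("N6", "-"), ("N7", "+")]
for line in links_for_path(steps):
    print(line)
```

Comparing these lines with the L lines actually present in the GFA, remembering that each link may also be written in its reverse-complement form, shows whether the orientation changes at N3/N4 and N6/N7 are really encoded in the edges.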