Open sjackman opened 6 years ago
It may be possible, but unicycler_align
doesn't align a long read to a graph, just to a single linear reference. Unicycler's finding-a-path-through-the-graph stuff comes later, and is conducted using multiple reads, not one read at a time.
unicycler_align
is essentially just a long read aligner that:
So I'm not sure if the concept of GFA path really applies here. Or am I misunderstanding you?
Ryan
I thought (but perhaps I imagined this) that the first aligned segment of a read could align to somewhere in the middle of a contig to the end of that contig, then the following middle aligned segments of a read had to align the complete contig (beginning to end), then the final aligned segment of a read had to start at the beginning of a contig and could terminate anywhere in that contig. Looking now at https://github.com/rrwick/Unicycler/blob/master/docs/unicycler-align.md, it looks as though I did entirely make this up in my imagination.
Want to create this tool? 😉
On second thought, I wasn't totally off the mark. Semi-global alignment (aka glocal) alignment allows up to two ends to be clipped without penalty. Let's say a long read aligns (123) 1+, 2+, 3+ (456)
. Since an internal segment of the read aligns to 2+
, the read is clipped on both sides, so there's no free clipping left for the contig, so the full length of the contig 2+
must align from beginning to end, without any clipping, which is exactly the behaviour that I want.
Similarly, the alignment to 1+
is clipped on one side of the contig, and one side of the read, which is the two allowable clippings that you get for free.
One difference from what I described above is that it doesn't check for an edge in the assembly graph between 1+
and 2+
.
Is it possible to use a one of the Python functions of Unicycler to convert the SAM file output by
unicycler_align
to paths? Paths in GFA1 or GFA2 format would be awesome, but Unicycler formatted-paths would be great too.