Open jelber2 opened 6 years ago
Hello, what would be the use case for the outputted GFA format (i.e. what information are you seeking from racon)?
Best regards, Robert
Well, I would like to input a GFA file from racon to the Hi-C scaffolding program SALSA (https://github.com/machinegun/SALSA). Granted, I don't know whether a GFA representation of the assembly would improve the output from SALSA or not.
My workflow is to overlap PacBio reads with minimap2, assemble them de novo with miniasm, call consensus with racon, pass a GFA file from racon to SALSA, and then polish with pilon: minimap2->miniasm->racon->SALSA->pilon
But I could alternatively do the following: minimap2->miniasm->SALSA->racon->pilon
I have tagged this as enhancement and will deal with it soon.
Best regards, Robert
I'd find this feature useful as well. I'm polishing a Miniasm assembly using Racon. It'd be useful to preserve the graph after polishing with Racon. Consider supporting both GFA 1 and GFA 2.
How should I preserve the GFA file? Sequences change and alignments might be invalidated.
The GFA 1 output by Miniasm includes estimates of the amount of overlap, but doesn't include an actual alignment. So I think you could get away with not modifying the edges at all. The edges output by Miniasm look like this:
L utg000001l + utg001226l + 19386M SD:i:5467
After the sequences are corrected by Racon, you could realign the two sequences incident to each edge, and it's possible that some of the ambiguities in the graph could be resolved post-Racon.
It is a bit tedious to add the format into Racon as we only need the S rows. Wouldn't a simple post-processing script be an easier solution? A script that updates the GFA file with polished sequences and maybe realigns edges?
A post-processing script may be easiest. That script would take in the GFA file produced by Miniasm, the FASTA file produced by Racon, and produce an updated GFA file. Is that script something that you're interested in creating? Or perhaps a task for Gfakluge or GfaPy.
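The post-processing step described above could be sketched in Python roughly as follows. This is a minimal illustration, not part of Racon or Miniasm: the function names (`read_fasta`, `update_gfa`, `polish_gfa_file`) are hypothetical, it assumes the polished FASTA keeps the original segment name as the first word of each header, and it leaves every edge (`L` line) untouched, so overlaps are not realigned. If an `S` line carries an `LN:i:` tag, the sketch updates it to the new sequence length.

```python
def read_fasta(lines):
    """Return {sequence_id: sequence}; the ID is the first word of each header."""
    seqs = {}
    name = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            seqs[name] = []
        elif name is not None and line:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def update_gfa(gfa_lines, polished):
    """Yield GFA lines, swapping in polished sequences on matching S lines."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3 and fields[1] in polished:
            seq = polished[fields[1]]
            fields[2] = seq
            # Keep an existing LN:i: length tag consistent with the new sequence.
            fields[3:] = [
                f"LN:i:{len(seq)}" if tag.startswith("LN:i:") else tag
                for tag in fields[3:]
            ]
        # All other record types (H, L, C, P, ...) pass through unchanged.
        yield "\t".join(fields)


def polish_gfa_file(gfa_path, fasta_path, out_path):
    """Hypothetical convenience wrapper: draft GFA + polished FASTA -> polished GFA."""
    with open(fasta_path) as fasta:
        polished = read_fasta(fasta)
    with open(gfa_path) as gfa, open(out_path, "w") as out:
        for record in update_gfa(gfa, polished):
            out.write(record + "\n")
```

A call such as `polish_gfa_file("draft.gfa", "polished.fasta", "polished.gfa")` would write the updated graph; the file names here are placeholders.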
Well I might add such a script but I am not sure when I will get the time for it :/
No worries. I'll let you know if I get around to it myself.
Great, thanks!
Any updates?
Not from me
Neither from me :/
I don't suppose anyone had a chance to look at this?
Unfortunately not :/ I'll try and deal with it later this year.
I used this AWK script to take the sequence from polished.fasta and the graph from draft.gfa, and produce a polished.gfa file.
seqtk seq polished.fasta | gawk -vOFS='\t' 'ARGIND == 1 { id = substr($1, 2); getline; x[id] = $1; next } $1 == "S" && x[$2] { $3 = x[$2] } 1' - draft.gfa >polished.gfa
See also https://github.com/edawson/gfakluge and https://github.com/ggonnella/gfapy/ for manipulating GFA files. I'd still love to see this feature in Racon.
So you are basically taking the unpolished assembly graph and the new polished sequences and creating a polished graph? Am I correct?
That is what I understand @sjackman's code is doing.
Yes. I'm working with an assembly graph whose edges are blunt (no overlap, 0M) from Flye or Unicycler. This simple script does not recompute the edge alignment for other assemblers.
Mmmh, I see, then I cannot use it ...
You could replace all the CIGAR strings with * (meaning unknown).
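As a rough illustration of that suggestion, the sketch below rewrites the overlap field of every GFA 1 link (`L`) line to `*`. The function name `mask_overlaps` is hypothetical, and the sketch assumes the standard GFA 1 column order, where the overlap CIGAR is the sixth tab-separated field of an `L` line. Optional tags such as `SD:i:` are left as they are, and containment (`C`) lines are not handled.

```python
def mask_overlaps(gfa_lines):
    """Yield GFA 1 lines with the overlap CIGAR of each L line replaced by '*'."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # L line layout: L, from, from_orient, to, to_orient, overlap, [tags...]
        if fields[0] == "L" and len(fields) >= 6:
            fields[5] = "*"
        yield "\t".join(fields)
```

Applied to the Miniasm edge quoted earlier in the thread, the `19386M` field would become `*` while the rest of the line stays the same.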
Hello Robert, is there any progress or update on producing .gfa output from Racon?
@ardy20, unfortunately no. Minipolish seems like a decent solution for this issue :)
Hi, I was wondering if it were possible for racon to output a gfa file in addition to fasta?