rGFA is the standard GFA. It is really vg that imposes various vg-specific constraints.
Thank you for your quick reply!
The vg team knows how to run vg on the minigraph graphs. You may ask them.
Yes! But I have a question: the GFA output by minigraph only contains S and L lines. The GFA format described at https://github.com/GFA-spec/GFA-spec/blob/master/GFA1.md looks like the example below, and vg can accept it. How can we get output in this format?
```
H VN:Z:1.0
S 11 ACCTT
S 12 TCAAGG
S 13 CTTGATT
L 11 + 12 - 4M
L 12 - 13 + 5M
L 11 + 13 + 3M
P 14 11+,12-,13+ 4M,5M
```
Best wishes~
The only restriction on vg GFAs is that the node names must be numeric IDs greater than zero. You can use gfautil id-convert (from https://github.com/chfi/rs-gfa-utils) to map between graphs whose nodes have string names and graphs whose nodes have integer names.
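To illustrate what such a conversion involves, here is a minimal Python sketch that renumbers string segment names (minigraph uses names like `s1`, `s2`, ...) to integers starting at 1 and rewrites S, L and P records accordingly. This is not gfautil's implementation; the function name `to_numeric_ids` is hypothetical, and it assumes tab-separated GFA1 input with every segment declared on an S line.

```python
def to_numeric_ids(gfa_lines):
    """Renumber GFA1 segments to integer IDs >= 1 (illustrative sketch, not gfautil)."""
    id_map = {}
    # First pass: give every segment name an integer ID in file order.
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and fields[1] not in id_map:
            id_map[fields[1]] = str(len(id_map) + 1)
    out = []
    # Second pass: rewrite S, L and P records; other records pass through unchanged.
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            fields[1] = id_map[fields[1]]
        elif fields[0] == "L":
            fields[1] = id_map[fields[1]]
            fields[3] = id_map[fields[3]]
        elif fields[0] == "P":
            # Each step is a segment name followed by its orientation (+ or -).
            steps = fields[2].split(",")
            fields[2] = ",".join(id_map[s[:-1]] + s[-1] for s in steps)
        out.append("\t".join(fields))
    return out, id_map
```

Keeping the returned `id_map` lets you translate vg results back to the original minigraph segment names; optional tags such as the rGFA SN/SO/SR tags are left untouched on the S lines.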
The P lines are typically used to represent how a sequence maps through the graph. Perhaps you could derive them by mapping your original sequences back to the minigraph graph? You'd need to convert the resulting GAF records to P lines. I know of a few people doing that with GraphAligner output on De Bruijn graphs, and it should work here too. The representation will be approximate, but it will show how the sequences map through the graph.
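A rough sketch of the GAF-to-P conversion, assuming the path column (column 6) is an oriented walk such as `>s1<s2>s3`, where `>` means forward and `<` means reverse. Stable-coordinate paths are rejected, and the alignment's start/end offsets within the first and last segments are ignored, which is one reason the result is only approximate. The function name `gaf_to_p_line`, the `query:start-end` path naming, and the `*` overlap field are illustrative choices, not an established convention.

```python
import re

# Each step in a GAF walk is an orientation arrow followed by a segment name.
STEP = re.compile(r"([<>])([^<>]+)")

def gaf_to_p_line(gaf_line):
    """Turn one GAF record into an approximate GFA1 P line (illustrative sketch)."""
    cols = gaf_line.rstrip("\n").split("\t")
    qname, qstart, qend, strand, path = cols[0], cols[2], cols[3], cols[4], cols[5]
    if path[0] not in "<>":
        # Stable-coordinate paths (e.g. "chr1:100-200") are not handled here.
        raise ValueError("path is not an oriented segment walk: " + path)
    steps = [name + ("+" if arrow == ">" else "-") for arrow, name in STEP.findall(path)]
    if strand == "-":
        # The query aligns to the reverse complement of the walk: reverse and flip each step.
        steps = [s[:-1] + ("-" if s[-1] == "+" else "+") for s in reversed(steps)]
    # Name the path after the query interval so several alignments of one contig stay distinct.
    path_name = f"{qname}:{qstart}-{qend}"
    return "\t".join(["P", path_name, ",".join(steps), "*"])
```

For example, a record for `contig1` aligned on the `+` strand along `>s1<s2>s3` becomes `P contig1:0-1000 s1+,s2-,s3+ *` (tab-separated).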
If you only want one P line per input chromosome/contig, and only for the first (reference) FASTA that you give to minigraph, then you can represent it losslessly in the final graph with a set of P lines. You could probably derive these from the rGFA output using its reference coordinate information, or patch minigraph to produce them directly.
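One way the rGFA coordinate tags could be used, sketched in Python: collect segments with rank `SR:i:0` (those taken from the reference), group them by stable sequence name `SN`, sort by offset `SO`, and emit one forward-oriented P line per reference sequence. This assumes rank-0 segments are stored in the reference's forward orientation and tile each reference sequence without gaps; the sketch checks the tiling and raises if it does not hold. The function name `reference_paths` is hypothetical.

```python
from collections import defaultdict

def reference_paths(gfa_lines):
    """Emit one P line per rank-0 stable sequence in an rGFA file (illustrative sketch)."""
    segments = defaultdict(list)  # SN value -> [(offset, length, segment name)]
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "S":
            continue
        tags = {}
        for field in fields[3:]:
            tag, _type, value = field.split(":", 2)
            tags[tag] = value
        if tags.get("SR") == "0":
            # Use the sequence length, or the LN tag if the sequence is omitted ("*").
            length = int(tags["LN"]) if fields[2] == "*" else len(fields[2])
            segments[tags["SN"]].append((int(tags["SO"]), length, fields[1]))
    p_lines = []
    for name, segs in segments.items():
        segs.sort()
        expected = 0
        for offset, length, _seg in segs:
            # A lossless reference path needs segments that tile the sequence from offset 0.
            if offset != expected:
                raise ValueError(f"gap or overlap in {name} at offset {offset}")
            expected = offset + length
        walk = ",".join(seg + "+" for _offset, _length, seg in segs)
        p_lines.append("\t".join(["P", name, walk, "*"]))
    return p_lines
```

Non-reference segments (rank 1 and above) are skipped, so this recovers only the reference paths, not the other input assemblies.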
If you want an exact or lossless version of the graph including all input contigs, you'll need to resolve the base-level relationships by aligning with CIGARs (minimap2 -c or edyeet) and then inducing the graph with seqwish. If you want a model with a local MSA for every part of the graph, you'd apply the pangenome graph builder, which extends the seqwish induction with partial order alignment, or alternatively a version of cactus that derives the MSA for each part of the minigraph graph (this is in development; I'm not sure where the code lives...).
Great! Thanks for your advice!
Hello Li, I have used minigraph to build pangenome graphs, but the output is in rGFA format rather than standard GFA, so it cannot be used as input to vg. How can we get standard GFA output when using minigraph to build a pangenome graph? Best wishes~