lh3 / gfatools

Tools for manipulating sequence graphs in the GFA and rGFA formats

conversion of GFA to rGFA #23

Open gsc74 opened 2 years ago

gsc74 commented 2 years ago

@lh3, I'm curious whether we can convert GFA to rGFA. As per the rGFA format, we need to maintain three additional pieces of information with each segment (S) line. So, assuming we have a GFA with segment lines:

S   s1  AAT
S   s2  T

can't we just add:

S   s1  AAT LN:i:3  SN:Z:chr1   SO:i:90374744   SR:i:0
S   s2  T   LN:i:1  SN:Z:chr1   SO:i:176753158  SR:i:0

Will it be valid rGFA?
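To make the proposal concrete, here is a minimal sketch of the tagging step in Python. It only shows the mechanics of appending the tags. It does not check that the result meets the rGFA specification, which is the open question here. The function name `add_rgfa_tags` and the `coords` mapping (segment name to stable sequence name, offset, and rank) are hypothetical. Where those coordinates come from is exactly the part that would need to be worked out.

```python
def add_rgfa_tags(gfa_lines, coords):
    """Append LN/SN/SO/SR tags to the S-lines of a tab-separated GFA.

    coords is a hypothetical, caller-supplied mapping:
        segment name -> (stable sequence name, offset, rank)
    Lines that are not S-lines, or whose segment is not in coords,
    are passed through unchanged.
    """
    rgfa_keys = {"LN", "SN", "SO", "SR"}
    out = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3 and fields[1] in coords:
            name, seq = fields[1], fields[2]
            sn, so, sr = coords[name]
            # Keep other optional tags, but drop stale copies of the ones we write.
            kept = [t for t in fields[3:] if t[:2] not in rgfa_keys]
            new_tags = []
            if seq != "*":
                # LN can only be derived when the sequence is present.
                new_tags.append(f"LN:i:{len(seq)}")
            new_tags += [f"SN:Z:{sn}", f"SO:i:{so}", f"SR:i:{sr}"]
            fields = fields[:3] + new_tags + kept
        out.append("\t".join(fields))
    return out


if __name__ == "__main__":
    lines = ["S\ts1\tAAT", "S\ts2\tT"]
    coords = {"s1": ("chr1", 90374744, 0), "s2": ("chr1", 176753158, 0)}
    for tagged in add_rgfa_tags(lines, coords):
        print(tagged)
```

With the two segments above and the same coordinates, this reproduces the tagged S-lines shown in the example.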

JosephLalli commented 2 years ago

A GFA to rGFA tool would be extremely helpful for my use case at the moment. At a minimum, I could use more information about how the SO value is calculated; it's not 100% clear to me, and trying to determine what it should be while parsing the reference human pangenome freeze 1 (which only has intermittent SO tags) has proven to be unexpectedly complex!

colindaven commented 1 year ago

I think the reverse is also true: an rGFA-to-GFA conversion tool is severely lacking.

While minigraph is apparently the most efficient and promising pangenome tool at present, the use of rGFA prohibits downstream analysis using the odgi and vg toolkits (at least to my knowledge). Therefore, no SNP calling, odgi pavs calling, etc. is possible.

peterdfields commented 1 month ago

Has there been any progress on this type of tool, either in gfatools or in another toolchain?

colindaven commented 1 month ago

At present, in 2024, I'd recommend creating new pangenomic graphs with Minigraph-Cactus to get GFA-format output (if possible).

There is discussion on this topic here too - https://www.biostars.org/p/9601440/#9601480

There is still a lot of room for improvement in the pangenomic tool landscape for GFA and rGFA.