vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Merging multiple graph files #4289

Open Lucio-Yang opened 1 month ago

Lucio-Yang commented 1 month ago

Hi!

I have constructed multiple graphs and I want to combine them into a single graph file. I found that both vg combine and vg ids accept multiple vg files, but their output sizes differ. Which one should I use, and what is the difference?

Thank you very much!

glennhickey commented 1 month ago

Only vg combine can combine multiple graphs into a single graph file -- so use it.

Lucio-Yang commented 1 month ago

Thanks! I used vg combine to merge the vg files of multiple chromosomes and then tried to produce the corresponding VCF file, but the following error occurred. Why does the path disappear from the merged file?

Error [vg deconstruct]: No specified reference path or prefix found in graph

My code:

vg combine chr1.vg chr2.vg chr3.vg chr4.vg chr5.vg chr6.vg chr7.vg chr8.vg chr9.vg chr10.vg chr11.vg chr12.vg chr13.vg chr14.vg > merged.vg
vg view --threads 128 merged.vg > merged.gfa
vg deconstruct -P TW_t2 -H "#" -e -a -t 128 merged.gfa > merged.vcf

vg version v1.40.0 "Suardi"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on Linux
Linked against libstd++ 20210601
Built by stephen@lubuntu

The chromosome names have the format TW_t2#1#chr1, TW_t2#1#chr2 ... TW_t2#1#chr14

ashleethomson commented 1 month ago

I am also trying to combine graphs, but I have full graphs (containing all chromosomes) that I have augmented to contain variants specific to different individuals. Can vg combine be used to merge these graphs to make a single graph that is relative to the reference path that was used?

adamnovak commented 1 month ago

@Lucio-Yang You can try vg paths --list -x merged.vg and vg paths --list -x merged.gfa to see what paths are in the graphs. Sometimes converting paths to/from GFA can hit bugs in how we represent path names, especially on such an old build of vg.
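As a cross-check that does not depend on the vg build, the path names can also be read straight from the GFA text. The following is only a minimal sketch: it assumes standard GFA 1.x records, where a P line carries the path name in column 2 and a W line carries the sample, haplotype index, and sequence ID in columns 2 to 4. The helper name gfa_path_names and the tiny demo file are hypothetical and exist only for illustration.

```shell
# Hypothetical helper: print the path names found in a GFA file.
# P lines: name is column 2. W lines: join sample, haplotype, and
# sequence ID with '#' so they look like sample#hap#contig names.
gfa_path_names() {
    awk -F'\t' '
        $1 == "P" { print $2 }
        $1 == "W" { print $2 "#" $3 "#" $4 }
    ' "$1"
}

# Tiny made-up GFA, just to show the helper in action.
demo_gfa=$(mktemp)
printf 'H\tVN:Z:1.1\n' > "$demo_gfa"
printf 'S\t1\tACGT\n' >> "$demo_gfa"
printf 'P\tTW_t2#1#chr1\t1+\t*\n' >> "$demo_gfa"
printf 'W\tsampleA\t1\tchr1\t0\t4\t>1\n' >> "$demo_gfa"

# Count the paths whose name starts with the reference prefix.
gfa_path_names "$demo_gfa" | grep -c '^TW_t2#'
```

Running the same helper on merged.gfa and getting a count of zero for the TW_t2# prefix would suggest the names were lost or altered before vg deconstruct ever saw them.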

I would recommend upgrading to a more recent release of vg, and also maybe adding an RS tag to your GFA to indicate which sample is the reference you want to use.
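One way to add the RS tag is to append it to the existing header line of the GFA. This is a rough sketch, not a definitive recipe: it assumes the tag is written as RS:Z:<sample> on the H line, that the file already has an H line, and that no RS tag is present yet. The helper name add_rs_tag and the demo file are hypothetical.

```shell
# Hypothetical helper: append an RS:Z:<sample> tag to the first H line.
# Usage: add_rs_tag SAMPLE input.gfa > output.gfa
# Assumes the header has no RS tag yet; it does not merge with one.
add_rs_tag() {
    if ! grep -q '^H' "$2"; then
        echo "add_rs_tag: no H line found in $2" >&2
        return 1
    fi
    awk -v tag="RS:Z:$1" '
        /^H/ && !seen { print $0 "\t" tag; seen = 1; next }
        { print }
    ' "$2"
}

# Tiny made-up GFA, just to show the effect on the header.
rs_demo=$(mktemp)
printf 'H\tVN:Z:1.1\nS\t1\tACGT\nP\tTW_t2#1#chr1\t1+\t*\n' > "$rs_demo"

# Show the rewritten header line.
add_rs_tag TW_t2 "$rs_demo" | head -n 1
```

All other lines pass through unchanged, so the tagged output can be used in place of merged.gfa in the original vg deconstruct command.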

I know @glennhickey is revising deconstruct; I'm not sure whether it will help with your particular problem.

@ashleethomson Unfortunately, vg combine can't weld multiple graphs together along a shared set of linear reference paths. I don't believe we have a tool in vg that can do that, but that sort of graph welding might be exposed in https://github.com/ComparativeGenomicsToolkit/cactus, especially if you take all your graphs back to MAF or PAF. It's definitely possible using the https://github.com/ComparativeGenomicsToolkit/pinchesAndCacti library and some walking of paths, but I don't know if there's a tool that can do it yet.