vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Can I use multiple gam files in parallel for augment graphs? #4032

Open Wwwwwwwyc opened 11 months ago

Wwwwwwwyc commented 11 months ago

Hi! We built a graph from 30 samples and mapped another 1,000 samples to it using vg giraffe. I think vg augment could add some of the variation from these 1,000 samples to the graph. However, running vg augment sequentially on each sample seems to generate 1,000 separate augmented graphs. Can the current version of vg add multiple GAM files to the graph in parallel?

glennhickey commented 11 months ago

No. But you can combine your gams with cat aln1.gam aln2.gam aln3.gam (etc) > combined.gam and pass the combined gam to augment.
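
The concatenation suggested above can be sketched as a small shell helper. The function name combine_gams and all file names are hypothetical placeholders; the helper is only a thin wrapper around cat, and the vg augment line in the comment assumes the usual "graph first, alignments second, augmented graph on standard output" form.

```shell
#!/bin/sh
# combine_gams OUT IN1 [IN2 ...]
# Concatenate per-sample GAM files into one file, as suggested in the reply above.
# (Hypothetical helper name; it does nothing beyond calling cat.)
combine_gams() {
    out=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "combine_gams: no input GAM files given" >&2
        return 1
    fi
    cat -- "$@" > "$out"
}

# Example (file names are placeholders):
#   combine_gams combined.gam aln1.gam aln2.gam aln3.gam
#   vg augment graph.vg combined.gam > augmented.vg
```

The output file should not be one of the inputs, since the shell truncates it before cat reads anything.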

Wwwwwwwyc commented 11 months ago

Thanks for your reply! This is helpful.

Wwwwwwwyc commented 11 months ago

I have one more small question. When I add a sample, the vg graph grows from 100M to 800M in size. Is this normal?

glennhickey commented 11 months ago

It could be normal if your sample is different enough from the graph.

Also keep in mind that any read errors will get added to the graph as well. You can mitigate this with the -c option where variants only get added if they have a certain read depth in the GAM.

So if you have 30X coverage, you might consider something like -c 5 or so in order to filter out errors.

But if you concatenate your GAMs, then this option may be trickier to use. What is the overall goal of your project?
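
One way to see why a fixed depth cutoff may be harder to choose on a concatenated GAM is to compare it with the pooled depth. The sketch below is plain arithmetic using numbers mentioned in this thread (1,000 samples, 30X, a cutoff of 5). The function names are hypothetical, and equal coverage across samples is a simplifying assumption, not something vg requires or reports.

```shell
#!/bin/sh
# pooled_depth SAMPLES PER_SAMPLE_COVERAGE
# Approximate total depth after concatenating GAMs, assuming equal coverage per sample.
pooled_depth() {
    echo $(( $1 * $2 ))
}

# cutoff_ppm CUTOFF DEPTH
# Express a read-depth cutoff as parts per million of the depth (integer division).
cutoff_ppm() {
    echo $(( $1 * 1000000 / $2 ))
}

pooled_depth 1000 30     # depth of the combined GAM
cutoff_ppm 5 30          # cutoff of 5 relative to one 30X sample
cutoff_ppm 5 30000       # the same cutoff relative to the pooled depth
```

Under these assumptions the same cutoff is a far smaller share of the pooled depth than of a single sample's depth, so an error that recurs in a handful of reads across samples could pass a filter that would have removed it within one sample.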

Wwwwwwwyc commented 11 months ago

The species we study has a genome of around 12 Mb and comprises 5 clades. To construct the initial graph, we used chromosome assemblies from samples representing all 5 clades. We estimated the growth of the pangenome: the second genome added around 0.65 Mb of sequence to the initial graph, whereas the last genome added only about 0.08 Mb. Given these numbers, we wonder whether such a substantial expansion of the vg graph when adding a new sample is abnormal.

We are planning to conduct another analysis using the -c option to see if there might be any changes. However, as you rightly mentioned, different samples may have varying sequencing depths, and this is an aspect we need to carefully consider.

Our goal is to generate an augmented graph by mapping NGS reads from the other samples onto the graph, which we think would significantly enhance its integrity.

Thank you for your help!

Donandrade commented 2 months ago

Hello @Wwwwwwwyc !

Could you tell me about your progress on this issue? I have the same question. Using vg augment, were you able to enhance the integrity of your pangenome?

All the best,