vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

vg call produced an empty VCF file #3471

Open SimonaSecomandi opened 2 years ago

SimonaSecomandi commented 2 years ago

1. What were you trying to do?

Call variants from read support.

2. What did you want to happen?

Generate a VCF file.

3. What actually happened?

The command has been running for 5 days, but the VCF is still empty.
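
While a long vg call run is still in progress, it can help to tell a zero-byte file apart from one that holds only the VCF header. The Python sketch below performs that check. It assumes an uncompressed VCF, and `vcf_status` is a hypothetical helper, not part of vg. A file that looks empty mid-run may also just mean nothing has been written or flushed yet.

```python
from pathlib import Path


def vcf_status(path):
    """Classify a plain-text VCF as 'missing', 'zero-bytes', 'header-only' or 'has-records'.

    Hypothetical helper; a bgzipped .vcf.gz would need gzip.open instead of Path.open.
    """
    p = Path(path)
    if not p.exists():
        return "missing"
    if p.stat().st_size == 0:
        return "zero-bytes"
    with p.open() as fh:
        for line in fh:
            # Header lines start with '#'; any other non-blank line is a variant record.
            if line.strip() and not line.startswith("#"):
                return "has-records"
    return "header-only"
```

For example, `print(vcf_status("WGS_calls.vcf"))` reports which of the four states the output file is in.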

5. What data and command can the vg dev team use to make the problem happen?

I have a pangenome graph constructed with Cactus pangenome pipeline. It contains 10 bird assemblies (around 1 Gbp each) for the same species.

Here's the call command: vg call -t 32 WGS_aug.xg -k WGS_aug.pack > WGS_calls.vcf

Here are the commands I used before this:

vg mod -t 32 -X 256 pangenome.vg > pangenome_chopped.vg
vg index -t 32 -x pangenome_chopped.xg pangenome_chopped.vg
vg prune -t 32 -k 45 pangenome_chopped.vg > pangenome_chopped_pruned.vg
vg index -t 32 -b /tmp -p -g pangenome_chopped_pruned.gcsa pangenome_chopped_pruned.vg
vg map -t 32 -f WGS_forward.fastq.gz -f WGS_reverse.fastq.gz -x pangenome_chopped.xg -g pangenome_chopped_pruned.gcsa > WGS_aln.gam
vg augment pangenome_chopped.vg WGS_aln.gam -A WGS_aug.gam > WGS_aug.vg
vg index -t 32 -b /tmp WGS_aug.vg -x WGS_aug.xg
vg pack -x WGS_aug.xg -g WGS_aug.gam -o WGS_aug.pack
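
Because vg call ran for days before the empty output was noticed, a cheap pre-flight check is to confirm that every intermediate file from the commands above exists and is non-empty. This is a minimal Python sketch: the file list is copied from the commands, and `find_bad_outputs` is a hypothetical helper. A non-empty file does not prove the step worked; coverage in the pack file still needs checking, for example with vg depth as suggested later in the thread.

```python
from pathlib import Path

# Intermediate files written by the commands above (names copied from this issue).
PIPELINE_OUTPUTS = [
    "pangenome_chopped.vg",
    "pangenome_chopped.xg",
    "pangenome_chopped_pruned.vg",
    "pangenome_chopped_pruned.gcsa",
    "WGS_aln.gam",
    "WGS_aug.gam",
    "WGS_aug.vg",
    "WGS_aug.xg",
    "WGS_aug.pack",
]


def find_bad_outputs(workdir=".", names=PIPELINE_OUTPUTS):
    """Return (name, problem) pairs for files that are missing or zero bytes."""
    problems = []
    for name in names:
        path = Path(workdir) / name
        if not path.exists():
            problems.append((name, "missing"))
        elif path.stat().st_size == 0:
            problems.append((name, "empty"))
    return problems
```

An empty list means every expected file is present and non-empty.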

6. What does running vg version say?

vg version v1.30.0 "Carentino"
Compiled with g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 on Linux
Linked against libstd++ 20200808
Built by anovak@octagon

glennhickey commented 2 years ago

It's tough to say what's going on from here. You can check that your pack file has coverage using vg depth -k. Otherwise, if you are able to share your input, I can take a look.

vg call has been running too slowly on some graphs coming out of Cactus and PGGB. This can happen in the presence of very large sites. One simple way to speed it up would be an option to bypass such sites (and only call the nested variation inside them).
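
The proposed workaround (skip a site that is too large, but still genotype the smaller variation nested inside it) can be sketched as a walk over the snarl tree. This is a hypothetical Python model, not vg's implementation: `Site`, `sites_to_call`, and the single `size` value per site are illustrative assumptions, and the sketch also assumes that genotyping a site already accounts for the sites nested inside it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Site:
    """Hypothetical stand-in for a snarl: a name, a size in bp, and nested child sites."""
    name: str
    size: int
    children: List["Site"] = field(default_factory=list)


def sites_to_call(roots, max_size):
    """Return site names to genotype, bypassing sites larger than max_size
    but descending into them to find smaller nested sites."""
    selected = []
    stack = list(reversed(roots))  # reversed so sites pop in their original order
    while stack:
        site = stack.pop()
        if site.size <= max_size:
            # Small enough: genotype the whole site (assumed to cover its nested sites).
            selected.append(site.name)
        else:
            # Too large: bypass it and examine the variation nested inside instead.
            stack.extend(reversed(site.children))
    return selected
```

With a large structural variant containing a nested SNP, a low `max_size` returns only the SNP, while a high one returns the whole site.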

ekg commented 2 years ago

Glenn, I agree that skipping the big sites is a good idea. Alternatively, you might want to genotype only the big sites. Maybe a size-range limit parameter would support that?


SimonaSecomandi commented 2 years ago

Dear all, I can't share my graph; the augmented vg graph is 80 GB.

I'm checking the coverage with vg depth -k WGS_aug.pack WGS_aug.vg

Meanwhile, vg call is still running, but the VCF is still empty.

SimonaSecomandi commented 2 years ago

Hi all, I saw that you implemented an option to avoid calling large snarls (#3490):

Options -c and -C added to vg call to restrict snarl calling to sites within specified size range

I'm interested in calling SNPs, so what size should I set?

Many thanks!

glennhickey commented 2 years ago

Maybe something like -C 100? That should at least run through fairly quickly.
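
Why a cap like -C 100 keeps SNPs can be shown with a small model of the size-range filter. This Python sketch assumes that -c keeps a site only if at least one traversal is at least N bp long, and -C keeps it only if every traversal is at most N bp; `in_size_range` is a hypothetical name, and `vg call --help` is the authority on how vg actually measures site size.

```python
from typing import Optional, Sequence


def in_size_range(traversal_lengths: Sequence[int],
                  min_length: Optional[int] = None,
                  max_length: Optional[int] = None) -> bool:
    """Hypothetical model of a -c/-C style filter on one site's traversal lengths (bp).

    Assumes the site has at least one traversal.
    """
    longest = max(traversal_lengths)
    if min_length is not None and longest < min_length:
        return False  # no traversal reaches the minimum length
    if max_length is not None and longest > max_length:
        return False  # at least one traversal exceeds the maximum length
    return True
```

Under these assumptions, a SNP site (two 1 bp traversals) and a 50 bp insertion both pass a 100 bp cap, while a 5 kb deletion is left out. That is why such a cap should speed up a SNP-focused run.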

SimonaSecomandi commented 2 years ago

Perfect, thank you! I'll try it right away and get back to you as soon as it finishes!