vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/
Other
1.08k stars 192 forks source link

vg call get empty vcf output #3843

Closed minglibio closed 1 year ago

minglibio commented 1 year ago

1. What were you trying to do?

I am trying to genotype short reads resequencing samples using vg call

2. What did you want to happen?

Get a VCF file including sample-specific variants.

3. What actually happened?

There is no error info, but I get a VCF file only including the header line.

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Place stacktrace here.

5. What data and command can the vg dev team use to make the problem happen?

The pangenome graph was constructed with Cactus pangenome pipeline (cactus v2.4.0). And I checked the coverage of the pack file using vg depth, it looks like the coverage is normal.

vg giraffe -t 4 -Z Graph.d2.gbz -m Graph.d2.min -d Graph.d2.dist -N Sample -i -f - > Sample.gam
vg pack -t 8 -x Graph.d2.gbz -g Sample.gam -Q 10 -s 5 -o Sample.pack
vg call -t 8 Graph.d2.gbz -k Sample.pack -s Sample -a > Sample.vcf

6. What does running vg version say?

vg version v1.44.0 "Solara"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on Linux
Linked against libstd++ 20210601
Built by anovak@octagon
glennhickey commented 1 year ago

I think you need to start by double-checking your gam: vg stats Graph.d2.gbz -a Sample.gam

Then you can check the pack file with vg depth Graph.d2.gbz -k Sample.pack -b 100000

If that's empty, maybe verify your graph has a reference path(s) vg paths -v Graph.d2.gbz -M | grep REFERENCE

You can also force vg call to output variants with -a. I'd expect that to give you a full VCF but iwth 0/0 genotypes everywhere due to lack of coverage (if it doesn't, that means there's something really wrong).

minglibio commented 1 year ago

I checked the gam by running vg stats Graph.d2.gbz -a Sample.gam, here is the output:

Total alignments: 278834442
Total primary: 278834442
Total secondary: 0
Total aligned: 274707714
Total perfect: 178512470
Total gapless (softclips allowed): 261052403
Total paired: 278834442
Total properly paired: 274167738
Insertions: 21997194 bp in 7196186 read events
Deletions: 32821781 bp in 11125127 read events
Substitutions: 213863262 bp in 213863262 read events
Softclips: 800240090 bp in 16884937 read events
Unvisited nodes: 291514/11175191 (2299377 bp)
Single-visited nodes: 143540/11175191 (915419 bp)
Significantly biased heterozygous sites: 0/0

And the pack file by running vg depth Graph.d2.gbz -k Sample.pack -b 100000:

REF#0#1     1       100497  53.9062 113.738
REF#0#1     100497  200674  40.4937 12.5534
REF#0#1     200674  300888  40.7004 11.5984
REF#0#1     300888  401224  37.2648 12.1545
REF#0#1     401224  501236  38.9692 9.63833
REF#0#1     501236  601295  38.3521 11.6965
REF#0#1     601295  701391  38.1413 11.1873
REF#0#1     701391  801550  39.5247 9.99452
REF#0#1     801550  901760  40.0647 9.86583
...

And it seems the graph has the reference path (vg paths -v Graph.d2.gbz -M | grep REFERENCE):

REF#0#1     REFERENCE       REF 0       1       NO_PHASE_BLOCK  NO_SUBRANGE
REF#0#2     REFERENCE       REF 0       2       NO_PHASE_BLOCK  NO_SUBRANGE
REF#0#3     REFERENCE       REF 0       3       NO_PHASE_BLOCK  NO_SUBRANGE
REF#0#4     REFERENCE       REF 0       4       NO_PHASE_BLOCK  NO_SUBRANGE
REF#0#5     REFERENCE       REF 0       5       NO_PHASE_BLOCK  NO_SUBRANGE
...

They all look fine to me. And I have added the -a option when running vg call. So I really don't know what's going wrong.

glennhickey commented 1 year ago

Yeah, that all seems fine. Are you able to share Graph.d2.gbz and Sample.pack? If so I will look in the debugger, otherwise I'm not too sure what to suggest...

minglibio commented 1 year ago

Sorry for the late reply. This issue was resolved by updating to the newest cactus v2.4.3.