Closed Dani-kolbe closed 1 year ago
Hello,
I think it'd be helpful if you could share the individual gVCF calls for the problematic site before merging (ideally with fewer samples) so that we can try to reproduce it. As you said, we'll have to ask the GLnexus maintainers about this issue. I found this issue on the GLnexus repo https://github.com/dnanexus-rnd/GLnexus/issues/286 - I'd recommend adding a reproducible example there as well.
About the PASS filter, if your downstream application explicitly requires having that filter value, I'd recommend using tools like bcftools to add it. I think bcftools annotate --rename-annots would work based on this page, using "FILTER/. PASS" as the mapping: https://samtools.github.io/bcftools/bcftools.html#annotate
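If the bcftools route turns out not to fit, the same effect can be sketched in plain Python. This is only an illustration, not a bcftools or DeepVariant feature: the function name set_pass_filter is hypothetical, and it assumes a tab-delimited, uncompressed VCF where a missing FILTER is written as ".". The sample record is the merged site from this issue, shortened to its first two samples.

```python
def set_pass_filter(line: str) -> str:
    """Return a VCF line with FILTER set to PASS when it is missing ('.').

    Hypothetical helper for illustration; assumes tab-delimited VCF text.
    """
    if line.startswith("#"):
        return line  # header lines are passed through unchanged
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    # FILTER is the 7th VCF column (index 6).
    if len(fields) > 6 and fields[6] == ".":
        fields[6] = "PASS"
    return "\t".join(fields) + newline


# Merged record from this issue, shortened to two samples.
record = "\t".join([
    "1", "1319056", "1_1319056_A_G", "A", "G", "51", ".",
    "AF=0.32848;AQ=51", "GT:DP:AD:GQ:PL:RNC",
    "0/0:0:0,0:1:0,0,0:..", "1/1:2:0,2:3:29,6,0:..",
])
print(set_pass_filter(record))
```

Note that this marks every record with an empty FILTER as PASS without applying any quality check, so it only satisfies tools that require the field to be populated.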
Thank you.
Best, Ted
Hi Ted, thank you for your reply! This is what the site looks like in a couple of gvcf files directly from deepvariant:
1 1318679 . A <*> 0 . END=1319106 GT:GQ:MIN_DP:PL 0/0:1:0:0,0,0
1 1319056 . A G,<*> 30.7 PASS . GT:GQ:DP:AD:VAF:PL 1/1:8:2:0,2,0:1,0:29,6,0,990,990,990
1 1318679 . A <*> 0 . END=1319087 GT:GQ:MIN_DP:PL 0/0:1:0:0,0,0
Thank you for your help! All the best, Daniel
Hi @Dani-kolbe
Just to understand your desired behavior, do you want the final VCF calls for low evidence sites to be uniformly ./., or do you prefer all ./. listed as 0/0?
Hi Andrew, thank you again for your reply. I think ./. is fine - this will be handled as missing data by something like PLINK? And DV will filter out sites with overall low confidence, right? Best wishes, Daniel
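For anyone who wants uniform ./. at zero-depth calls in an already merged VCF, a post-processing pass along these lines could work. It is a minimal sketch under stated assumptions, not something GLnexus or DeepVariant provides: the function name mask_low_depth and the min_dp threshold are hypothetical, the input is assumed to be tab-delimited VCF text with GT and DP in the FORMAT column, and a missing DP is treated as zero depth.

```python
def mask_low_depth(line: str, min_dp: int = 1) -> str:
    """Set GT to ./. for every sample whose DP is below min_dp.

    Hypothetical helper for illustration; only the GT value is changed.
    """
    if line.startswith("#"):
        return line  # header lines are passed through unchanged
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line  # no sample columns to inspect
    keys = fields[8].split(":")
    if "GT" not in keys or "DP" not in keys:
        return line
    gt_i, dp_i = keys.index("GT"), keys.index("DP")
    for col in range(9, len(fields)):
        values = fields[col].split(":")
        if max(gt_i, dp_i) >= len(values):
            continue  # sample entry has dropped trailing fields
        # Assumption: a non-numeric DP (e.g. ".") counts as zero depth.
        dp = int(values[dp_i]) if values[dp_i].isdigit() else 0
        if dp < min_dp:
            values[gt_i] = "./."
            fields[col] = ":".join(values)
    return "\t".join(fields) + newline


# Merged record from this issue, shortened to three samples.
record = "\t".join([
    "1", "1319056", "1_1319056_A_G", "A", "G", "51", ".",
    "AF=0.32848;AQ=51", "GT:DP:AD:GQ:PL:RNC",
    "0/0:0:0,0:1:0,0,0:..", "1/1:2:0,2:3:29,6,0:..", "./.:1:1,0:0:29,3,0:II",
])
print(mask_low_depth(record))
```

With the default min_dp=1, only the first sample (DP 0) is changed; raising the threshold would also mask the shallow calls with DP 1 or 2, which is a judgment call for the downstream analysis.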
I'm a bit uncertain about the status of this thread. It also seems like there was another thread asking about filtering.
@Dani-kolbe Are there any questions we can still help with in this thread?
Hi. Everything is fine thanks! closing thread, sorry
Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.5/docs/FAQ.md: Yep
Describe the issue: The issue is actually with GLnexus; however, no one is replying there (other people report the same issue), and since GLnexus is recommended here in the deepvariant pipeline, I thought I would ask here.
I have produced a large set of gVCF files using deepvariant and merged these with GLnexus as recommended. However, there seems to be an issue in the final merged VCF file. Many genotypes are called as 0/0 when they have very low or zero DP: e.g.
1 1319056 1_1319056_A_G A G 51 . AF=0.32848;AQ=51 GT:DP:AD:GQ:PL:RNC 0/0:0:0,0:1:0,0,0:.. 1/1:2:0,2:3:29,6,0:.. 0/0:0:0,0:1:0,0,0:.. ./.:1:1,0:0:29,3,0:II ./.:1:1,0:0:29,3,0:II ./.:1:1,0:0:29,3,0:II ./.:1:1,0:0:29,3,0:II 0/1:2:0,2:1:12,2,0:.. 0/0:0:0,0:1:0,3,29:.. 0/0:0:0,0:1:0,0,0:.. 0/0:0:0,0:1:0,0,0:.. ./.:1:1,0:0:29,3,0:II ./.:1:1,0:0:29,3,0:II 0/0:0:0,0:1:0,0,0:.. 0/0:0:0,0:1:0,0,0:.. ./.:1:1,0:0:29,3,0:II ./.:1:1,0:0:29,3,0:II 0/0:0:0,0:1:0,0,0:.. 0/0:0:0,0:1:0,0,0:.. ./.:1:1,0:0:29,3,0:II 0/0:0:0,0:1:0,0,0:.. 0/0:0:0,0:1:0,0,0:.. ./.:1:1,0:0:29,3,0:II 0/0:0:0,0:1:0,0,0:.. 0/0:0:0,0:1:0,0,0:.. ./.:1:1,0:0:29,3,0:II 0/0:0:0,0:1:0,0,0:.. ./.:1:1,0:0:29,3,0:II 0/0:0:0,0:1:0,0,0:.. 1/1:5:0,5:9:40,12,0:.. 1/1:3:0,3:7:36,10,0:.. 0/0:0:0,0:1:0,3,29:.
This is interfering with downstream analysis, and overall it looks like poor QC. Additionally, the FILTER field is empty in the merged VCF, whereas the gVCFs still carried a "PASS" label. This label is also required for downstream analysis. So I am wondering where I went wrong, or whether there is more suitable software for merging gVCFs. Thank you!
Setup
Steps to reproduce:
Command:
Error trace: no errors
This is the vcf header: