1. What were you trying to do?
2. What did you want to happen?
Identify the coverage of the ref/alt alleles from the AD statistic of the VCF calls, and look at the impact of the vg filter/call step.
3. What actually happened?
Running vg filter on the GAM reduced its size substantially. For most sites, comparing the AD in $gam.pack.vcf with $filter.gam.pack.vcf made sense: the AD was higher for the calls from the unfiltered GAM. However, at a minority of sites (~2.5%), the AD for either the reference or the alt allele is higher in the calls from the filtered GAM than at the same site in the calls from the unfiltered GAM, which substantially alters the estimated ref/alt allele coverage (see example graph: blue points are where the ref or alt allele coverage is higher in the filtered VCF, red are where it is lower). I don't understand how this can happen; surely filtering should only remove reads? The graph is made from a biallelic VCF, so multi-allelic sites shouldn't be the problem.
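To pin down which sites are affected, the comparison described above can be sketched in Python. This is a minimal, hypothetical script (`compare_ad.py` is only an illustrative name), assuming single-sample VCFs with `GT` and `AD` in the FORMAT column. It matches sites on CHROM/POS/REF/ALT, counts the shared sites where any allele's AD is higher in the filtered calls, and prints each one with both genotypes. That makes it easy to pick out sites where the call is identical but the filtered AD is higher.

```python
import gzip
import sys


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_ad(path):
    """Map (CHROM, POS, REF, ALT) -> (GT, [AD per allele]).

    Assumes a single-sample VCF with AD in the FORMAT column.
    Sites without a usable AD value are skipped.
    """
    sites = {}
    with open_vcf(path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 10:  # no sample column
                continue
            chrom, pos, _id, ref, alt = fields[:5]
            sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
            ad = sample.get("AD", ".")
            depths = ad.split(",")
            if "." in depths or "" in depths:  # missing AD for some allele
                continue
            sites[(chrom, int(pos), ref, alt)] = (sample.get("GT", "."), [int(d) for d in depths])
    return sites


def compare(unfiltered, filtered):
    """Return the shared site keys and the sites whose AD rose after filtering."""
    shared = unfiltered.keys() & filtered.keys()
    higher = []
    for key in sorted(shared):
        gt_u, ad_u = unfiltered[key]
        gt_f, ad_f = filtered[key]
        if len(ad_u) != len(ad_f):
            continue
        # Filtering only removes reads, so no allele's AD should increase.
        if any(f > u for u, f in zip(ad_u, ad_f)):
            higher.append((key, gt_u, gt_f, ad_u, ad_f))
    return shared, higher


def main(argv):
    if len(argv) != 2:
        print("usage: compare_ad.py unfiltered.vcf[.gz] filtered.vcf[.gz]", file=sys.stderr)
        return
    shared, higher = compare(read_ad(argv[0]), read_ad(argv[1]))
    print(f"{len(higher)}/{len(shared)} shared sites have a higher AD after filtering")
    for (chrom, pos, ref, alt), gt_u, gt_f, ad_u, ad_f in higher:
        call = "same-call" if gt_u == gt_f else "diff-call"
        print(chrom, pos, ref, alt, gt_u, gt_f, ad_u, ad_f, call, sep="\t")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage, with the file names from above: `python compare_ad.py $gam.pack.vcf $filter.gam.pack.vcf`. The `same-call` rows are the natural candidates to share as a reproducer.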
5. What data and command can the vg dev team use to make the problem happen?
I can send the data and the exact script I used if needed. Any thoughts on this would be much appreciated!
6. What does running `vg version` say?
```
vg version v1.48.0 "Gallipoli"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on Linux
Linked against libstd++ 20210601
```
Yeah, that doesn't seem to make much sense. Are you able to share the gam and gbz, along with a site that's identical in the vcf but has higher AD in the filtered one? Thanks.