macarthur-lab / gnomad_browser

gnomAD browser pre-ASHG 2018
MIT License

browser missing a variant that is in the vcf file #61

Closed shanyang88 closed 5 years ago

shanyang88 commented 7 years ago

I noticed that in the exome file at 13:48878114 there are two variants: a deletion of CCG and a duplication of CCG.

13 48878114 rs775630214 ACCG A,ACCGCCG 30449.72 PASS AC=32,1;AF=3.36821e-04,1.05257e-05;AN=95006;BaseQRankSum=-6.79000e-01;....

However, in the browser, this region shows data only for the duplication, not the deletion (which has a much higher MAF, AC=32). Near this variant, there is another in-frame deletion of 3 bases that results in the same amino-acid deletion of Pro. But these two deletions are clearly not the same at the DNA level, and their MAFs are very different. See the attached screenshot for details.

Can anyone explain this? Thanks!

[Screenshot: screen shot 2017-09-25 at 2 37 48 pm]

shanyang88 commented 7 years ago

I revisited this with the "Filtered (non-PASS) variants" filter turned off and did see this variant. However, it is shown as "RF" instead of "PASS", while it is "PASS" in the exome VCF file. So although this variant is not missing, there is still an inconsistency between what is in the VCF file and what is shown in the browser.

MartinPersida commented 6 years ago

Hello,

I see this issue is not that recent, but I ran into a similar problem when parsing the VCF file (using PyVCF) and filtering out variants that do not pass the filter.

In the case of multiallelic variants, I am not sure how the FILTER value is set. In the following case:

X 152074257 rs563423631 TG TGG,T,TGGG,TCG

3 variants (TGG, T, TCG) passed the filter and are displayed in the gnomAD browser. 1 variant (TGGG) is filtered out in the gnomAD browser as having failed the RF filter; however, it is considered PASS in the VCF, as only a single value is given for the FILTER field.

Is there a way to get the exact FILTER value for each allele of multiallelic variants (as comma-separated values, like the frequencies)?

Thanks

MartinPersida commented 6 years ago

To answer my own comment: I hadn't noticed the AS_FilterStatus entry in the INFO field, which gives the filter status for each allele at a given locus. That resolves my question.
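
For anyone else hitting this, here is a minimal sketch of reading the per-allele status with plain Python string handling (no PyVCF). It assumes AS_FilterStatus holds one comma-separated entry per ALT allele, in ALT order. The function name `per_allele_filters` is made up for illustration, and the INFO value in the example record is reconstructed from what the browser showed for this site, not copied from the VCF.

```python
def per_allele_filters(vcf_line):
    """Map each ALT allele to its AS_FilterStatus entry.

    Assumes AS_FilterStatus has one comma-separated value per ALT allele.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alts = fields[4].split(",")

    # Parse INFO into a dict; flag entries without "=" get an empty value.
    info = {}
    for item in fields[7].split(";"):
        key, _, value = item.partition("=")
        info[key] = value

    statuses = info["AS_FilterStatus"].split(",")
    if len(statuses) != len(alts):
        raise ValueError("AS_FilterStatus does not have one entry per ALT allele")
    return dict(zip(alts, statuses))


# Illustrative record: QUAL is a placeholder and the AS_FilterStatus value
# is an assumption based on the browser display described above.
record = "\t".join([
    "X", "152074257", "rs563423631", "TG", "TGG,T,TGGG,TCG",
    ".", "PASS", "AS_FilterStatus=PASS,PASS,RF,PASS",
])

filters = per_allele_filters(record)
passing = [alt for alt, status in filters.items() if status == "PASS"]
print(filters)
print(passing)
```

Filtering on the per-allele status instead of the FILTER column keeps TGG, T and TCG and drops TGGG, which matches what the browser displays.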

nawatts commented 5 years ago

However, it is shown as "RF" instead of "PASS", while it is "PASS" in the exome VCF file. So although this variant is not missing, there is still an inconsistency between what is in the VCF file and what is shown in the browser.

As @MartinPersida mentioned, this information was contained in the AS_FilterStatus info field.

As mentioned in the Variant QC section, we have fundamentally switched from a site-level filtering strategy to an allele-specific filtering strategy. For this reason, we now have a field named AS_FilterStatus in the INFO column that specifies the filtering status for each allele separately. If any allele is PASS, then the entire site is PASS (unless it is in a LCR or SEGDUP region, or fails the InbreedingCoeff filter).

-- https://macarthurlab.org/2017/02/27/the-genome-aggregation-database-gnomad/
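
The site-level rule in that quote can be sketched roughly as follows. This is only an illustration of the stated rule, not the actual pipeline code: `site_is_pass` and `SITE_LEVEL_FILTERS` are hypothetical names, and the flag spellings are taken from the quoted text.

```python
# Site-level conditions that override a passing allele, per the quoted rule.
SITE_LEVEL_FILTERS = {"LCR", "SEGDUP", "InbreedingCoeff"}


def site_is_pass(allele_statuses, site_flags=()):
    """Return True if the site as a whole would be reported as PASS.

    allele_statuses: one AS_FilterStatus value per ALT allele.
    site_flags: site-level filters that apply to this position, if any.
    """
    any_allele_pass = any(status == "PASS" for status in allele_statuses)
    blocked = any(flag in SITE_LEVEL_FILTERS for flag in site_flags)
    return any_allele_pass and not blocked


print(site_is_pass(["RF", "PASS"]))           # one allele passes
print(site_is_pass(["RF", "PASS"], ["LCR"]))  # passing allele, but site is in an LCR
print(site_is_pass(["RF", "AC0"]))            # no allele passes
```

This is why a multi-allelic line can read PASS in the FILTER column while one of its alleles is shown as RF in the browser.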

In the current release of gnomAD, multi-allelic sites have been split.

For this release, all multi-allelic sites have been split. This means that multiple lines now have the same chromosome and position. This decision was made since the vast majority (all?) of gnomAD downstream users did not want to have multiple alleles on a single line.

-- https://macarthurlab.org/2018/10/17/gnomad-v2-1/
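
To show what "split" means for a line like the one in the original report, here is a naive sketch that writes one line per ALT allele and takes the matching entry from per-allele INFO fields. It is an assumption-laden illustration, not how the release was produced: `split_multiallelic` and `per_alt_keys` are hypothetical names, the FILTER column is copied unchanged, and alleles are not trimmed to a minimal representation. The record uses only the INFO fields visible in the report above.

```python
def split_multiallelic(vcf_line, per_alt_keys=("AC", "AF")):
    """Split one multi-allelic VCF line into one line per ALT allele.

    INFO keys listed in per_alt_keys are assumed to carry one
    comma-separated value per ALT allele; other keys are copied as-is.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    info_items = [item.partition("=") for item in fields[7].split(";")]

    lines = []
    for i, alt in enumerate(alts):
        new_info = []
        for key, sep, value in info_items:
            if key in per_alt_keys:
                value = value.split(",")[i]  # keep only this allele's value
            new_info.append(key + sep + value)
        lines.append("\t".join(
            fields[:4] + [alt] + fields[5:7] + [";".join(new_info)] + fields[8:]
        ))
    return lines


record = "\t".join([
    "13", "48878114", "rs775630214", "ACCG", "A,ACCGCCG",
    "30449.72", "PASS", "AC=32,1;AF=3.36821e-04,1.05257e-05;AN=95006",
])

lines = split_multiallelic(record)
for line in lines:
    print(line)
```

After splitting, both lines share chromosome 13 and position 48878114, so downstream code has to key on position plus REF and ALT, not on position alone.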