mmterpstra closed this issue 8 years ago
This affects the following annotations: 1000gPhase1Indels.AF, 1000gPhase1Indels.AFR_AF, 1000gPhase1Indels.AMR_AF, 1000gPhase1Indels.ASN_AF, 1000gPhase1Indels.EUR_AF, 1000gPhase1Snps.AF. These are redundant annotations; variants can instead be filtered using the dbNSFP fields: dbNSFP_1000Gp1_AF, dbNSFP_1000Gp1_AFR_AC, dbNSFP_1000Gp1_AFR_AF, dbNSFP_1000Gp1_AMR_AF, dbNSFP_1000Gp1_ASN_AF, dbNSFP_1000Gp1_EUR_AF, dbNSFP_ARIC5606_AA_AF.
Also note that the automatic filtering is not applied to these SNPs, so filter them manually!
This is only a partial bug, because the current filtering still filters for the _Common high confidence SNPs_ in the 1000 Genomes Project. The correct solution is to use ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz from the 1kg website.
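As a rough illustration of what the frequency filters depend on, here is a minimal Python sketch. It is not the code in GatkSnpEffVariantAnnotation.sh. It assumes the INFO column carries `AF` and `EUR_AF` keys (as the phase 3 sites file does) and that the filter names `1000gMAFgt0.02` and `1000gEURMAFgt0.02` mean "allele frequency greater than 0.02", which is inferred from the names only. The helper names are hypothetical. A record without an `AF` key, which is the situation described in this issue, gets no filter at all.

```python
THRESHOLD = 0.02  # assumed from the filter names, not taken from the pipeline


def parse_info(info):
    """Parse a VCF INFO string such as 'AC=1;AF=0.03' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flag keys have no value
    return fields


def max_freq(fields, key):
    """Return the highest frequency listed under key, or None if absent."""
    value = fields.get(key)
    if value is None or value is True or value == ".":
        return None
    freqs = [float(v) for v in value.split(",") if v != "."]
    return max(freqs) if freqs else None


def frequency_filters(info):
    """Return the (assumed) filter names that would apply to one record."""
    fields = parse_info(info)
    labels = []
    af = max_freq(fields, "AF")
    if af is not None and af > THRESHOLD:
        labels.append("1000gMAFgt0.02")
    eur_af = max_freq(fields, "EUR_AF")
    if eur_af is not None and eur_af > THRESHOLD:
        labels.append("1000gEURMAFgt0.02")
    return labels
```

For example, `frequency_filters("AC=5;AF=0.05;EUR_AF=0.01")` yields only the overall filter, while `frequency_filters("AC=1")` yields an empty list because there is no frequency to compare.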
todo:
update:
Testing shows that using different _files/samples_ together with the new filtering criteria has a large effect. Now using the same samples for validation.
Effect (also filtered by other filters):
total filtered variants | count |
---|---|
1000gEURMAFgt0.02 | 0 |
1000gMAFgt0.02 | 80000 |
Additive effect (not filtered by other filters):
filtered variants | count |
---|---|
1000gEURMAFgt0.02 | 0 |
1000gMAFgt0.02,1000gEURMAFgt0.02 | 0 |
1000gMAFgt0.02 | ~300 |
Effect (also filtered by other filters):
total filtered variants | count |
---|---|
1000gEURMAFgt0.02 | ~112000 |
1000gMAFgt0.02 | ~115000 |
Additive effect (not filtered by other filters):
filtered variants | count |
---|---|
1000gEURMAFgt0.02 | ~200 |
1000gMAFgt0.02,1000gEURMAFgt0.02 | ~4000 |
1000gMAFgt0.02 | ~200 |
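Counts like the ones in these tables could be produced with something like the following Python sketch. It rests on an assumption about how the tables were derived: the "effect" count tallies every record whose FILTER column contains the filter, and the "additive effect" count tallies only records whose FILTER column contains nothing but the 1000g filters. The function name `tally_filters` is hypothetical, and the input is assumed to be the raw, semicolon-separated FILTER values from a VCF.

```python
from collections import Counter

TARGETS = ("1000gMAFgt0.02", "1000gEURMAFgt0.02")


def tally_filters(filter_values, targets=TARGETS):
    """Count target filters overall and where no other filter applies."""
    total = Counter()      # filter present, possibly alongside others
    exclusive = Counter()  # only target filters present on the record
    for value in filter_values:
        names = [n for n in value.split(";") if n and n not in ("PASS", ".")]
        hits = [t for t in targets if t in names]
        for t in hits:
            total[t] += 1
        if hits and len(hits) == len(names):
            exclusive[",".join(hits)] += 1
    return total, exclusive
```

Running this on the same samples before and after the change would give directly comparable tables.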
Testing shows that it works, although a better comparison is needed.
Fixed in GatkSnpEffVariantAnnotation.sh at commit c773177f9c3545a3c89280eb55754b4f16eb4e1a
The phase 1 SNV annotations refer to a file without an AF field, resulting in missing AF annotations.