mmterpstra / molgenis-c5-TumorNormal

GNU General Public License v2.0

Missing 1000g annotations Phase1 / Phase 3 annotations #11

Closed mmterpstra closed 8 years ago

mmterpstra commented 8 years ago

The phase 1 SNV annotations refer to a file without an AF field, resulting in missing AF annotations.
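A quick way to catch this kind of problem is to check whether the annotation VCF declares the expected INFO fields in its header. The following is a minimal sketch, not the pipeline's own check: the function names are hypothetical, and it only assumes the standard `##INFO=<ID=...>` header syntax.

```python
import re

# Matches the ID of a standard VCF INFO header declaration.
INFO_ID = re.compile(r"##INFO=<ID=([^,>]+)")

def declared_info_ids(lines):
    """Collect INFO IDs declared in the '##' meta lines of a VCF."""
    ids = set()
    for line in lines:
        if not line.startswith("##"):
            break  # meta lines end at the '#CHROM' column header
        match = INFO_ID.match(line)
        if match:
            ids.add(match.group(1))
    return ids

def missing_info_ids(lines, required=("AF",)):
    """Return the required INFO IDs that the header does not declare."""
    declared = declared_info_ids(lines)
    return [key for key in required if key not in declared]

# Illustrative header only, not taken from the real phase 1 file.
header_without_af = [
    "##fileformat=VCFv4.1\n",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth">\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]
print(missing_info_ids(header_without_af))  # ['AF']
```

For a real file, any iterable of text lines works, for example `gzip.open(path, "rt")` for a `.vcf.gz`.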

mmterpstra commented 8 years ago

This affects the following annotations: 1000gPhase1Indels.AF, 1000gPhase1Indels.AFR_AF, 1000gPhase1Indels.AMR_AF, 1000gPhase1Indels.ASN_AF, 1000gPhase1Indels.EUR_AF, 1000gPhase1Snps.AF.

These annotations are redundant, and the variants can be filtered using the dbNSFP fields instead: dbNSFP_1000Gp1_AF, dbNSFP_1000Gp1_AFR_AC, dbNSFP_1000Gp1_AFR_AF, dbNSFP_1000Gp1_AMR_AF, dbNSFP_1000Gp1_ASN_AF, dbNSFP_1000Gp1_EUR_AF, dbNSFP_ARIC5606_AA_AF.

Also please note that the automatic filtering is not applied to these SNPs, so filter them manually!
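One way the manual filtering could look is sketched below, using the dbNSFP_1000Gp1_AF field from the INFO column. This is only an illustration: the helper names are hypothetical, the 0.02 cutoff is borrowed from the 1000gMAFgt0.02 filter name, and it is an assumption that the field may hold several comma-separated values or a `.` placeholder.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag entries map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields

def exceeds_af(info, key="dbNSFP_1000Gp1_AF", threshold=0.02):
    """True when any numeric value of `key` is above `threshold`.

    Missing fields, flags, and non-numeric values such as '.' are
    treated as not exceeding the threshold (an assumption).
    """
    value = parse_info(info).get(key)
    if value is None or value is True:
        return False
    for part in value.split(","):
        try:
            if float(part) > threshold:
                return True
        except ValueError:
            continue  # skip placeholders like '.'
    return False

print(exceeds_af("DP=10;dbNSFP_1000Gp1_AF=0.35"))  # True
print(exceeds_af("DP=10"))                         # False
```

Note that this compares the allele frequency directly; whether the pipeline's filter compares AF or a derived minor allele frequency is not stated here.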

mmterpstra commented 8 years ago

This is a partial bug, because the current filtering still filters for the _common high-confidence SNPs_ in the 1000 Genomes Project. The correct solution is to use ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz from the 1000 Genomes website.
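To illustrate what the phase 3 sites file makes possible, here is a rough sketch that derives the two filter labels from its AF and EUR_AF INFO fields. It is not the pipeline's implementation: the function names are hypothetical, and treating MAF as min(AF, 1 - AF) per ALT allele is an assumption.

```python
THRESHOLD = 0.02  # taken from the filter names, e.g. 1000gMAFgt0.02

# Phase 3 INFO key -> filter label used in this issue.
LABELS = (("AF", "1000gMAFgt0.02"), ("EUR_AF", "1000gEURMAFgt0.02"))

def minor_allele_frequency(af):
    """Assumed definition: the smaller of AF and 1 - AF."""
    return min(af, 1.0 - af)

def phase3_filter_labels(info):
    """Return the labels whose frequency field exceeds THRESHOLD."""
    values = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    labels = []
    for key, label in LABELS:
        raw = values.get(key)
        if raw is None:
            continue  # annotation missing: no label
        try:
            # Multi-allelic sites carry one comma-separated value per ALT allele.
            mafs = [minor_allele_frequency(float(v)) for v in raw.split(",")]
        except ValueError:
            continue  # non-numeric value such as '.'
        if max(mafs) > THRESHOLD:
            labels.append(label)
    return labels

print(phase3_filter_labels("AC=1;AF=0.5;EUR_AF=0.01"))  # ['1000gMAFgt0.02']
```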

mmterpstra commented 8 years ago

todo:

  1. Insert correct VCF in protocols/VariantAnnotation.
  2. Check annotated VCF
  3. Check filtering/filtered VCF

mmterpstra commented 8 years ago

update:

  1. done
  2. in progress

mmterpstra commented 8 years ago

Testing shows that using different _files/samples_ and the new filtering criteria both have a large effect. Now use the same samples for validation.

Old situation

Effect (also filtered by other filters):

| total filtered variants | count |
| --- | --- |
| 1000gEURMAFgt0.02 | 0 |
| 1000gMAFgt0.02 | 80000 |

Additive effect (not filtered by other filters):

| filtered variants | count |
| --- | --- |
| 1000gEURMAFgt0.02 | 0 |
| 1000gMAFgt0.02,1000gEURMAFgt0.02 | 0 |
| 1000gMAFgt0.02 | ~300 |

New situation

Effect (also filtered by other filters):

| total filtered variants | count |
| --- | --- |
| 1000gEURMAFgt0.02 | ~112000 |
| 1000gMAFgt0.02 | ~115000 |

Additive effect (not filtered by other filters):

| filtered variants | count |
| --- | --- |
| 1000gEURMAFgt0.02 | ~200 |
| 1000gMAFgt0.02,1000gEURMAFgt0.02 | ~4000 |
| 1000gMAFgt0.02 | ~200 |
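
Counts like the ones above can be reproduced by tallying the FILTER column of the filtered VCF. The sketch below shows one possible way to do it, under the assumption that "total" means every variant carrying a filter and "additive" means variants carrying only 1000g filters. The function name and the example values are illustrative, not real results.

```python
from collections import Counter

TRACKED = ("1000gMAFgt0.02", "1000gEURMAFgt0.02")

def tally_filters(filter_values, tracked=TRACKED):
    """Count total and additive hits for the tracked filters.

    total:    variants whose FILTER contains the name, regardless of
              any other filters.
    additive: variants whose FILTER contains only tracked names,
              keyed by the comma-joined combination.
    """
    total = Counter()
    additive = Counter()
    for value in filter_values:
        # FILTER is a semicolon-separated list, or PASS / '.' when empty.
        names = {n for n in value.split(";") if n and n not in ("PASS", ".")}
        hits = [n for n in tracked if n in names]
        for name in hits:
            total[name] += 1
        if hits and len(hits) == len(names):
            additive[",".join(hits)] += 1
    return total, additive

# Made-up FILTER values for illustration only.
example = [
    "PASS",
    "1000gMAFgt0.02",
    "1000gMAFgt0.02;1000gEURMAFgt0.02",
    "LowQual;1000gMAFgt0.02",
    "LowQual",
]
total, additive = tally_filters(example)
print(total["1000gMAFgt0.02"])     # 3
print(additive["1000gMAFgt0.02"])  # 1
```

Running the same tally on the old and new annotation of identical samples would give directly comparable tables.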

mmterpstra commented 8 years ago

Testing shows that it works, although it needs a better comparison.

mmterpstra commented 8 years ago

fixed in GatkSnpEffVariantAnnotation.sh at commit c773177f9c3545a3c89280eb55754b4f16eb4e1a