Closed jaamarks closed 5 years ago
According to the SnpEff Manual, a variant can have (and usually has) more than one annotation. The VCF annotation manual has more information about the annotation behavior.
There are several reasons for this:
A variant can affect multiple genes. E.g., a variant can be DOWNSTREAM from one gene and UPSTREAM from another gene.
In complex organisms, genes usually have multiple transcripts. So SnpEff reports the effect of a variant on each transcript.
#CHROM POS ID REF ALT QUAL FILTER INFO
1 889455 . G A . . ANN=A|stop_gained|HIGH|NOC2L|ENSG00000188976|transcript|ENST00000327044|protein_coding|7/19|c.706C>T|p.Gln236*|756/2790|706/2250|236/749||
,A|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000487214|processed_transcript||n.*865C>T|||||351|
,A|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000469563|retained_intron||n.*878C>T|||||4171|
,A|non_coding_exon_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000477976|retained_intron|5/17|n.2153C>T||||||;LOF=(NOC2L|ENSG00000188976|6|0.17);NMD=(NOC2L|ENSG00000188976|6|0.17)
A VCF line can have more than one variant. E.g., if the reference genome is 'G' but the sample has either 'A' or 'T' (a non-biallelic variant), then this is reported as one VCF line with multiple alternative variants.
#CHROM POS ID REF ALT QUAL FILTER INFO
1 889455 . G A,T . . ANN=A|stop_gained|HIGH|NOC2L|ENSG00000188976|transcript|ENST00000327044|protein_coding|7/19|c.706C>T|p.Gln236*|756/2790|706/2250|236/749||
,T|missense_variant|MODERATE|NOC2L|ENSG00000188976|transcript|ENST00000327044|protein_coding|7/19|c.706C>A|p.Gln236Lys|756/2790|706/2250|236/749||
,A|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000487214|processed_transcript||n.*865C>T|||||351|
,T|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000487214|processed_transcript||n.*865C>A|||||351|
,A|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000469563|retained_intron||n.*878C>T|||||4171|
,T|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000469563|retained_intron||n.*878C>A|||||4171|
,A|non_coding_exon_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000477976|retained_intron|5/17|n.2153C>T||||||
,T|non_coding_exon_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000477976|retained_intron|5/17|n.2153C>A||||||;LOF=(NOC2L|ENSG00000188976|6|0.17);NMD=(NOC2L|ENSG00000188976|6|0.17)
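To make the "one variant, many annotations" behavior concrete, here is a minimal Python sketch that splits an `ANN` value into its comma-separated entries and groups them by alternative allele. The function name `parse_ann` and the shortened `info` string are illustrative assumptions (the string is pieced together from the example above); only the leading `ANN` sub-fields are named, and the remaining ones are ignored.

```python
from collections import defaultdict

# Names for the leading '|'-separated sub-fields of one ANN entry.
# Entries carry more sub-fields after these; this sketch ignores them.
ANN_FIELDS = ["allele", "effect", "impact", "gene_name", "gene_id",
              "feature_type", "feature_id", "biotype"]

def parse_ann(info):
    """Return {alt_allele: [annotation dict, ...]} from a VCF INFO string."""
    by_allele = defaultdict(list)
    for item in info.split(";"):
        if not item.startswith("ANN="):
            continue  # skip other INFO keys such as LOF= and NMD=
        for entry in item[len("ANN="):].split(","):
            ann = dict(zip(ANN_FIELDS, entry.split("|")))
            by_allele[ann["allele"]].append(ann)
    return dict(by_allele)

# Shortened INFO value built from the multi-allelic example (assumption:
# only three of the eight annotations are kept, to stay readable).
info = (
    "ANN=A|stop_gained|HIGH|NOC2L|ENSG00000188976|transcript|ENST00000327044"
    "|protein_coding|7/19|c.706C>T|p.Gln236*|756/2790|706/2250|236/749||"
    ",T|missense_variant|MODERATE|NOC2L|ENSG00000188976|transcript|ENST00000327044"
    "|protein_coding|7/19|c.706C>A|p.Gln236Lys|756/2790|706/2250|236/749||"
    ",A|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|transcript"
    "|ENST00000487214|processed_transcript||n.*865C>T|||||351|"
    ";LOF=(NOC2L|ENSG00000188976|6|0.17)"
)

annotations = parse_ann(info)
for allele, anns in annotations.items():
    print(allele, [a["impact"] for a in anns])
```

Each alternative allele ends up with its own list of annotations, one per affected transcript, which is why a single VCF line can yield several rows once it is flattened.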
Effect sort order. When multiple effects are reported, SnpEff sorts the effects the following way:
This is not an issue. I will categorize each variant and then compare it to its fwThreshold. If a variant is listed multiple times because of multiple annotations (for the reasons listed above), I will just filter to unique variants at the end. What we are really looking for is whether a variant is significant under any of its annotations.
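The "compare, then filter to uniqueness" plan could be sketched roughly as follows. This is not the actual pipeline code: the function name, the row layout, and the variant IDs and p-values are hypothetical, and the two thresholds are the Moderate Impact and Other values quoted in the summary in this thread.

```python
def significant_variants(rows, thresholds):
    """Return the sorted unique variant IDs that pass at least one threshold.

    rows: iterable of (variant_id, p_value, category), one row per annotation,
          so the same variant may appear several times.
    thresholds: dict mapping category name to its fwThreshold.
    """
    hits = set()  # a set removes duplicates from multiply-annotated variants
    for variant_id, p_value, category in rows:
        if p_value < thresholds[category]:
            hits.add(variant_id)
    return sorted(hits)

# Hypothetical input: each variant is listed once per annotation category.
thresholds = {"Moderate Impact": 1.1e-7, "Other": 1.7e-9}
rows = [
    ("rs_a", 6e-8, "Moderate Impact"),  # passes 1.1e-7
    ("rs_a", 6e-8, "Other"),            # fails 1.7e-9, but rs_a is already a hit
    ("rs_b", 3e-9, "Other"),            # fails 1.7e-9
    ("rs_c", 1e-9, "Other"),            # passes 1.7e-9
    ("rs_c", 1e-9, "Moderate Impact"),  # passes again, still reported once
]

print(significant_variants(rows, thresholds))  # ['rs_a', 'rs_c']
```

A variant is kept if any one of its annotations clears the matching threshold, which is the "significant in any way" criterion described above.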
results presentation
I applied fwGWAS to FTND3 and appended the results to the end of the PowerPoint.
20190114-heroin-fwGWAS-naive-method-scz1+mdd1+ftnd2+ftnd3.pptx
The latest slide deck is located on Eric's share at:
\\RTPNFIL02\eojohnson\Heroin\Heroin fwGWAS\Summary Results Slides
The aim of this fwGWAS method is to increase GWAS power by weighting sequence variants based on their functional annotation, and thereby discover novel associations that were not identified with the traditional approach. The field-standard approach to GWAS has been to apply an equal probability of association to all variants being tested, which results in the familiar 5e-8 threshold of association that variants must meet to be deemed significant. Using this fwGWAS method, variants are weighted differently according to their functional annotation. In particular, sequence variants are categorized into one of four functional annotation groups. Each group has a distinct threshold of association (some more liberal, others more stringent), on the grounds that some variants are more likely than others to affect function and therefore be causal. The four groups and their thresholds of association are: Loss of Function (5.5e-7), Moderate Impact (1.1e-7), Low Impact (1.0e-8), and Other (1.7e-9). We applied this fwGWAS method to three sets of published GWAS results (SCZ1, MDD2013, and FTND2) and analyzed its potential utility. One novel sequence variant was identified in the FTND2 data set; however, no new loci were identified, because the variant lies in the well-known locus PSMA4 (near CHRNA5). This approach is based on the 2016 Nature Genetics paper by Sveinbjornsson et al. (deCODE).
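As a rough illustration of how the group-specific thresholds differ from the single 5e-8 cutoff, the sketch below checks one p-value against both. The threshold values are the ones stated above. The mapping from SnpEff impact levels (HIGH, MODERATE, LOW, MODIFIER) to the four groups is an assumption made for this example, not something confirmed in this thread, and the function name `compare` is hypothetical.

```python
# Thresholds of association for the four functional groups (from the summary).
FW_THRESHOLDS = {
    "Loss of Function": 5.5e-7,
    "Moderate Impact": 1.1e-7,
    "Low Impact": 1.0e-8,
    "Other": 1.7e-9,
}
STANDARD_THRESHOLD = 5e-8  # conventional genome-wide significance cutoff

# ASSUMPTION: a one-to-one mapping from SnpEff impact to functional group.
IMPACT_TO_GROUP = {
    "HIGH": "Loss of Function",
    "MODERATE": "Moderate Impact",
    "LOW": "Low Impact",
    "MODIFIER": "Other",
}

def compare(p_value, impact):
    """Report significance under the weighted and the standard threshold."""
    group = IMPACT_TO_GROUP[impact]
    return {
        "group": group,
        "fw_significant": p_value < FW_THRESHOLDS[group],
        "standard_significant": p_value < STANDARD_THRESHOLD,
    }

# A HIGH-impact variant can be significant only under the weighted scheme...
print(compare(2e-7, "HIGH"))
# ...while a MODIFIER variant can pass 5e-8 yet miss its stricter threshold.
print(compare(2e-8, "MODIFIER"))
```

The two calls show both directions of the trade-off: the more liberal threshold picks up a likely functional variant that the standard cutoff misses, and the more stringent one drops a variant with little expected functional effect.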