jaamarks commented 6 years ago

The aim of this fwGWAS method is to increase GWAS power by weighting sequence variants based on their functional annotation, and thereby discover novel associations that were not identified with the traditional approach. The field-standard approach to GWAS has been to apply an equal probability of association to all variants being tested, which results in the familiar 5e-8 threshold of association that variants must meet in order to be deemed significant. With this fwGWAS method, variants are instead weighted according to their functional annotation: each sequence variant is categorized into one of four functional annotation groups, and each group has its own threshold of association. Some thresholds are more liberal than 5e-8 and others more stringent, on the grounds that some variants are more likely than others to affect function and therefore to be causal. The four groups and their thresholds of association are Loss of Function (5.5e-7), Moderate Impact (1.1e-7), Low Impact (1.0e-8), and Other (1.7e-9). We applied this fwGWAS method to three sets of published GWAS results (SCZ1, MDD2013, and FTND2) and analyzed its potential utility. One novel sequence variant was identified in the FTND2 data set, but no new loci were identified; the variant lies in the well-known locus PSMA4 (near CHRNA5). This approach is based on the 2016 Nature Genetics paper by Sveinbjornsson et al. (deCODE).
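The group-specific thresholds described above can be written as a simple lookup. This is only a minimal sketch: the mapping from SnpEff putative-impact labels (HIGH, MODERATE, LOW, MODIFIER) to the four groups is an assumption, and the names `FW_THRESHOLDS`, `IMPACT_TO_GROUP`, and `is_significant` are illustrative rather than taken from the actual analysis code.

```python
# Group-specific thresholds of association quoted in the description above.
FW_THRESHOLDS = {
    "Loss of Function": 5.5e-7,
    "Moderate Impact": 1.1e-7,
    "Low Impact": 1.0e-8,
    "Other": 1.7e-9,
}

# ASSUMPTION: SnpEff putative-impact labels map one-to-one onto the four
# groups. The real grouping rules may differ.
IMPACT_TO_GROUP = {
    "HIGH": "Loss of Function",
    "MODERATE": "Moderate Impact",
    "LOW": "Low Impact",
    "MODIFIER": "Other",
}

# The field-standard genome-wide threshold, for comparison.
STANDARD_THRESHOLD = 5e-8


def is_significant(p_value, impact):
    """Return True if p_value passes the threshold for the impact's group."""
    group = IMPACT_TO_GROUP.get(impact, "Other")  # unknown labels fall in Other
    return p_value < FW_THRESHOLDS[group]
```

Under these assumptions, a Loss of Function variant with p = 2e-7 passes its weighted threshold although it fails the standard 5e-8 threshold, whereas an Other variant with p = 5e-9 passes the standard threshold but fails the weighted one.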

jaamarks commented 6 years ago

According to the SnpEff manual, a variant can have (and usually has) more than one annotation. The VCF annotation manual has more information about the annotation behavior.

There are several reasons for this:

A variant can affect multiple genes. E.g., a variant can be DOWNSTREAM from one gene and UPSTREAM from another gene.

In complex organisms, genes usually have multiple transcripts. So SnpEff reports the effect of a variant on each transcript.

#CHROM  POS     ID   REF  ALT  QUAL FILTER   INFO
1       889455  .    G    A    .    .        ANN=A|stop_gained|HIGH|NOC2L|ENSG00000188976|transcript|ENST00000327044|protein_coding|7/19|c.706C>T|p.Gln236*|756/2790|706/2250|236/749||
                                            ,A|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000487214|processed_transcript||n.*865C>T|||||351|
                                            ,A|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000469563|retained_intron||n.*878C>T|||||4171|
                                            ,A|non_coding_exon_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000477976|retained_intron|5/17|n.2153C>T||||||;LOF=(NOC2L|ENSG00000188976|6|0.17);NMD=(NOC2L|ENSG00000188976|6|0.17)
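To make the multiple-annotation structure concrete, the following is a rough sketch of splitting an `ANN` value like the one in the example above into one record per annotation. It assumes the INFO column has been read as a single unwrapped string, and that the first subfields follow the order Allele, Annotation, Impact, Gene name, Gene ID, Feature type, Feature ID. The function name `parse_ann` and the shortened `EXAMPLE_INFO` string are illustrative only.

```python
# Shortened INFO value built from the first two annotations in the example.
EXAMPLE_INFO = (
    "ANN=A|stop_gained|HIGH|NOC2L|ENSG00000188976|transcript|ENST00000327044"
    "|protein_coding|7/19|c.706C>T|p.Gln236*|756/2790|706/2250|236/749||"
    ",A|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|transcript"
    "|ENST00000487214|processed_transcript||n.*865C>T|||||351|"
    ";LOF=(NOC2L|ENSG00000188976|6|0.17)"
)


def parse_ann(info):
    """Return one dict per annotation found in the ANN key of an INFO string."""
    for field in info.split(";"):          # INFO keys are separated by ';'
        if field.startswith("ANN="):
            entries = field[len("ANN="):].split(",")  # one entry per annotation
            break
    else:
        return []                          # no ANN key present

    annotations = []
    for entry in entries:
        parts = entry.strip().split("|")   # subfields are separated by '|'
        annotations.append({
            "allele": parts[0],
            "effect": parts[1],
            "impact": parts[2],
            "gene": parts[3],
            "transcript": parts[6],
        })
    return annotations


for ann in parse_ann(EXAMPLE_INFO):
    print(ann["allele"], ann["effect"], ann["impact"], ann["transcript"])
```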

A VCF line can have more than one variant. E.g., if the reference genome is 'G' but the sample has either 'A' or 'T' (non-biallelic variant), this is reported as one VCF line with multiple alternative variants.

#CHROM  POS      ID    REF  ALT    QUAL FILTER    INFO
1       889455   .     G    A,T    .    .         ANN=A|stop_gained|HIGH|NOC2L|ENSG00000188976|transcript|ENST00000327044|protein_coding|7/19|c.706C>T|p.Gln236*|756/2790|706/2250|236/749||
                                                 ,T|missense_variant|MODERATE|NOC2L|ENSG00000188976|transcript|ENST00000327044|protein_coding|7/19|c.706C>A|p.Gln236Lys|756/2790|706/2250|236/749||
                                                 ,A|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000487214|processed_transcript||n.*865C>T|||||351|
                                                 ,T|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000487214|processed_transcript||n.*865C>A|||||351|
                                                 ,A|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000469563|retained_intron||n.*878C>T|||||4171|
                                                 ,T|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000469563|retained_intron||n.*878C>A|||||4171|
                                                 ,A|non_coding_exon_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000477976|retained_intron|5/17|n.2153C>T||||||
                                                 ,T|non_coding_exon_variant|MODIFIER|NOC2L|ENSG00000188976|transcript|ENST00000477976|retained_intron|5/17|n.2153C>A||||||;LOF=(NOC2L|ENSG00000188976|6|0.17);NMD=(NOC2L|ENSG00000188976|6|0.17)
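For a multi-allelic line like the one above, each annotation starts with the ALT allele it refers to, so annotations can be grouped per allele. The sketch below keeps the most severe putative impact for each allele. It is an illustration under assumptions: the ranking HIGH > MODERATE > LOW > MODIFIER follows the impact values visible in the example, the `ann_value` argument is the unwrapped text after `ANN=`, and `worst_impact_per_allele` is a hypothetical helper name.

```python
# Lower rank means more severe putative impact.
IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def worst_impact_per_allele(alt, ann_value):
    """Map each ALT allele to the most severe impact among its annotations."""
    worst = {}
    unknown = len(IMPACT_RANK)  # rank used for unrecognized impact labels
    for entry in ann_value.split(","):
        parts = entry.strip().split("|")
        allele, impact = parts[0], parts[2]
        current = worst.get(allele)
        if current is None or (
            IMPACT_RANK.get(impact, unknown) < IMPACT_RANK.get(current, unknown)
        ):
            worst[allele] = impact
    # Alleles without any annotation map to None.
    return {allele: worst.get(allele) for allele in alt.split(",")}
```

With the annotations from the example, the allele 'A' would be reported as HIGH (stop_gained) and 'T' as MODERATE (missense_variant).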

Effect sort order. When multiple effects are reported, SnpEff sorts the effects in the following way:

Putative impact: Effects having higher putative impact are first.
Effect type: Effects assumed to be more deleterious are first.
Canonical transcript before non-canonical.
Marker genomic coordinates (e.g. genes starting before first).

With this in mind, how should we handle these variants?

Answer

This is not an issue. I will categorize each variant and then compare it to its fwThreshold. If a variant is listed multiple times because it has multiple annotations (for the reasons listed above), I will simply filter to unique variants at the end. What we are really looking for is whether a variant is significant under any of its annotations.
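The approach described in this answer could be sketched as follows. This is not the actual analysis code: the input layout (one row per variant annotation, holding a variant ID, a p-value, and a SnpEff impact label), the impact-to-threshold mapping, and the name `significant_variants` are all assumptions made for illustration.

```python
# ASSUMPTION: SnpEff impact labels stand in for the four annotation groups
# (HIGH = Loss of Function, MODERATE = Moderate Impact, LOW = Low Impact,
# MODIFIER = Other), with the thresholds quoted in the method description.
FW_THRESHOLDS = {
    "HIGH": 5.5e-7,
    "MODERATE": 1.1e-7,
    "LOW": 1.0e-8,
    "MODIFIER": 1.7e-9,
}


def significant_variants(rows):
    """Return unique variant IDs that pass the threshold for any annotation.

    rows: iterable of (variant_id, p_value, impact), one row per annotation,
    so the same variant may appear several times.
    """
    seen = set()
    hits = []
    for variant_id, p_value, impact in rows:
        if p_value < FW_THRESHOLDS[impact] and variant_id not in seen:
            seen.add(variant_id)      # filter to uniqueness
            hits.append(variant_id)   # keep first-seen order
    return hits


rows = [
    ("1:889455:G:A", 3e-7, "HIGH"),      # passes 5.5e-7
    ("1:889455:G:A", 3e-7, "MODIFIER"),  # same variant, fails 1.7e-9
    ("2:100:C:T", 3e-7, "MODERATE"),     # fails 1.1e-7
]
print(significant_variants(rows))
```

A variant that passes under at least one of its annotations is reported once, regardless of how many annotation rows it has.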

jaamarks commented 6 years ago

Results Presentation

fwGWAS- Naive Method

SCZ2

fwGWAS – Naïve Method.pptx

jaamarks commented 5 years ago

results presentation

SCZ1, MDD1, FTND2
1-paragraph description of method & findings from applying it to the three data sets above

20190110-heroin-fwGWAS-naive-method-scz1+mdd1+ftnd2.pptx

jaamarks commented 5 years ago

I applied fwGWAS to FTND3 and appended the results to the end of the PowerPoint.

20190114-heroin-fwGWAS-naive-method-scz1+mdd1+ftnd2+ftnd3.pptx

jaamarks commented 5 years ago

The latest slide deck is located on Eric's share at: \\RTPNFIL02\eojohnson\Heroin\Heroin fwGWAS\Summary Results Slides

jaamarks / jaamarks_notebooks

fwGWAS—Naive Approach #1
