statgen / pheweb

A tool to build a website to browse hundreds or thousands of GWAS.
MIT License

Add VEP annotation #26

Closed (pjvandehaar closed 3 years ago)

pjvandehaar commented 7 years ago

(extracted from another issue)


Sarah: add an annotation column (for example: intronic; exonic-nonsynonymous; exonic-splice; exonic-synonymous; intergenic; 5' UTR; 3' UTR)


Peter:

  1. Would you want VEP or snpEff or something else? Is it okay if PheWeb only gives users one choice?

Annotation: I want to be quite careful not to introduce problems into sites.tsv, since matrixify has to parse it. PheWeb should check the version numbers of VEP/snpEff and then manage everything itself. If VEP or snpEff can add exactly the annotations I want to a VCF-like file, this should be easy: I just annotate sites.tsv, pass all columns along into augmented_phenos_gz/*.gz and matrix.tsv.gz, get my API to load and serve the annotations, and set up my default templates to render whatever's available.


Sarah: VEP only is fine.

Annotation: VEP can write into a VCF file (--vcf flag): http://useast.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_vcf You can filter the VCF output: http://useast.ensembl.org/info/docs/tools/vep/script/vep_filter.html


Peter: VEP adds this to the end of INFO:

;CSQ=T|stop_gained|HIGH|CCT8L2|ENSG00000198445|Transcript|ENST00000359963|protein_coding|1/1||||1354|1094|365|W/*|tGg/tAg|||-1||HGNC|15553,

documented by this header:

INFO=

in pairs:

Allele : T
Consequence : stop_gained
IMPACT : HIGH
SYMBOL : CCT8L2
Gene : ENSG00000198445
Feature_type : Transcript
Feature : ENST00000359963
BIOTYPE : protein_coding
EXON : 1/1
INTRON :
HGVSc :
HGVSp :
cDNA_position : 1354
CDS_position : 1094
Protein_position : 365
Amino_acids : W/*
Codons : tGg/tAg
Existing_variation :
DISTANCE :
STRAND : -1
FLAGS :
SYMBOL_SOURCE : HGNC
HGNC_ID : 15553
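The pairing above can be reproduced mechanically by zipping the field names with the pipe-delimited values. This is a minimal sketch, not PheWeb's code: `parse_csq_entry` is a hypothetical helper, and the field order is hard-coded from the pairs listed above, whereas a real pipeline would presumably read it from the CSQ header of the annotated file.

```python
# Field order copied from the pairs listed above (assumption: it matches the
# header of the annotated VCF; a real pipeline should read it from there).
CSQ_FIELDS = [
    "Allele", "Consequence", "IMPACT", "SYMBOL", "Gene", "Feature_type",
    "Feature", "BIOTYPE", "EXON", "INTRON", "HGVSc", "HGVSp",
    "cDNA_position", "CDS_position", "Protein_position", "Amino_acids",
    "Codons", "Existing_variation", "DISTANCE", "STRAND", "FLAGS",
    "SYMBOL_SOURCE", "HGNC_ID",
]


def parse_csq_entry(entry, fields=CSQ_FIELDS):
    """Split one pipe-delimited CSQ entry into a {field: value} dict.

    Hypothetical helper for illustration; empty values stay as "".
    """
    values = entry.split("|")
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return dict(zip(fields, values))


# The CSQ entry quoted earlier in this thread, without the trailing comma.
example = (
    "T|stop_gained|HIGH|CCT8L2|ENSG00000198445|Transcript|"
    "ENST00000359963|protein_coding|1/1||||1354|1094|365|W/*|"
    "tGg/tAg|||-1||HGNC|15553"
)
parsed = parse_csq_entry(example)
print(parsed["Consequence"], parsed["IMPACT"], parsed["SYMBOL"])
# prints: stop_gained HIGH CCT8L2
```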

From this, I would display Consequence: stop_gained in tables and maybe use IMPACT: HIGH to color Manhattan/LZ. Most of the rest could get dumped at the bottom of the PheWAS page. If I'm happy with the way SYMBOL and Feature are decided, I could use those instead of my gene annotation.

As I read the VEP annotation from a VCF, I'll take the most deleterious consequence and the highest IMPACT. Then assert that CSQ["alt"] == alt.
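A rough sketch of that reduction, under stated assumptions: `summarize_csq` is a hypothetical name, the CSQ value is taken to hold one comma-separated entry per transcript with multiple consequences joined by `&`, and `CONSEQUENCE_ORDER` is a deliberately abbreviated, illustrative ranking rather than Ensembl's full severity list. Terms missing from the ranking sort last.

```python
# IMPACT levels from most to least severe.
IMPACT_ORDER = ["HIGH", "MODERATE", "LOW", "MODIFIER"]

# Abbreviated, illustrative ranking (assumption); a real version would use
# the full consequence ordering published for the VEP release in use.
CONSEQUENCE_ORDER = [
    "stop_gained", "frameshift_variant", "missense_variant",
    "synonymous_variant", "5_prime_UTR_variant", "3_prime_UTR_variant",
    "intron_variant", "intergenic_variant",
]


def _rank(value, order):
    """Position of value in order; unknown values rank after everything."""
    return order.index(value) if value in order else len(order)


def summarize_csq(csq_value, alt, allele_idx=0, consequence_idx=1, impact_idx=2):
    """Return (most deleterious consequence, highest IMPACT) for one variant.

    Default indices follow the field order shown earlier in this thread:
    Allele first, Consequence second, IMPACT third.
    """
    consequences, impacts = [], []
    for entry in csq_value.split(","):
        if not entry:  # tolerate a trailing comma
            continue
        values = entry.split("|")
        # The check proposed above; indels may need a looser comparison.
        if values[allele_idx] != alt:
            raise ValueError(f"CSQ allele {values[allele_idx]!r} != alt {alt!r}")
        consequences.extend(values[consequence_idx].split("&"))
        impacts.append(values[impact_idx])
    if not impacts:
        return None, None
    worst = min(consequences, key=lambda c: _rank(c, CONSEQUENCE_ORDER))
    highest = min(impacts, key=lambda i: _rank(i, IMPACT_ORDER))
    return worst, highest


# Made-up, shortened entries (only the first three fields) for illustration.
csq = ("T|missense_variant|MODERATE,"
       "T|stop_gained|HIGH,"
       "T|intron_variant&non_coding_transcript_variant|MODIFIER")
print(summarize_csq(csq, alt="T"))
# prints: ('stop_gained', 'HIGH')
```

Keeping the two rankings as plain lists makes it easy to swap in a complete ordering later without touching the reduction logic.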

Later: Figure out whether we can run ensembl-vep on sites.tsv, or if not what I must add. Maybe add an empty INFO column or rename rsids to ID.
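One way the "empty INFO column, rsids renamed to ID" idea could look, as a sketch only. It assumes sites.tsv is tab-delimited with a header row naming the columns chrom, pos, ref, alt, and rsids, and that rsids holds a comma-separated list; neither is confirmed in this thread. `sites_to_vcf_lines` is a hypothetical helper, and whether VEP accepts the result is exactly the open question above.

```python
import csv
import io


def sites_to_vcf_lines(sites_file):
    """Yield minimal VCF-like lines for each row of a sites.tsv-style file.

    Assumed input columns (not confirmed): chrom, pos, ref, alt, rsids.
    QUAL, FILTER and INFO are filled with "." (the VCF missing value).
    """
    reader = csv.DictReader(sites_file, delimiter="\t")
    yield "##fileformat=VCFv4.2"
    yield "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
    for row in reader:
        # VCF separates multiple IDs with ";" and uses "." when there is none.
        vcf_id = row["rsids"].replace(",", ";") or "."
        yield "\t".join(
            [row["chrom"], row["pos"], vcf_id, row["ref"], row["alt"], ".", ".", "."]
        )


# Made-up rows, for illustration only.
sites = io.StringIO(
    "chrom\tpos\tref\talt\trsids\n"
    "22\t100\tT\tC\trs123\n"
    "22\t200\tC\tT\t\n"
)
for line in sites_to_vcf_lines(sites):
    print(line)
```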


Sarah:

add to tables:

add to tooltips:

pjvandehaar commented 3 years ago

Done.