greenelab / pdx_exomeseq

Pipeline analysis for whole exome sequencing of pancreatic cancer PDX models
MIT License

Annotate Variants #22

Closed: gwaybio closed this issue 6 years ago

gwaybio commented 6 years ago

STEP 6 is to Annotate Variants

Here is the preliminary pipeline:

First, download ANNOVAR and its associated databases.
Instructions here: http://annovar.openbioinformatics.org/en/latest/user-guide/startup/
All documented databases here: https://github.com/WGLab/doc-ANNOVAR/blob/master/user-guide/download.md

The databases we will use are: refGene,cosmic70,gnomad_exome,dbnsfp30a
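As a rough sketch of the download step, the four databases listed above could be fetched in a loop. This assumes the standard ANNOVAR `annotate_variation.pl -downdb -webfrom annovar` invocation from the ANNOVAR startup guide, the hg19 build, and a `humandb/` directory, as in the annotation command in this thread. The `DRY_RUN` switch and the `download_cmd` helper are hypothetical additions for illustration; with `DRY_RUN=1` (the default here) the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch only: print (or run) one ANNOVAR download command per database.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = print commands, 0 = run them
BUILD="hg19"              # genome build used in the annotation step
DB_DIR="humandb/"         # database directory used in the annotation step
DATABASES="refGene cosmic70 gnomad_exome dbnsfp30a"

download_cmd() {
  # Build the download command for a single database name ($1).
  echo "perl annotate_variation.pl -buildver ${BUILD} -downdb -webfrom annovar $1 ${DB_DIR}"
}

for db in ${DATABASES}; do
  cmd="$(download_cmd "${db}")"
  if [ "${DRY_RUN}" = "1" ]; then
    echo "${cmd}"
  else
    ${cmd}
  fi
done
```

Run it from the ANNOVAR install directory with `DRY_RUN=0` to perform the downloads. Some of these databases are large, so checking the printed commands first may be worthwhile.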

Convert MuTect2-derived VCF files to ANNOVAR-compatible files:
perl convert2annovar.pl -includeinfo -format vcf4 ../../example/KS30_S4_L004_001.fq.gz.sam_sorted.bam_sorted_fixmate.bam_positionsort.bam.bam_rmdup.bam.rg.bam.GATK.vcf > testing_GATK.vcf

Add annotations as columns to the converted ANNOVAR file (the input is the file written by the conversion step, testing_GATK.vcf):
perl table_annovar.pl testing_GATK.vcf humandb/ -buildver hg19 -out testing_output -otherinfo -remove -protocol refGene,cosmic70,gnomad_exome,dbnsfp30a -operation g,f,f,f -nastring . -csvout -polish
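The two commands above could be chained per sample along these lines. This is a sketch rather than the pipeline's actual script: the flags are copied from the commands in this comment, while the `annotate_cmds` and `count_fields` helpers, the `.avinput` suffix, and the sample file names are hypothetical. In `-operation`, each entry pairs with the `-protocol` entry in the same position (`g` for gene-based, `f` for filter-based), so the sketch checks that both lists have the same length before printing anything.

```shell
#!/usr/bin/env bash
# Sketch only: print the convert + annotate commands for one sample.
set -euo pipefail

PROTOCOL="refGene,cosmic70,gnomad_exome,dbnsfp30a"
OPERATION="g,f,f,f"   # one entry per protocol: g = gene-based, f = filter-based

count_fields() {
  # Number of comma-separated entries in the string $1.
  echo "$1" | awk -F',' '{ print NF }'
}

annotate_cmds() {
  # $1: input VCF path, $2: output prefix (both hypothetical names).
  local vcf="$1" prefix="$2"
  local avinput="${prefix}.avinput"   # intermediate file shared by both steps
  if [ "$(count_fields "${PROTOCOL}")" -ne "$(count_fields "${OPERATION}")" ]; then
    echo "protocol and operation lists differ in length" >&2
    return 1
  fi
  echo "perl convert2annovar.pl -includeinfo -format vcf4 ${vcf} > ${avinput}"
  echo "perl table_annovar.pl ${avinput} humandb/ -buildver hg19 -out ${prefix} -otherinfo -remove -protocol ${PROTOCOL} -operation ${OPERATION} -nastring . -csvout -polish"
}

annotate_cmds "sample.vcf" "sample_annotated"
```

Deriving the intermediate file name from a single prefix keeps the output of the conversion step and the input of the annotation step in agreement.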

gwaybio commented 6 years ago

variants are now annotated.