Closed santiago1234 closed 2 years ago
I talked with Aaron last Friday and the logic for this is correct.
Aaron just told me to make sure I look at the correct DNA strand.
Here is a hack: I can use VEP to annotate all coding SNPs and let VEP handle the strand for me.
~/ensembl-vep/vep -i exon-snps.txt.gz --cache \
--assembly GRCh38 --tab --output_file variants.txt.gz \
--compress_output gzip --fields \
"Uploaded_variation,Location,Allele,Gene,Feature_type,Consequence,Codons"
NOTE: VEP repeats lines. In the pipeline, I should keep only the unique lines, for example:
gzip -dc variants.txt.gz | grep 'missense' | sort -u
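A rough sketch of the same filter-and-deduplicate step in Python, in case the pipeline ends up as a script rather than shell. The function names are made up for illustration, and it assumes that header lines start with '#', as they do in VEP's tab output. Unlike sort, it keeps the lines in their original order.

```python
import gzip


def unique_lines_with(lines, keyword="missense"):
    """Return the lines containing `keyword`, de-duplicated, in first-seen order.

    Lines starting with '#' (VEP header/metadata lines) are skipped.
    """
    seen = set()
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#") or keyword not in line:
            continue
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return kept


def unique_lines_from_gzip(path, keyword="missense"):
    """Same as unique_lines_with, reading from a gzip-compressed text file."""
    with gzip.open(path, "rt") as handle:
        return unique_lines_with(handle, keyword)


# Toy input with a repeated record (made-up values, not real VEP output).
example = [
    "#Uploaded_variation\tLocation\tAllele\tGene\tFeature_type\tConsequence\tCodons\n",
    "snp1\t1:100\tA\tgeneX\tTranscript\tmissense_variant\tgCg/gTg\n",
    "snp1\t1:100\tA\tgeneX\tTranscript\tmissense_variant\tgCg/gTg\n",
    "snp2\t1:200\tT\tgeneX\tTranscript\tsynonymous_variant\tctC/ctT\n",
]
result = unique_lines_with(example)
```

On the real output this would be called as unique_lines_from_gzip("variants.txt.gz").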
I will put this in a pipeline to see the results.
First, I need to obtain the coding genome (sequences) from GRCh38.
After that:
For each focal SNP, there are three contributions to the mutation rates, one for each of the three possible alternate bases at that site. For example, if all three mutations are synonymous, the contribution of that focal SNP to the missense rate will be zero.
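To make the three contributions concrete, here is a minimal sketch that takes a codon (already read on the coding strand) and a position within it, tries the three alternate bases, and adds each one's rate to its consequence class. Everything here is an assumption for illustration: the function names, the four consequence classes, and the default rate of 1.0 per mutation, which would be replaced by the real per-mutation rates. It uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(ref_codon, alt_codon):
    """Classify a single-codon change by comparing the encoded amino acids."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


def site_contributions(codon, pos, rate=lambda ref, alt: 1.0):
    """Sum the rate of each of the three possible mutations at codon[pos]
    into its consequence class.

    `rate` is a placeholder: it maps (ref_base, alt_base) to a mutation rate
    and defaults to 1.0 for every mutation.
    """
    totals = {"synonymous": 0.0, "missense": 0.0, "nonsense": 0.0, "stop_lost": 0.0}
    ref = codon[pos]
    for alt in BASES:
        if alt == ref:
            continue
        mutated = codon[:pos] + alt + codon[pos + 1:]
        totals[classify(codon, mutated)] += rate(ref, alt)
    return totals


# Third position of CTT (Leu): all three mutations are synonymous,
# so this site contributes nothing to the missense rate.
leu_third = site_contributions("CTT", 2)
# Third position of TGG (Trp): one stop-gain (TGA) and two missense (TGT, TGC).
trp_third = site_contributions("TGG", 2)
```

If the SNP sits in a gene on the minus strand, the codon and the alternate bases would need to be reverse-complemented before calling this, which is the strand issue Aaron pointed out.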