tgen / CovGen

Creates a target specific exome_full192.coverage.txt file required by MutSig
MIT License

snpEff Variant_Classification's do not exist in the mutation type dictionary #5

Closed: sagarutturkar closed 5 years ago

sagarutturkar commented 6 years ago

I am running CovGen on canine data, and I plan to use the output in MutSig with WGS-based somatic mutations.

CovGen Command:

    CovGen -o out \
        -f genome_ref.fasta \
        -g Canis_familiaris.custom.gtf \ #Genes GTF
        -t Canis_familiaris.genes.bed \ #Genes BED
        -s $snpEff_root \
        -v CanFam3.1.86 \

It generated the following output:

    refGen: genome_ref.fasta
    bedfile: out_CanFam3.1.86/out_CanFam3.1.86_step3b.bed
    prefix: out_CanFam3.1.86/out_CanFam3.1.86_step4
    cpus: 3
    sort: False

    2018-06-27 11:19:54,614 [ INFO] - Parsing Reference Genome (assuming only 5 letters possible: A,C,G,T and N) ...
    2018-06-27 11:20:17,416 [ INFO] - Parsing Reference Genome (assuming only 5 letters possible: A,C,G,T and N) ... DONE
    2018-06-27 11:20:17,417 [ INFO] - Parsing bedfile (assuming no overlapping loci) ...
    2018-06-27 11:20:18,107 [ INFO] - Parsing bedfile (assuming no overlapping loci) ... DONE
    2018-06-27 11:20:18,107 [ INFO] - Processing Contigs for vcf header ...
    2018-06-27 11:20:18,107 [ INFO] - Sorting contigs ...
    2018-06-27 11:20:18,107 [ INFO] - Sorting contigs ... DONE
    2018-06-27 11:20:18,108 [ INFO] - processing target sequences and capturing the ALTs ...
    2018-06-27 11:27:39,370 [ INFO] - processing target sequences and capturing the ALTs ... DONE
    2018-06-27 11:27:39,371 [ INFO] - writing vcf files ...
    2018-06-27 12:18:09,240 [ INFO] - writing vcf files ... DONE
    2018-06-27 12:18:09,240 [ INFO] - Job completed successfully

The following snpEff Variant_Classification's do not exist in the mutation type dictionary:

    initiator_codon_variant&splice_region_variant
    non_coding_transcript_exon_variant
    splice_region_variant&non_coding_transcript_exon_variant

Please add them to the file with the corresponding effect (null,nonsilent,silent,noncoding)

The last file generated in the output directory is out_CanFam3.1.86_step7.txt, which lists various snpEff ANN mutation types. The missing classifications may be newer annotations added by snpEff.

Would it be sufficient to add these annotations to the file snpEff_ANN_mutation_type_dictionary_file.txt and rerun CovGen?

Also, I am not sure whether CovGen uses the snpEff_ANN_mutation_type_dictionary_file.txt file by default. I don't see an option to specify it.

awchrist commented 6 years ago

Yes. CovGen uses the snpEff_ANN_mutation_type_dictionary_file.txt that comes with the repo. I have not added an option to use another file. Just add each new Variant_Classification with what you think its effect would be (null, nonsilent, silent or noncoding) and run it again.
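The edit described above could be scripted roughly as follows. This is only a sketch: it assumes the dictionary file holds one tab-separated `classification<TAB>effect` pair per line, which should be checked against the actual file in the repo, and the function name `add_missing_classifications` is hypothetical. The effect passed in for each classification is the user's own judgment call; the sketch does not choose one.

```python
from pathlib import Path

# The four effect labels named in the CovGen message.
VALID_EFFECTS = {"null", "nonsilent", "silent", "noncoding"}


def add_missing_classifications(dict_path, new_entries, sep="\t"):
    """Append classifications that are not yet in the dictionary file.

    ASSUMPTION: each line is '<classification><sep><effect>'. Verify this
    against the real snpEff_ANN_mutation_type_dictionary_file.txt first.
    Returns the list of classifications that were actually appended.
    """
    # Validate everything up front so a bad label never causes a partial write.
    for name, effect in new_entries.items():
        if effect not in VALID_EFFECTS:
            raise ValueError(f"unknown effect {effect!r} for {name!r}")

    path = Path(dict_path)
    text = path.read_text() if path.exists() else ""
    known = {
        line.split(sep)[0].strip()
        for line in text.splitlines()
        if line.strip()
    }

    added = []
    with path.open("a") as handle:
        # Make sure the first appended entry starts on its own line.
        if text and not text.endswith("\n"):
            handle.write("\n")
        for name, effect in new_entries.items():
            if name in known:
                continue  # already present, leave the existing effect alone
            handle.write(f"{name}{sep}{effect}\n")
            added.append(name)
    return added
```

For instance, calling it with `{"non_coding_transcript_exon_variant": "noncoding"}` would append that one line if it is missing; the `noncoding` label here is an illustrative guess, not a recommendation from the CovGen author.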

You might run into a few other problems down the road when you get to running MutSig.

MutSig itself was designed around exomes and has built-in cutoff values, based on the ratio of coding to noncoding space, that will cause all noncoding mutations to be thrown out of the analysis.

To get WGS data to work, you would have to curate your own capture space, filter your mutations accordingly, and use that same curated capture space when running CovGen.
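A minimal sketch of the filtering step, assuming the curated capture space is a BED file (0-based start, exclusive end) and that mutations carry a contig name and a 1-based position. The function names are hypothetical and this is not part of CovGen. Overlapping intervals are merged first, since the CovGen log above notes that it assumes no overlapping loci in the bedfile.

```python
from bisect import bisect_right
from collections import defaultdict


def load_capture_space(bed_lines):
    """Parse BED lines into merged, sorted (start, end) intervals per contig."""
    raw = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and BED header lines
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, spans in raw.items():
        spans.sort()
        out = [spans[0]]
        for start, end in spans[1:]:
            if start <= out[-1][1]:
                # Overlapping or touching: extend the previous interval.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_capture_space(capture, chrom, pos):
    """Return True if the 1-based position lies inside the capture space."""
    spans = capture.get(chrom, [])
    pos0 = pos - 1  # convert to the 0-based coordinates used by BED
    # Index of the last interval whose start is <= pos0.
    idx = bisect_right(spans, (pos0, float("inf"))) - 1
    return idx >= 0 and spans[idx][0] <= pos0 < spans[idx][1]


def filter_mutations(capture, mutations):
    """Keep (chrom, pos, ...) records that fall inside the capture space."""
    return [m for m in mutations if in_capture_space(capture, m[0], m[1])]
```

The same BED file would then be the one passed to CovGen, so that the coverage file and the filtered mutation list describe the same territory.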

I am also unsure how canine (K-9) data will work out. In theory it should be fine; I've just never tried it before.