SafinaAr closed this issue 4 years ago
Closing old issues.
There is Mycobacterium_tuberculosis_h37rv, which is the genome you want. However, the chromosome in that database is called Chromosome, so you'll either need to rename the chromosome in your VCF or fiddle with the snpEff config.
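The first option, renaming the chromosome in the VCF, could look roughly like the sketch below. It assumes a plain-text (uncompressed) VCF whose contig is named M.tuberculosis_H37Rv, as in the question. The function names `rename_chromosome` and `rename_vcf_file` are made up for this illustration and are not part of snpEff or GATK.

```python
def rename_chromosome(lines, old_name, new_name):
    """Yield VCF lines with old_name replaced by new_name.

    Only the ##contig header lines and the CHROM column (first
    tab-separated field of each data line) are touched.
    """
    for line in lines:
        if line.startswith("##contig=<"):
            # The ID may be followed by other keys (",") or close the tag (">").
            line = line.replace("ID=" + old_name + ",", "ID=" + new_name + ",", 1)
            line = line.replace("ID=" + old_name + ">", "ID=" + new_name + ">", 1)
        elif not line.startswith("#"):
            chrom, sep, rest = line.partition("\t")
            if chrom == old_name:
                line = new_name + sep + rest
        yield line


def rename_vcf_file(src_path, dst_path, old_name, new_name):
    """Read src_path, rename the chromosome, and write the result to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(rename_chromosome(src, old_name, new_name))


# Hypothetical usage (file names are placeholders):
# rename_vcf_file("variants.vcf", "variants.renamed.vcf",
#                 "M.tuberculosis_H37Rv", "Chromosome")
```

The renamed file can then be annotated against the Mycobacterium_tuberculosis_h37rv database. Matching on the whole CHROM field, rather than doing a blind search and replace, avoids changing the name if it happens to appear elsewhere in the file.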
Hi, I generated my VCF files with the GATK pipeline using ploidy 1, since it is a Mycobacterium tuberculosis genome. Now I want to annotate my variants using snpEff and ANNOVAR. I searched the snpEff databases for an MTB annotation using:
java -jar snpEff.jar download -v Mycobacterium_tuberculosis
It gave me numerous results, showing that it contains MTB databases. But I'm not sure which one corresponds to the reference I used to generate the VCF file. My MTB reference genome file looks like this:
>M.tuberculosis_H37Rv NC_000962.3
ttgaccgatgaccccggttcaggcttcaccacagtgtggaacgcggtcgtctccgaacttaacggcgaccct
I tried the buildDbNcbi.sh script from snpEff to build my own database, but it produced the following error:
Downloading genome NC_000962
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.7M    0 17.7M    0     0   157k      0 --:--:--  0:01:55 --:--:--  483k
00:00:00 SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani
00:00:00 Command: 'build'
00:00:00 Building database for 'NC_000962'
00:00:00 Reading configuration file 'snpEff.config'. Genome: 'NC_000962'
00:00:00 Reading config file: /home/sark/snpEff/snpEff.config
00:00:01 done
No sequence found in feature file.
Trying fasta file '/home/sark/snpEff/./data/genomes/NC_000962.fa'
Trying fasta file '/home/sark/snpEff/./data/NC_000962/sequences.fa'
java.lang.RuntimeException: Cannot find sequence for 'NC_000962'
    at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryFeatures.sequence(SnpEffPredictorFactoryFeatures.java:467)
    at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryFeatures.addFeatures(SnpEffPredictorFactoryFeatures.java:111)
    at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryFeatures.create(SnpEffPredictorFactoryFeatures.java:330)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
    at org.snpeff.SnpEff.run(SnpEff.java:1183)
    at org.snpeff.SnpEff.main(SnpEff.java:162)
java.lang.RuntimeException: Error reading file '/home/sark/snpEff/./data/NC_000962/genes.gbk'
java.lang.RuntimeException: Cannot find sequence for 'NC_000962'
    at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryFeatures.create(SnpEffPredictorFactoryFeatures.java:344)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
    at org.snpeff.SnpEff.run(SnpEff.java:1183)
    at org.snpeff.SnpEff.main(SnpEff.java:162)
00:00:01 Logging
00:00:02 Checking for updates...
00:00:04 Done.
Then I placed my FASTA file in the folder named in the error above, but now it gives the following error:
Downloading genome NC_000962.3
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.7M    0 17.7M    0     0   332k      0 --:--:--  0:00:54 --:--:--  447k
curl: (16) Error in the HTTP2 framing layer
Then I thought of using the built-in database for MTB, so I renamed the chromosome in my file. It was M.tuberculosis_H37Rv, and I tried replacing it with the built-in one, ERS007734SCcontig000001. Still no success.
It generates the following error for every variant in the VCF file:
9;ANN=A||MODIFIER|||||||||||||ERROR_OUT_OF_CHROMOSOME_RANGE
Can you please help me with this?
Thank you. :)
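As far as I understand it, snpEff reports ERROR_OUT_OF_CHROMOSOME_RANGE when a variant position lies beyond the end of the named chromosome in the chosen database. That would fit a whole-genome VCF being pointed at a single contig such as ERS007734SCcontig000001. A quick way to check is to list each chromosome name in the VCF with its variant count and highest position, then compare those against the database. The sketch below assumes an uncompressed VCF; `summarize_chromosomes` is a name made up for this illustration.

```python
def summarize_chromosomes(lines):
    """Return {chrom: (variant_count, max_position)} for the data lines of a VCF."""
    summary = {}
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        count, max_pos = summary.get(chrom, (0, 0))
        summary[chrom] = (count + 1, max(max_pos, pos))
    return summary


# Hypothetical usage (file name is a placeholder):
# with open("variants.vcf") as handle:
#     for chrom, (count, max_pos) in summarize_chromosomes(handle).items():
#         print(chrom, count, max_pos)
```

If the highest position is larger than the length of the chromosome in the database, the VCF and the database most likely describe different sequences, and renaming alone will not help.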