Closed pvanheus closed 4 years ago
What is "gene name"? Do you mean recA etc.? Those are not unique.
What is the exact snpEff command line you used to go from GenBank + VCF to the annotated VCF? (I never got it working.)
I have had no end of pain with snpEff and am halfway to migrating to bcftools csq.
Yes they are not unique. Uniqueness is not required in the ANN field.
mkdir ref
gzip -c Mycobacterium_tuberculosis_ancestral_reference.gb > ref/genes.gbk.gz
snpEff build -c snpeff.config -dataDir . -genbank ref
snpEff ann -noLog -noStats -no-downstream -no-upstream -no-utr -c snpeff.config -dataDir . ref test.vcf >test-annotated.vcf
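For anyone reproducing the commands above: snpEff looks up the genome name given on the command line (here `ref`) in the config file passed with `-c`, so `snpeff.config` needs a matching `ref.genome` entry. A minimal sketch of creating one, assuming no other config keys are needed for your data; the function name and the description text after the colon are placeholders, not taken from the original post:

```shell
# Write a one-line snpeff.config that registers a genome called "ref".
# Assumption: the text after the colon is only a human-readable label.
write_snpeff_config() {
    local out="${1:-snpeff.config}"
    printf '%s\n' 'ref.genome : Reference genome built from ref/genes.gbk.gz' > "$out"
}
```

Calling `write_snpeff_config snpeff.config` before the `snpEff build` step would produce the file those commands expect.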
Also having this issue where snpEff craps out after running the command below when using a GenBank reference. Can't seem to figure out why. This also happens to be a mycobacterial reference genome. It runs completely fine when using the FASTA of the same reference genome.
snpEff ann -noLog -noStats -no-downstream -no-upstream -no-utr -t -c reference/snpeff.config -dataDir . ref snps.filt.vcf > snps.vcf 2>> snps.log
Never mind. Just ran it on E. coli with the K12 GenBank file as the reference and snpEff still crapped out.
I also have snpEff caking its trousers when using a GenBank file.
@biobrad @cocathail @pvanheus what version of snpEff are you using? And what Java version?
$ snpEff -version
SnpEff 4.3t 2017-11-24
$ java -version
openjdk version "1.8.0_212-ojdkbuild"
snpEff is the same version
java version "9.0.4"
Java(TM) SE Runtime Environment (build 9.0.4+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)
I've had issues with Java. snpEff never worked on versions > 1.8 for me. Then they changed the numbering system. I think the conda version has Java pinned at 8 (1.8).
Managed to force back to Java 1.8. This has indeed fixed the issues!
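Because the Java numbering changed ("1.8" is Java 8, while later releases report "9.0.4", "11.0.2" and so on), a quick check before running snpEff can save some confusion. A minimal sketch, assuming bash; the helper names `java_major` and `installed_java_version` are made up for illustration, and the only version behaviour it relies on is what this thread reports (SnpEff 4.3t working under 1.8 and failing under 9.0.4):

```shell
# Extract the major Java version from a version string.
# Legacy scheme: "1.8.0_212" -> 8; newer scheme: "9.0.4" -> 9, "11.0.2" -> 11.
java_major() {
    local v="$1"
    v="${v#1.}"        # drop the legacy "1." prefix if present
    v="${v%%[.+_-]*}"  # keep only the digits before the first separator
    printf '%s\n' "$v"
}

# Read the quoted version string from `java -version` (printed on stderr).
installed_java_version() {
    java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}
```

For example, `java_major "$(installed_java_version)"` would print 8 for the OpenJDK 1.8.0_212 build quoted above and 9 for the 9.0.4 build.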
When using a GenBank reference (this one), the ANN fields provided by snpEff have details from the ID field in the GFF3 that snpEff creates.
I.e. while snpEff can handle a GenBank reference directly, snippy creates a GFF3 reference with the ID as the locus tag. If snpEff is called with the GenBank file, it generates a database with gene names, not locus tags.
This is rambling, but since the gene names are what I want, is there a way to get them by either doing something with the GFF3 or using the GenBank file directly?
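One possible workaround, not suggested anywhere in this thread and not a snpEff or snippy feature, is to build a locus tag to gene name lookup table from the GenBank file and apply it to the annotated output afterwards. A rough sketch using awk, assuming the standard GenBank feature-table layout (feature keys indented by five spaces, each `/gene` and `/locus_tag` qualifier on a single line); the function name `locus_to_gene` is hypothetical:

```shell
# Print "locus_tag<TAB>gene" for every gene feature that carries both qualifiers.
# Reads GenBank flat files named as arguments, or stdin if none are given.
locus_to_gene() {
    awk '
        function emit_pair() {
            if (tag != "" && gene != "") print tag "\t" gene
            tag = ""; gene = ""
        }
        # A new feature starts with exactly five spaces followed by its key.
        /^     [^ ]/ { emit_pair(); in_gene = ($1 == "gene") }
        in_gene && match($0, /\/locus_tag="[^"]+"/) { tag  = substr($0, RSTART + 12, RLENGTH - 13) }
        in_gene && match($0, /\/gene="[^"]+"/)      { gene = substr($0, RSTART + 7,  RLENGTH - 8) }
        END { emit_pair() }
    ' "$@"
}
```

Running `locus_to_gene Mycobacterium_tuberculosis_ancestral_reference.gb > locus2gene.tsv` would give a two-column table. Genes without a `/gene` qualifier are skipped, so their locus tags would stay as they are.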