tseemann / snippy

✂️ ⚡ Rapid haploid variant calling and core genome alignment
GNU General Public License v2.0

snpEff ANN gene names with GenBank reference #332

Closed: pvanheus closed this issue 4 years ago

pvanheus commented 4 years ago

When using a GenBank reference (this one), the ANN fields provided by snpEff contain details from the ID field of the GFF3 that snippy creates.

I.e. while snpEff can handle a GenBank reference directly, snippy creates a GFF3 reference with the ID set to the locus tag. If snpEff is called with the GenBank file, it generates a database with gene names, not locus tags.

This is rambling, but since the gene names are what I want, is there a way to get them, either by doing something with the GFF3 or by using the GenBank file directly?
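One way to get gene names without rebuilding the snpEff database is to read the locus_tag → gene pairs straight out of the GenBank file and use them to post-process snippy's output. Below is a minimal sketch. It assumes a well-formed GenBank flat file with the standard column layout and handles only single-line qualifier values. `locus_tag_to_gene` and `_record` are hypothetical helpers, not part of snippy or snpEff. Gene names are not guaranteed to be unique, so the mapping only goes one way, from locus tag to name.

```python
import re

# In GenBank flat files a feature key starts in column 6 (5 leading spaces)
# and each qualifier starts in column 22 (21 leading spaces) with a "/".
FEATURE_START = re.compile(r"^ {5}\S")
QUALIFIER = re.compile(r'^ {21}/(\w+)="?([^"]*)"?')


def _record(mapping, qualifiers):
    """Store locus_tag -> gene for one feature if it carries both qualifiers."""
    tag = qualifiers.get("locus_tag")
    gene = qualifiers.get("gene")
    if tag and gene:
        mapping.setdefault(tag, gene)  # keep the first name seen for a tag


def locus_tag_to_gene(genbank_text):
    """Hypothetical helper: map /locus_tag values to /gene names."""
    mapping = {}
    qualifiers = {}
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif not in_features:
            continue  # header or sequence lines outside the feature table
        elif line.startswith("ORIGIN") or line.startswith("//"):
            _record(mapping, qualifiers)  # close the last feature of the record
            qualifiers = {}
            in_features = False
        elif FEATURE_START.match(line):
            _record(mapping, qualifiers)  # a new feature begins
            qualifiers = {}
        else:
            m = QUALIFIER.match(line)
            if m:
                qualifiers[m.group(1)] = m.group(2)
    if in_features:
        _record(mapping, qualifiers)  # file ended without ORIGIN or "//"
    return mapping
```

Typical use would be something like `names = locus_tag_to_gene(open("reference.gbk").read())`, where the file name is only a placeholder. Features that have a locus tag but no `/gene` qualifier are simply left out of the mapping.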

tseemann commented 4 years ago

What do you mean by "gene name"? Something like recA? Those are not unique.

What is the exact snpEff command line you used to go from GENBANK + VCF to the annotated VCF? (I never got it working.)

I have had no end of pain with snpEff and am halfway to migrating to bcftools csq.

pvanheus commented 4 years ago

Yes, they are not unique. Uniqueness is not required in the ANN field.

mkdir ref
gzip -c Mycobacterium_tuberculosis_ancestral_reference.gb > ref/genes.gbk.gz
snpEff build -c snpeff.config -dataDir . -genbank ref
snpEff ann -noLog -noStats -no-downstream -no-upstream -no-utr -c snpeff.config -dataDir . ref test.vcf >test-annotated.vcf

Attached: test.vcf, test-annotated.vcf
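With a dictionary from locus tag to gene name (built, for example, from the `/locus_tag` and `/gene` qualifiers of the GenBank file), snippy's annotated VCF could instead be post-processed so that the Gene_Name sub-field of each ANN entry shows the gene name, while Gene_ID keeps the unique locus tag. This is a sketch that assumes the standard snpEff ANN layout (`Allele|Annotation|Annotation_Impact|Gene_Name|Gene_ID|...`) and that Gene_Name currently holds the locus tag. `rename_ann_genes` and `rewrite_vcf_line` are hypothetical helpers.

```python
def rename_ann_genes(info, names):
    """Rewrite Gene_Name (4th '|' sub-field) of every ANN entry in an INFO string."""
    items = []
    for item in info.split(";"):
        if item.startswith("ANN="):
            entries = []
            for ann in item[len("ANN="):].split(","):  # one entry per effect
                fields = ann.split("|")
                # Layout: Allele|Annotation|Annotation_Impact|Gene_Name|Gene_ID|...
                if len(fields) > 4:
                    # Values not in the mapping (e.g. intergenic "A-B") are kept.
                    fields[3] = names.get(fields[3], fields[3])
                entries.append("|".join(fields))
            item = "ANN=" + ",".join(entries)
        items.append(item)
    return ";".join(items)


def rewrite_vcf_line(line, names):
    """Pass header lines through; rename genes in the INFO column (8th) of data lines."""
    if line.startswith("#"):
        return line
    cols = line.rstrip("\n").split("\t")
    cols[7] = rename_ann_genes(cols[7], names)
    return "\t".join(cols) + "\n"
```

Apply `rewrite_vcf_line` to every line of test-annotated.vcf and write the results to a new file. Gene_ID is left untouched, so the unique locus tag is still available next to the non-unique name.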

cocathail commented 4 years ago

Also having this issue, where snpEff craps out after running the command below with a GenBank reference. Can't seem to figure out why. This also happens to be a mycobacterial reference genome. It runs completely fine when using the FASTA of the same reference genome.

snpEff ann -noLog -noStats -no-downstream -no-upstream -no-utr -t -c reference/snpeff.config -dataDir . ref snps.filt.vcf > snps.vcf 2>> snps.log

cocathail commented 4 years ago

Never mind. I just ran it on E. coli with the K-12 GenBank file as the reference, and snpEff still crapped out.

biobrad commented 4 years ago

I also have snpEff caking its trousers when using a GenBank file.

tseemann commented 4 years ago

@biobrad @cocathail @pvanheus what version of snpeff are you using? and what java version?

 $ snpEff -version
SnpEff  4.3t    2017-11-24

 $ java -version
openjdk version "1.8.0_212-ojdkbuild"

cocathail commented 4 years ago

My snpEff is the same version.

java version "9.0.4"
Java(TM) SE Runtime Environment (build 9.0.4+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)

tseemann commented 4 years ago

I've had issues with Java. It has never worked for me on versions > 1.8. Then they changed the numbering system. I think the conda package has Java pinned at 8 (1.8).
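Because of that numbering change, a wrapper script that wants to warn about a non-Java-8 runtime has to handle both styles. Under the old scheme, `1.8.0_212` means major version 8; under the new one, `9.0.4` means major version 9. This is a small sketch. `java_major_version` and `java8_ok` are hypothetical helpers, and treating only 8 as "OK" reflects nothing more than the experience reported in this thread.

```python
import re


def java_major_version(version_output):
    """Extract the Java major version from `java -version` output.

    Old scheme (Java 8 and earlier): "1.8.0_212" -> 8
    New scheme (Java 9 and later):   "9.0.4" -> 9, "11.0.2" -> 11
    """
    m = re.search(r'version "([^"]+)"', version_output)
    if m is None:
        raise ValueError("no version string found in java -version output")
    parts = m.group(1).split(".")

    def leading_int(s):
        d = re.match(r"\d+", s)
        if d is None:
            raise ValueError(f"unexpected version component: {s!r}")
        return int(d.group())

    first = leading_int(parts[0])
    # "1.x" means the old numbering, where the real major version is x.
    if first == 1 and len(parts) > 1:
        return leading_int(parts[1])
    return first


def java8_ok(version_output):
    """True when the runtime is Java 8, the version reported to work in this thread."""
    return java_major_version(version_output) == 8
```

The text to parse can be obtained with `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`, since `java -version` prints to standard error rather than standard output.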

cocathail commented 4 years ago

Managed to force a downgrade back to Java 1.8. This has indeed fixed the issues!