tseemann / snippy

Rapid haploid variant calling and core genome alignment
GNU General Public License v2.0

error with gbk #154

Closed. aemiol closed this issue 6 years ago.

aemiol commented 6 years ago

When I provide a GenBank (.gbk) reference, I get the error below. This did not happen with a FASTA reference. Thanks.

java.lang.RuntimeException: Error parsing property 'ref.AE015924.codonTable'. No such codon table 'Bacterial_and_Plant_Plastid'
    at org.snpeff.snpEffect.Config.createCodonTables(Config.java:169)
    at org.snpeff.snpEffect.Config.readConfig(Config.java:650)
    at org.snpeff.snpEffect.Config.init(Config.java:480)
    at org.snpeff.snpEffect.Config.<init>(Config.java:117)
    at org.snpeff.SnpEff.loadConfig(SnpEff.java:451)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:364)
    at org.snpeff.SnpEff.run(SnpEff.java:1183)
    at org.snpeff.SnpEff.main(SnpEff.java:162)

tseemann commented 6 years ago

This seems similar to issue #153.

Which version of snpEff are you using (snpEff -version)?

Can you link to, or send me, the GenBank file?

aemiol commented 6 years ago

I installed snippy using conda and currently have snpEff 4.3. Snippy fails no matter which GenBank file I use. It even fails with the example.gbk file.

tseemann commented 6 years ago

Snippy will not work with some older GenBank files that have lots of RBS features. You have to grep them out first. I assume it works OK with .fasta files?
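A plain grep -v on "RBS" would remove only the feature-key line and leave that feature's /qualifier lines attached to the preceding feature. The sketch below instead drops each RBS feature block as a unit. It assumes the standard GenBank feature-table layout (feature keys indented 5 spaces, qualifier and location-continuation lines indented 21 spaces); strip_rbs is a hypothetical helper, not part of Snippy.

```shell
# Hypothetical helper: print a GenBank file with every RBS feature block removed.
strip_rbs() {
  awk '
    # A feature-key line has exactly 5 leading spaces; start skipping if it is RBS.
    /^     [^ ]/ { skip = ($1 == "RBS") }
    # A top-level keyword in column 1 (FEATURES, ORIGIN, //, ...) ends any skip.
    /^[^ ]/      { skip = 0 }
    # Print every line that is not inside an RBS block.
    !skip
  ' "$1"
}
```

For example, strip_rbs ref.gbk > ref.norbs.gbk, then pass ref.norbs.gbk to snippy --ref.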

Slugger70 commented 6 years ago

The Galaxy snippy wrapper, based on the bioconda package, passes all of its tests with the test .gbk file, which has no RBS features.

sbridel commented 6 years ago

Well, maybe you should provide more details about how to make a proper GBK file.

My GBK file isn't working either, and I have no idea how to clean it.

EDIT.

My GBK file contains no RBS features.

But I found this in the snpEff documentation:


Building a database
In order to build a database for a new genome, you need to:

Most people do NOT need to build a database, and can safely use a pre-built one. So unless you are working with a rare genome you most likely don't need to do it either. 

or

Configuring a new genome

In order to tell SnpEff that there is a new genome available, you must update SnpEff's configuration file snpEff.config.
You must add a new genome entry to snpEff.config.
If your genome, or a chromosome, uses non-standard codon tables you must update snpEff.config accordingly. A typical case is when you use mitochondrial DNA. Then you specify that chromosome 'MT' uses codon.Invertebrate_Mitochondrial codon table. Another common case is when you are adding a bacterial genome, then you specify that the codon table is Bacterial_and_Plant_Plastid. 
aemiol commented 6 years ago

My GenBank file has no RBS features either, yet I still get the error from snpEff.

tseemann commented 6 years ago

@sbridel yes, I do all those steps within Snippy so that it can use the user's reference genome. I suspect something has changed in snpEff.

@sbridel and @aemiol, can you please either

  1. paste your snps.log file here, or
  2. at least run these commands and tell me which reference file you used?

% snippy --version
snippy 3.x

% which snpEff
/home/linuxbrew/.linuxbrew/bin/snpEff

% snpEff -version
SnpEff  4.3t    2017-11-24
aemiol commented 6 years ago

cd /projects/emiola/JG/contigs

/home/emiola/anaconda2/bin/snippy --outdir mut1 --ref sequence.gb --ctgs 14190_pginvivalisParent.fasta

samtools faidx reference/ref.fa

bwa index reference/ref.fa

[bwa_index] Pack FASTA... 0.05 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.79 seconds elapse.
[bwa_index] Update BWT... 0.01 sec
[bwa_index] Pack forward-only FASTA... 0.01 sec
[bwa_index] Construct SA from BWT and Occ... 0.29 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index reference/ref.fa
[main] Real time: 1.256 sec; CPU: 1.162 sec

mkdir reference/genomes && cp -f reference/ref.fa reference/genomes/ref.fa

mkdir reference/ref && gzip -c reference/ref.gff > reference/ref/genes.gff.gz

snpEff build -c reference/snpeff.config -dataDir . -gff3 ref

java.lang.RuntimeException: Error parsing property 'ref.AE015924.codonTable'. No such codon table 'Bacterial_and_Plant_Plastid'
    at org.snpeff.snpEffect.Config.createCodonTables(Config.java:169)
    at org.snpeff.snpEffect.Config.readConfig(Config.java:650)
    at org.snpeff.snpEffect.Config.init(Config.java:480)
    at org.snpeff.snpEffect.Config.<init>(Config.java:117)
    at org.snpeff.SnpEff.loadConfig(SnpEff.java:451)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:364)
    at org.snpeff.SnpEff.run(SnpEff.java:1183)
    at org.snpeff.SnpEff.main(SnpEff.java:162)

moks-micro commented 6 years ago

Hi,

I am having the same issue when using a GenBank file downloaded from NCBI (genome accession NZ_CP015278). Is there a fix for this?

I am using snippy 4.0 and snpEff 4.3.

My snps.log is:

cd /home/sim/Documents/snippy

/home/sim/miniconda2/bin/snippy --outdir test1 --ref DSM44623.gb --R1 NGS1_forward_paired.fq.gz --R2 NGS1_reverse_paired.fq.gz

samtools faidx reference/ref.fa

bwa index reference/ref.fa

[bwa_index] Pack FASTA... 0.00 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.00 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.00 sec
[bwa_index] Construct SA from BWT and Occ... 0.00 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index reference/ref.fa
[main] Real time: 0.030 sec; CPU: 0.000 sec

mkdir reference/genomes && cp -f reference/ref.fa reference/genomes/ref.fa

mkdir reference/ref && gzip -c reference/ref.gff > reference/ref/genes.gff.gz

snpEff build -c reference/snpeff.config -dataDir . -gff3 ref

java.lang.RuntimeException: Error parsing property 'ref.NZ_CP015278.codonTable'. No such codon table 'Bacterial_and_Plant_Plastid'
    at org.snpeff.snpEffect.Config.createCodonTables(Config.java:165)
    at org.snpeff.snpEffect.Config.readConfig(Config.java:626)
    at org.snpeff.snpEffect.Config.init(Config.java:468)
    at org.snpeff.snpEffect.Config.<init>(Config.java:114)
    at org.snpeff.SnpEff.loadConfig(SnpEff.java:292)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:341)
    at org.snpeff.SnpEff.run(SnpEff.java:1009)
    at org.snpeff.SnpEff.main(SnpEff.java:155)

cizydorczyk commented 6 years ago

Hi,

I'm also getting the same error as above, with the following reference: NC_002516

snippy --version
snippy 4.0-dev

which snpEff
/home/conrad/anaconda2/bin/snpEff

snpEff -version
4.3

Note that I installed Snippy using anaconda.

The GenBank file I am using was last updated in Feb 2018. I have used it in other applications (e.g. Prokka) and it has always worked fine.

SNP log:

### cd /home/conrad/Data/all_pseudo

### /home/conrad/anaconda2/bin/snippy --cpus 8 --outdir K6_snippy/6 --ref genome_annotations/K2_pseudomonas_aeruginosa_pao1_apr24_2018.gbk --R1 fastq_files/all_good_fastq/6_1.fq.gz --R2 fastq_files/all_good_fastq/6_2.fq.gz

### samtools faidx reference/ref.fa

### bwa index reference/ref.fa

[bwa_index] Pack FASTA... 0.05 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 2.07 seconds elapse.
[bwa_index] Update BWT... 0.06 sec
[bwa_index] Pack forward-only FASTA... 0.04 sec
[bwa_index] Construct SA from BWT and Occ... 0.62 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index reference/ref.fa
[main] Real time: 3.821 sec; CPU: 2.840 sec

### mkdir reference/genomes && cp -f reference/ref.fa reference/genomes/ref.fa

### mkdir reference/ref && gzip -c reference/ref.gff > reference/ref/genes.gff.gz

### snpEff build -c reference/snpeff.config -dataDir . -gff3 ref

java.lang.RuntimeException: Error parsing property 'ref.NC_002516.codonTable'. No such codon table 'Bacterial_and_Plant_Plastid'
    at org.snpeff.snpEffect.Config.createCodonTables(Config.java:165)
    at org.snpeff.snpEffect.Config.readConfig(Config.java:626)
    at org.snpeff.snpEffect.Config.init(Config.java:468)
    at org.snpeff.snpEffect.Config.<init>(Config.java:114)
    at org.snpeff.SnpEff.loadConfig(SnpEff.java:292)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:341)
    at org.snpeff.SnpEff.run(SnpEff.java:1009)
    at org.snpeff.SnpEff.main(SnpEff.java:155)
stephenturner commented 6 years ago

Found this just now, having the same issue.

### snpEff build -c reference/snpeff.config -dataDir . -gff3 ref

java.lang.RuntimeException: Error parsing property 'ref.NC_002944.codonTable'. No such codon table 'Bacterial_and_Plant_Plastid'
        at org.snpeff.snpEffect.Config.createCodonTables(Config.java:169)
        at org.snpeff.snpEffect.Config.readConfig(Config.java:650)
        at org.snpeff.snpEffect.Config.init(Config.java:480)
        at org.snpeff.snpEffect.Config.<init>(Config.java:117)
        at org.snpeff.SnpEff.loadConfig(SnpEff.java:451)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:364)
        at org.snpeff.SnpEff.run(SnpEff.java:1183)
        at org.snpeff.SnpEff.main(SnpEff.java:162)
$ snippy --version
snippy 4.0-dev
$ snpEff -version
SnpEff  4.3t    2017-11-24
Walwa commented 6 years ago

Same issue on my machine with this setup:

$ snippy --version
snippy 4.0-dev
$ which snpEff
/conda/envs/genew1/bin/snpEff
$ snpEff -version
4.3

I get the same error message. I had wondered if it was because I am looking at a Mycoplasma sp., but now it looks like it's something in the software.

Slugger70 commented 6 years ago

Hi all,

The snippy 4.0-dev that conda uses as default was accidentally built from a pre-release and so is not ready for use. The fix is to use snippy version 3.2.

If you have snippy 4.0-dev installed from conda, then you need to remove it first and install version 3.2:

conda remove snippy
conda install -c bioconda -c conda-forge snippy=3.2

Hopefully, this will also help with the snpEff errors.

tseemann commented 6 years ago

A proper 4.0 release is coming soon. Until then, the -dev1, -dev2, etc. builds are for masochists.

(We are using them in research, but it appears snpEff is causing issues for many.)

stephenturner commented 6 years ago

Rolled back to version 3.2 and I'm still running into errors trying to use the gbk file instead of the fasta.

The snippy logs:

[07:41:52] Extracting FASTA and GFF from reference.
Use of uninitialized value in uc at /home/sdt5z/miniconda3/envs/mavium/bin/snippy line 159, <GEN0> line 101483.
[07:41:54] Wrote 1 sequences to ref.fa
[07:41:54] Wrote 4570 features to ref.gff
[07:41:54] Creating reference/snpeff.config
[07:41:54] Freebayes will process 96 chunks of 1000 bp, 24 chunks at a time.
[07:41:54] Using BAM RG (Read Group) ID: snps
[07:41:54] Running: samtools faidx reference/ref.fa 2>> snps.log
[07:41:54] Running: bwa index reference/ref.fa 2>> snps.log
[07:41:54] Running: mkdir reference/genomes && cp -f reference/ref.fa reference/genomes/ref.fa 2>> snps.log
[07:41:54] Running: mkdir reference/ref && bgzip -c reference/ref.gff > reference/ref/genes.gff.gz 2>> snps.log
[07:41:54] Running: snpEff build -c reference/snpeff.config -dataDir . -gff3 ref 2>> snps.log
[07:41:57] Error running command, check snippy-DW1/snps.log

And the snps.log file

### snpEff build -c reference/snpeff.config -dataDir . -gff3 ref

WARNING: All frames are zero! This seems rather odd, please check that 'frame' information in your 'genes' file is accurate.
java.lang.RuntimeException: Error reading file '/nv/vol184/uvabx/projects/houpt/darwin-mavium/pilot/snippy/snippy-DW1/reference/./ref/genes.gff'
java.lang.RuntimeException: FATAL ERROR: Most Exons do not have sequences!

. File '/path/snippy/snippy-DW1/reference/./ref/genes.gff' line 4573
        'NC_002944      snippy  CDS     4829033 4829176 .       -       0       ID=MAP_RS23120;codon_start=1;db_xref=GeneID:31477900;inference=COORDINATES: similar to AA sequence:RefSeq:WP_005263481.1;locus_tag=MAP_RS23120;note=Derived by automated computational analysis using gene prediction method: Protein Homology.;old_locus_tag=MAP4350c,MAP_4350c;product=50S ribosomal protein L34;protein_id=WP_003874369.1;transl_table=11;translation=MAKGKRTFQPNNRRRARVHGFRLRMRTRAGRAIVSGRRRKGRRALSA'

        at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.create(SnpEffPredictorFactoryGff.java:353)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
        at org.snpeff.SnpEff.run(SnpEff.java:1183)
        at org.snpeff.SnpEff.main(SnpEff.java:162)
$ snippy --version ; snpEff -version
snippy 3.2-dev
SnpEff  4.3t    2017-11-24

Any ideas?
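One possible cause of "Most Exons do not have sequences!" worth ruling out (an assumption, not a diagnosis of this particular log) is that the sequence names in column 1 of the GFF do not match the FASTA headers that snpEff reads from reference/genomes/ref.fa. The hypothetical helper below prints every GFF seqid that has no matching FASTA ID; it is a diagnostic sketch, not part of Snippy.

```shell
# Hypothetical helper: list GFF seqids (column 1) that do not appear as FASTA IDs.
# Usage: missing_seqids <fasta> <gff>
missing_seqids() {
  local fasta="$1" gff="$2" ids
  ids=$(mktemp)
  # FASTA ID = first word after ">" on each header line
  sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$fasta" | sort -u > "$ids"
  # GFF seqids: skip comment and short lines, stop at an embedded ##FASTA section
  awk -F'\t' '/^##FASTA/ {exit} /^#/ || NF < 9 {next} {print $1}' "$gff" | sort -u \
    | comm -23 - "$ids"
  rm -f "$ids"
}
```

For example, missing_seqids reference/ref.fa reference/ref.gff from the Snippy output directory; empty output means every GFF seqid has a matching sequence.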

tseemann commented 6 years ago

@stephenturner snpEff has changed its behaviour yet again, or fixed a bug...

What do you have in this section of the main snippy script?

    # it seems to be writing phase=1 (aka frame) instead of 0 (0-based)
    # I suspect it is using /codon_start= incorrectly (1-based) !!!
    $f->frame(0);

The frame(0) line might be commented out in yours, which was to fix that bug in an older snpEff...

Change it and it should work.
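Some background on that workaround: GenBank's /codon_start qualifier is 1-based (1, 2 or 3), while the GFF3 phase column is 0-based, so the expected phase is codon_start - 1. Hard-coding frame(0) is therefore correct only for features with /codon_start=1, which holds for complete (non-partial) CDS. The hypothetical audit_gff_phase helper below is a sketch that assumes codon_start= is carried into column 9, as in the genes.gff line quoted in the log above; it prints any CDS whose phase disagrees.

```shell
# Hypothetical helper: flag CDS rows whose GFF phase (column 8) != codon_start - 1.
audit_gff_phase() {
  awk -F'\t' '
    # only CDS rows that carry a codon_start=N attribute in column 9
    $3 == "CDS" && match($9, /codon_start=[123]/) {
      cs = substr($9, RSTART + 12, 1)   # the digit after codon_start= (12 chars)
      expected = cs - 1                 # GFF3 phase is 0-based, codon_start is 1-based
      if ($8 != expected)
        printf "%s:%s-%s phase=%s expected=%d\n", $1, $4, $5, $8, expected
    }
  ' "$1"
}
```

For example, audit_gff_phase reference/ref.gff; no output means every CDS phase agrees with its codon_start.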

I'm trying to make a release of 4.0 ASAP as this is causing lots of problems.

stephenturner commented 6 years ago

No, the $f->frame(0); line is still active. Here's what the whole block looks like:

while (my $seq = $in->next_seq) {
  exists $refseq{$seq->id} and err("Duplicate sequence ".$seq->id." in $reference");
  $refseq{ $seq->id } = uc($seq->seq); # keep for masking later
  $out->write_seq($seq);
  $nseq++;
  for my $f ($seq->get_SeqFeatures) {
    next if $f->primary_tag =~ m/^(source|misc_feature|gene|RBS)$/;
    $f->source_tag($EXE);
    # it seems to be writing phase=1 (aka frame) instead of 0 (0-based)
    # I suspect it is using /codon_start= incorrectly (1-based) !!!
    $f->frame(0);
    if ($f->has_tag('locus_tag')) {
      my($id) = $f->get_tag_values('locus_tag');
      $f->add_tag_value('ID', $id);
    }
    if ($f->has_tag('gene')) {
      my($gene) = $f->get_tag_values('gene');
      $f->add_tag_value('Name', $gene);
    }
    $gff->write_feature($f);
    $nfeat++;
  }
}
$ snippy --version
snippy 3.2-dev

(downgraded to 3.2 at @Slugger70's suggestion https://github.com/tseemann/snippy/issues/154#issuecomment-386493122)

tseemann commented 6 years ago

I am planning to replace snpEff with bcftools csq ASAP.

diaz13 commented 6 years ago

Hi, I installed snippy today and ran this command:

snippy --outdir --ref ../../../Parsnp-Linux64-v1.2/FTNF002-00.GBK --ctgs ../../scapper-master/medusa/*.fasta
[11:28:37] This is snippy 4.0-pre_20180729
[11:28:37] Written by Torsten Seemann
[11:28:37] Obtained from https://github.com/tseemann/snippy
[11:28:37] Detected operating system: linux
[11:28:37] Enabling bundled linux tools.
[11:28:37] Found bwa - /home/user/Tools/snippy-master/snippy/binaries/linux/bwa
[11:28:37] Found minimap2 - /home/user/Tools/minimap2-master//minimap2
[11:28:37] Found bcftools - /home/user/Tools/snippy-master/snippy/binaries/linux/bcftools
[11:28:37] Found samtools - /home/user/Tools/snippy-master/snippy/binaries/linux/samtools
[11:28:37] Found java - /usr/bin/java
[11:28:37] Found snpEff - /home/user/Tools/snippy-master/snippy/binaries/noarch/snpEff
[11:28:37] Found samclip - /home/user/Tools/snippy-master/snippy/binaries/noarch/samclip
[11:28:37] Found seqtk - /home/user/Tools/seqtk-master/seqtk
[11:28:37] Found snp-sites - /usr/bin/snp-sites
[11:28:37] Found parallel - /home/user/Tools/snippy-master/snippy/binaries/noarch/parallel
[11:28:37] Found freebayes - /home/user/Tools/snippy-master/snippy/binaries/linux/freebayes
[11:28:37] Found freebayes-parallel - /home/user/Tools/snippy-master/snippy/binaries/noarch/freebayes-parallel
[11:28:37] Found fasta_generate_regions.py - /home/user/Tools/snippy-master/snippy/binaries/noarch/fasta_generate_regions.py
[11:28:37] Found vcfstreamsort - /home/user/Tools/snippy-master/snippy/binaries/linux/vcfstreamsort
[11:28:37] Found vcfuniq - /home/user/Tools/snippy-master/snippy/binaries/linux/vcfuniq
[11:28:37] Found vcffirstheader - /home/user/Tools/snippy-master/snippy/binaries/noarch/vcffirstheader
[11:28:37] Found gzip - /bin/gzip
[11:28:37] Found seqret - /usr/bin/seqret
[11:28:37] Found vt - /home/user/Tools/vt-master/vt/vt
[11:28:37] Found snippy-vcf_to_tab - /home/user/Tools/snippy-master/snippy/bin/snippy-vcf_to_tab
[11:28:37] Found snippy-vcf_report - /home/user/Tools/snippy-master/snippy/bin/snippy-vcf_report
[11:28:37] Checking version: samtools --version is >= 1.7 - ok, have 1.9
[11:28:37] Checking version: bcftools --version is >= 1.7 - ok, have 1.9
[11:28:37] Checking version: freebayes --version is >= 1.1 - ok, have 1.1
[11:28:37] Checking version: snpEff -version is >= 4.3 - ok, have 4.3
[11:28:37] Please supply a reference FASTA/GBK/EMBL file with --reference

I don't understand why it doesn't find the reference. I did supply one.

diaz13 commented 6 years ago

Hi, I installed snippy today and ran this command:

snippy --outdir results --ref FTNF002-00.GBK --ctgs ../../scapper-master/medusa/*.fasta
[11:28:37] This is snippy 4.0-pre_20180729
[... same tool and version checks as in the log above ...]
[11:28:37] Checking version: snpEff -version is >= 4.3 - ok, have 4.3
[11:28:37] Please supply a reference FASTA/GBK/EMBL file with --reference

tseemann commented 6 years ago

@diaz13 rename your file to use a lowercase .gbk extension instead of .GBK; BioPerl might not understand .GBK.
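Following that advice, here is a hypothetical helper that renames an upper-case .GBK reference to .gbk and prints the path to pass to --ref. It is a small sketch: it renames in place with mv, so copy the file first if you need to keep the original name.

```shell
# Hypothetical helper: rename *.GBK to *.gbk and print the resulting path.
# Any other file name is left untouched and printed as-is.
gbk_lowercase_ext() {
  local new
  case "$1" in
    *.GBK)
      new="${1%.GBK}.gbk"
      mv -- "$1" "$new" && printf '%s\n' "$new"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}
```

For example: snippy --outdir results --ref "$(gbk_lowercase_ext FTNF002-00.GBK)" --ctgs contigs.fasta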

diaz13 commented 6 years ago

It works, but sometimes it still gives me the same error message. I will look into that. Thank you very much @tseemann

AlcaArctica commented 6 years ago

I just encountered the same issue, with either a .gbk file from NCBI or one obtained from RAST. I am using snpEff 4.3 (snpEff -version) and snippy 3.2-dev (snippy --version). Thanks for any advice.

tseemann commented 4 years ago

@AlcaArctica if you install snippy 4.x in a brand new conda environment, it should work.