tseemann / snippy

:scissors: :zap: Rapid haploid variant calling and core genome alignment
GNU General Public License v2.0

Snippy fails with genbank file #581

Open mesti90 opened 8 months ago

mesti90 commented 8 months ago

I tried to run snippy with the following genbank file: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/008/632/635/GCF_008632635.1_ASM863263v1/GCF_008632635.1_ASM863263v1_genomic.gbff.gz However, the pipeline failed for me when snpEff started. My snpEff version is SnpEff 5.2a (build 2023-10-24 14:24)

The command I called was: snippy --cpus 60 --outdir outdir --ref GCF_008632635.1_ASM863263v1_genomic.gbff --R1 forward.fastq.gz --R2 reverse.fastq.gz

I also tried changing the extension from .gbff to .gbk, without success. I'd appreciate your help.

mjm269 commented 8 months ago

I was able to get the pipeline to work simply by changing the file extension from .gbff to .gbk, so I'm not sure why you are having a problem. One possibility: depending on how you downloaded the sequence from NCBI (or wherever you got it), the file name in your command above may belong to the FASTA file rather than the GenBank file. For me, the GenBank file was simply named genomic.gbff when I downloaded it from NCBI.

jubelik commented 7 months ago

I have the same problem. Snippy works with a .fa file, but not with the .gbff, even when I change the extension to .gbk. The snps.log file shows the following:

snpEff build -c reference/snpeff.config -dataDir . -gff3 ref

WARNING: All frames are zero! This seems rather odd, please check that 'frame' information in your 'genes' file is accurate.
Exon.frameCorrection(141): Exon too short (size: 1), cannot correct frame!
NC_051439:9919683-9919683 'EXON_NC_051439_9919683_9919684', rank: 3, frame: 2, sequence: a
CDS too short, cannot correct frame: frame size 1, frame correction 1, CDS: NC_051439 9919683-9919683 CDS 'CDS_17781', frame: 2
Exon.frameCorrection(141): Exon too short (size: 1), cannot correct frame!

[...]

freebayes-parallel reference/ref.txt 16 -p 2 -P 0 -C 2 -F 0.05 --min-coverage 5 --min-repeat-entropy 1.0 -q 13 -m 60 --strict-vcf -f ref>

Error: Failed to set region: -1:0-22,323,906()
ERROR(freebayes): Could not SetRegion to NC_051430:0..22323906
Error: Failed to set region: -1:22,323,906-36,055,409()
ERROR(freebayes): Could not SetRegion to NC_051430:22323906..36055409

Any ideas? Thanks!

NiuNiuguohao commented 4 months ago

I hit the same error. This is my command: snippy --outdir 'S12019' --R1 'S12019.1.fastq.gz' --R2 'S12019.2.fastq.gz' --reference S1256.gb --cpus 8. S1256.gb is a gbff file from GenBank. When I switched to a different reference file, it worked. Changing the extension from '.gb' to '.gbk' doesn't help.

sophiehoyer commented 3 months ago

FYI, I was able to run snippy today using this .gbff file from this Enterococcus faecalis assembly.

I renamed the downloaded .gbff file and was able to run snippy with the command:

snippy --cpus 16 --outdir outdir --ref Efs152.gbff --R1 forward.fq.gz --R2 reverse.fq.gz

Maybe try running the command again?
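Since some references in this thread work and others fail, it may help to compare what the files actually contain. The following is a minimal sketch, not part of snippy or snpEff, that lists each record in a GenBank flat file with its declared length and number of CDS features. It assumes the standard layout (a `LOCUS` line per record, feature keys indented by five spaces); the function names `summarize_genbank` and `open_text` are made up for this example. It does not diagnose the failure itself; it only shows whether a failing file differs from a working one in record count, record size, or annotation.

```python
import gzip
import sys


def summarize_genbank(lines):
    """Return [(locus_name, declared_length_bp, cds_count), ...], one tuple per record.

    Assumes the usual GenBank flat-file layout: each record starts with a
    LOCUS line ("LOCUS <name> <length> bp ..."), and feature keys such as
    CDS are indented by exactly five spaces.
    """
    records = []
    for line in lines:
        if line.startswith("LOCUS"):
            fields = line.split()
            name = fields[1]
            # The declared length is None if the field is missing or not numeric.
            has_length = len(fields) > 2 and fields[2].isdigit()
            length = int(fields[2]) if has_length else None
            records.append([name, length, 0])
        elif line.startswith("     CDS ") and records:
            # Count CDS features against the most recent LOCUS record.
            records[-1][2] += 1
    return [tuple(r) for r in records]


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


if __name__ == "__main__" and len(sys.argv) == 2:
    with open_text(sys.argv[1]) as handle:
        for name, length, cds in summarize_genbank(handle):
            print(f"{name}\t{length}\t{cds}")
```

Saved under a name of your choice (say `gbk_summary.py`, hypothetical), it would be run as `python gbk_summary.py GCF_008632635.1_ASM863263v1_genomic.gbff.gz`, once on a reference that fails and once on one that works. The record names it prints can then be checked against the contig names in the freebayes errors quoted above (e.g. NC_051430).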