nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

SNAP prediction failed - Funannotate test #571

Closed: Oixil closed this issue 3 years ago

Oixil commented 3 years ago

Are you using the latest release? funannotate v1.8.3

Describe the bug: SNAP produces 0 predictions, and the log reports "SNAP prediction failed, moving on without result".

What command did you issue? funannotate test -t busco

Logfiles: extract of funannotate-predict.log

[03/11/21 12:13:52]: /home/$HOME/miniconda3/envs/annotation/bin/funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --cpus 2 --species Awesome busco
[03/11/21 12:13:52]: OS: linux2, 8 cores, ~ 16 GB RAM. Python: 2.7.15
[03/11/21 12:13:52]: Running funannotate v1.8.3
[03/11/21 12:13:52]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initioprediction.
[03/11/21 12:13:54]: {u'genemark': 0, u'hiq': 2, u'glimmerhmm': 1, u'pasa': 6, u'snap': 1, u'transcripts': 1, u'proteins': 1, u'codingquarry': 0, u'augustus': 1}
[03/11/21 12:13:54]: Skipping CodingQuarry as no --rna_bam passed
[03/11/21 12:13:54]: {u'snap': u'busco', u'glimmerhmm': u'busco', u'augustus': u'busco'}
[03/11/21 12:13:54]: Parsed training data, run ab-initio gene predictors as follows:
[03/11/21 12:13:55]: {u'genemark': 0, u'hiq': 2, u'glimmerhmm': 1, u'pasa': 6, u'snap': 1, u'transcripts': 1, u'proteins': 1, u'codingquarry': 0, u'augustus': 1}
[03/11/21 12:13:55]: Loading genome assembly and parsing soft-masked repetitive sequences
[03/11/21 12:13:56]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked

[03/11/21 12:15:43]: Running BUSCO to find conserved gene models for training ab-initio predictors
[03/11/21 12:15:43]: /home/$HOME/miniconda3/envs/annotation/bin/python /home/$HOME/miniconda3/envs/annotation/lib/python2.7/site-packages/funannotate/aux_scripts/funannotate-BUSCO2-py2.py -i /home/$HOME/test-busco_764/annotate/predict_misc/genome.softmasked.fa -m genome --lineage /home/$HOME/funannotate_db/dikarya -o awesome_busco -c 2 --species anidulans -f --local_augustus /home/$HOME/test-busco_764/annotate/predict_misc/ab_initio_parameters/augustus
[03/11/21 12:34:37]: 373 valid BUSCO predictions found, validating protein sequences
[03/11/21 12:35:33]: 370 BUSCO predictions validated
[03/11/21 12:35:33]: Training Augustus using BUSCO gene models
[03/11/21 12:35:33]: gff2gbSmallDNA.pl annotate/predict_misc/busco.final.gff3 /home/$HOME/test-busco_764/annotate/predict_misc/genome.softmasked.fa 600 annotate/predict_misc/augustus.training.busco.gb
[03/11/21 12:35:44]: Augustus initial training results:
[03/11/21 12:35:44]: Running Augustus gene prediction using awesome_busco parameters
[03/11/21 12:37:40]: perl /home/$HOME/miniconda3/envs/annotation/opt/evidencemodeler-1.1.1/EvmUtils/misc/augustus_GFF3_to_EVM_GFF3.pl annotate/predict_misc/augustus.gff3
[03/11/21 12:37:40]: Pulling out high quality Augustus predictions
[03/11/21 12:37:40]: Found 319 high quality predictions from Augustus (>90% exon evidence)
[03/11/21 12:37:40]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[03/11/21 12:37:41]: 370 gene models to train snap on 6 scaffolds
[03/11/21 12:37:41]: fathom /home/$HOME/test-busco_764/annotate/predict_misc/snap.training.zff /home/$HOME/test-busco_764/annotate/predict_misc/snap-training.scaffolds.fasta -categorize 1000 -min-intron 10 -max-intron 3000
[03/11/21 12:37:41]: fathom uni.ann uni.dna -export 1000 -plus
[03/11/21 12:37:42]: forge export.ann export.dna
[03/11/21 12:37:44]: perl /home/$HOME/miniconda3/envs/annotation/bin/hmm-assembler.pl snap-trained annotate/predict_misc/snaptrain
[03/11/21 12:37:44]: snap /home/$HOME/test-busco_764/annotate/predict_misc/snap-trained.hmm /home/$HOME/test-busco_764/annotate/predict_misc/genome.softmasked.fa
[03/11/21 12:37:58]: 0 predictions from SNAP
[03/11/21 12:37:58]: SNAP prediction failed, moving on without result
[03/11/21 12:37:58]: snap failed removing from training parameters
[03/11/21 12:37:58]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[03/11/21 12:37:59]: trainGlimmerHMM /home/$HOME/test-busco_764/annotate/predict_misc/genome.softmasked.fa /home/$HOME/test-busco_764/annotate/predict_misc/glimmer.exons -d annotate/predict_misc/glimmerhmm

Test output:

funannotate test -t busco
#########################################################
Running funannotate predict BUSCO-mediated training unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --cpus 2 --species Awesome busco
#########################################################

[12:13 PM]: OS: linux2, 8 cores, ~ 16 GB RAM. Python: 2.7.15
[12:13 PM]: Running funannotate v1.8.3
[12:13 PM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[12:13 PM]: Skipping CodingQuarry as no --rna_bam passed
[12:13 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     busco
  glimmerhmm   busco
  snap         busco
[12:13 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[12:13 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[12:13 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[12:14 PM]: Found 1,784 preliminary alignments --> aligning with exonerate
[12:15 PM]: Exonerate finished: found 1,435 alignments
[12:15 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[12:34 PM]: 373 valid BUSCO predictions found, validating protein sequences
[12:35 PM]: 370 BUSCO predictions validated
[12:35 PM]: Training Augustus using BUSCO gene models
[12:35 PM]: Augustus initial training results:
  Feature       Specificity   Sensitivity
  nucleotides   99.7%         85.3%
  exons         76.0%         60.1%
  genes         88.0%         60.3%
[12:35 PM]: Running Augustus gene prediction using awesome_busco parameters
[12:37 PM]: 1,399 predictions from Augustus
[12:37 PM]: Pulling out high quality Augustus predictions
[12:37 PM]: Found 319 high quality predictions from Augustus (>90% exon evidence)
[12:37 PM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[12:37 PM]: 0 predictions from SNAP
[12:37 PM]: SNAP prediction failed, moving on without result
[12:37 PM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[12:42 PM]: 1,773 predictions from GlimmerHMM
[12:42 PM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        1080
  Augustus HiQ   2        319
  GlimmerHMM     1        1773
  Total          -        3172
[12:42 PM]: EVM: partitioning input to ~ 35 genes per partition
[12:50 PM]: Converting to GFF3 and collecting all EVM results
[12:50 PM]: 1,710 total gene models from EVM
[12:50 PM]: Generating protein fasta files from 1,710 EVM models
[12:50 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[12:50 PM]: Found 159 gene models to remove: 0 too short; 0 span gaps; 159 transposable elements
[12:50 PM]: 1,551 gene models remaining
[12:50 PM]: Predicting tRNAs
[12:50 PM]: 104 tRNAscan models are valid (non-overlapping)
[12:50 PM]: Generating GenBank tbl annotation file
[12:50 PM]: Converting to final Genbank format
[12:50 PM]: Collecting final annotation files for 1,655 total gene models
[12:50 PM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[12:50 PM]: Your next step might be functional annotation, suggested commands:

Run InterProScan (Docker required): funannotate iprscan -i annotate -m docker -c 2

Run antiSMASH: funannotate remote -i annotate -m antismash -e youremail@server.edu

Annotate Genome: funannotate annotate -i annotate --cpus 2 --sbt yourSBTfile.txt

[12:50 PM]: Training parameters file saved: annotate/predict_results/awesome_busco.parameters.json
[12:50 PM]: Add species parameters to database:

funannotate species -s awesome_busco -a annotate/predict_results/awesome_busco.parameters.json

#########################################################
SUCCESS: funannotate predict BUSCO-mediated training test complete.
#########################################################
Now running predict using all pre-trained ab-initio predictors
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate2 --cpus 2 --species Awesome busco -p annotate/predict_results/awesome_busco.parameters.json
#########################################################

[12:50 PM]: OS: linux2, 8 cores, ~ 16 GB RAM. Python: 2.7.15
[12:50 PM]: Running funannotate v1.8.3
[12:50 PM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[12:51 PM]: Ab initio training parameters file passed: annotate/predict_results/awesome_busco.parameters.json
[12:51 PM]: Skipping CodingQuarry as no --rna_bam passed
[12:51 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained
  glimmerhmm   pretrained
  snap         busco
[12:51 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[12:51 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[12:51 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[12:51 PM]: Found 1,784 preliminary alignments --> aligning with exonerate
[12:52 PM]: Exonerate finished: found 1,435 alignments
[12:52 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[01:10 PM]: 373 valid BUSCO predictions found, validating protein sequences
[01:11 PM]: 370 BUSCO predictions validated
[01:11 PM]: Running Augustus gene prediction using awesome_busco parameters
[01:13 PM]: 1,399 predictions from Augustus
[01:13 PM]: Pulling out high quality Augustus predictions
[01:13 PM]: Found 319 high quality predictions from Augustus (>90% exon evidence)
[01:13 PM]: Running SNAP gene prediction, using training data: annotate2/predict_misc/busco.final.gff3
[01:13 PM]: 2 predictions from SNAP
[01:13 PM]: Running GlimmerHMM gene prediction, using pretrained HMM profile
[01:14 PM]: 1,773 predictions from GlimmerHMM
[01:14 PM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        1080
  Augustus HiQ   2        319
  GlimmerHMM     1        1773
  snap           1        2
  Total          -        3174
[01:14 PM]: EVM: partitioning input to ~ 35 genes per partition
[01:22 PM]: Converting to GFF3 and collecting all EVM results
[01:22 PM]: 1,562 total gene models from EVM
[01:22 PM]: Generating protein fasta files from 1,562 EVM models
[01:22 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[01:22 PM]: Found 99 gene models to remove: 0 too short; 0 span gaps; 99 transposable elements
[01:22 PM]: 1,463 gene models remaining
[01:22 PM]: Predicting tRNAs
[01:22 PM]: 112 tRNAscan models are valid (non-overlapping)
[01:22 PM]: Generating GenBank tbl annotation file
[01:23 PM]: Converting to final Genbank format
[01:23 PM]: Collecting final annotation files for 1,575 total gene models
[01:23 PM]: Funannotate predict is finished, output files are in the annotate2/predict_results folder
[01:23 PM]: Your next step might be functional annotation, suggested commands:

Run InterProScan (Docker required): funannotate iprscan -i annotate2 -m docker -c 2

Run antiSMASH: funannotate remote -i annotate2 -m antismash -e youremail@server.edu

Annotate Genome: funannotate annotate -i annotate2 --cpus 2 --sbt yourSBTfile.txt

[01:23 PM]: Training parameters file saved: annotate2/predict_results/awesome_busco.parameters.json
[01:23 PM]: Add species parameters to database:

funannotate species -s awesome_busco -a annotate2/predict_results/awesome_busco.parameters.json

#########################################################
SUCCESS: funannotate predict using existing parameters test complete.
#########################################################

OS/Install Information


Checking dependencies for 1.8.3

You are running Python v 2.7.15. Now checking python packages...
biopython: 1.76
goatools: 1.0.15
matplotlib: 2.2.5
natsort: 6.2.0
numpy: 1.16.5
pandas: 0.24.2
psutil: 5.7.0
requests: 2.13.0
scikit-learn: 0.20.3
scipy: 1.2.1
seaborn: 0.9.0
All 11 python packages installed

You are running Perl v 5.026002. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/home/$HOME/funannotate_db
$PASAHOME=/home/$HOME/miniconda3/envs/annotation/opt/pasa-2.4.1
$TRINITY_HOME=/home/$HOME/miniconda3/envs/annotation/opt/trinity-2.8.5
$EVM_HOME=/home/$HOME/miniconda3/envs/annotation/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/home/$HOME/miniconda3/envs/annotation/config/
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.7
emapper.py: diamond /home/$HOME/miniconda3/envs/annotation/lib/python2.7/site-packages emapper-2.0.1
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 10.0.2
kallisto: 0.46.1
mafft: v7.475 (2020/Nov/23)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.17-r941
proteinortho: 6.0.28
pslCDnaFilter: no way to determine
salmon: salmon 0.15.0
samtools: samtools 1.10
snap: 2006-07-28
stringtie: 2.1.4
tRNAscan-SE: 2.0.7 (Oct 2020)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: gmes_petap.pl not installed
ERROR: hisat2 not installed
ERROR: signalp not installed

nextgenusfs commented 3 years ago

I think the Bioconda build of snap is broken on some Linux distributions (the Debian family, I believe). The fix is to install snap manually; I think it's also available through apt-get.

HenrivdGeest commented 3 years ago

Related to this: in some cases the GlimmerHMM and SNAP predictions fail for me. Is there a way to find out exactly which command was executed, or whether there is any log of the failures? I tried digging through all the logs, but I cannot find any more information on why these steps fail. I only see this:

[03/15/21 16:00:18]: Snap training failed, empty training set: predict_part8/predict_misc/final_training_models.gff3
[03/15/21 16:00:18]: snap failed removing from training parameters
[03/15/21 16:00:18]: GlimmerHMM training failed, empty training set: predict_part8/predict_misc/final_training_models.gff3
[03/15/21 16:00:18]: GlimmerHMM failed, removing from training parameters

nextgenusfs commented 3 years ago

@HenrivdGeest this seems like a separate issue - it says your training data is empty so both trainings failed. Training data comes from PASA if using the train module or uses BUSCO if there is no PASA data.
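
For digging through the log, one option is to pull out every entry that mentions a failure together with the entries logged just before it; as the extracts in this thread show, the external commands (fathom, forge, snap and so on) are written to the same log. The following is a minimal sketch and not part of funannotate: `find_failures` and its `context` parameter are hypothetical names, and it assumes every entry starts with a `[MM/DD/YY HH:MM:SS]:` prefix as in the extracts above. It is written for Python 3.

```python
import re

# Matches the timestamp prefix seen in the log extracts in this thread,
# e.g. "[03/15/21 16:00:18]: message".
ENTRY = re.compile(r"^\[(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\]: (.*)$")


def find_failures(log_text, context=3):
    """Return (failure_message, preceding_messages) pairs from a log string."""
    messages = []
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            messages.append(match.group(2))
    results = []
    for i, message in enumerate(messages):
        # Simple keyword test; an assumption, not an official error marker.
        if "failed" in message.lower():
            results.append((message, messages[max(0, i - context):i]))
    return results


# Entries copied from the first log extract in this thread.
sample = """\
[03/11/21 12:37:42]: forge export.ann export.dna
[03/11/21 12:37:58]: 0 predictions from SNAP
[03/11/21 12:37:58]: SNAP prediction failed, moving on without result
[03/11/21 12:37:58]: snap failed removing from training parameters
"""

failures = find_failures(sample, context=2)
for message, before in failures:
    print(message)
    for previous in before:
        print("    preceded by:", previous)
```

Against a real run, read the file first, for example `find_failures(open("funannotate-predict.log").read())`; the file name is the one mentioned in the original report.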

kmkocot commented 3 years ago

I've been having similar snap issues. Would it be possible to include a flag that specifies an external snap gff in the next release?

nextgenusfs commented 3 years ago

@kmkocot do you mean that snap from bioconda is failing due to a compilation issue, or that you have an empty training set?

kmkocot commented 3 years ago

Hi Jon,

I'm not exactly sure where the problem is with snap. The snap.training.zff and snap-training.scaffolds.fasta files look fine. The contents of the snap-trained.hmm file are formatted normally, but the values look fishy. Here are the errors I get:

[04/18/21 01:36:47]: Running SNAP gene prediction, using training data: funannotate_results/predict_misc/final_training_models.gff3
[04/18/21 01:37:06]: 230 gene models to train snap on 228 scaffolds
[04/18/21 01:37:23]: fathom /home/wirenia/Desktop/2021-04-15_Hanleya_funannotate/funannotate_results/predict_misc/snap.training.zff /home/wirenia/Desktop/2021-04-15_Hanleya_funannotate/funannotate_results/predict_misc/snap-training.scaffolds.fasta -categorize 1000 -min-intron 10 -max-intron 10000
[04/18/21 01:37:25]: g_3-T1 1 1 10 + errors(10): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds
g_4-T1 1 1 3 + errors(3): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds
g_6-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_5-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_7-T1 1 1 2 + errors(2): exon-1:out_of_bounds exon-2:out_of_bounds
g_8-T1 1 1 8 + errors(8): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds
g_10-T1 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
g_13-T1 1 1 3 + errors(3): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds
g_15-T1 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds
g_17-T1 1 1 3 + errors(3): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds
g_20-T1 1 1 3 + errors(3): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds
g_21-T1 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
g_22-T1 1 1 2 + errors(2): exon-1:out_of_bounds exon-2:out_of_bounds
g_27-T1 1 1 2 + errors(2): exon-1:out_of_bounds exon-2:out_of_bounds
g_30-T1 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
g_31-T1 1 1 2 + errors(2): exon-1:out_of_bounds exon-2:out_of_bounds
g_35-T1 1 1 2 + errors(2): exon-1:out_of_bounds exon-2:out_of_bounds
g_37-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_41-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_43-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_44-T1 1 1 2 + errors(2): exon-1:out_of_bounds exon-2:out_of_bounds
g_45-T1 1 1 10 + errors(2): exon-9:out_of_bounds exon-10:out_of_bounds
g_47-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_52-T1 1 1 7 + errors(7): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds
g_55-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_57-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_58-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_64-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_66-T1 1 1 12 + errors(12): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds exon-12:out_of_bounds
g_67-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_68-T1 1 1 2 + errors(2): exon-1:out_of_bounds exon-2:out_of_bounds
g_72-T1 1 1 3 + errors(3): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds
g_80-T1 1 1 3 + errors(3): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds
g_81-T1 1 1 3 + errors(3): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds
g_84-T1 1 1 3 + errors(3): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds
g_97-T1 1 1 3 + errors(3): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds
g_128-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_150-T1 1 1 2 + errors(1): exon-2:out_of_bounds
g_155-T1 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds
g_157-T1 1 1 2 + errors(2): exon-1:out_of_bounds exon-2:out_of_bounds
g_161-T1 1 1 13 + errors(12): exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds
g_163-T1 1 1 3 + errors(3): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds
g_170-T1 1 1 20 + errors(20): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds
g_173-T1 1 1 11 + errors(11): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds
g_174-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_176-T1 1 1 3 + errors(3): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds
g_188-T1 1 1 7 + errors(7): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds
g_192-T1 1 1 2 + errors(2): exon-1:out_of_bounds exon-2:out_of_bounds
g_193-T1 1 1 2 + errors(2): exon-1:out_of_bounds exon-2:out_of_bounds
g_197-T1 1 1 2 + errors(2): exon-1:out_of_bounds exon-2:out_of_bounds
g_200-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_201-T1 1 1 2 + errors(1): exon-2:out_of_bounds
g_202-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_209-T1 1 1 3 + errors(3): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds
g_210-T1 1 1 3 + errors(3): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds
g_211-T1 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
g_212-T1 1 1 2 + errors(1): exon-2:out_of_bounds
g_215-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_216-T1 1 1 2 + errors(2): exon-1:out_of_bounds exon-2:out_of_bounds
g_222-T1 1 1 4 + errors(4): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
g_224-T1 1 1 3 + errors(3): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds
g_226-T1 1 1 2 + errors(2): exon-1:out_of_bounds exon-2:out_of_bounds
g_227-T1 1 1 2 + errors(2): exon-1:out_of_bounds exon-2:out_of_bounds
[04/18/21 01:37:25]: fathom uni.ann uni.dna -export 1000 -plus
[04/18/21 01:37:25]: forge export.ann export.dna
[04/18/21 01:37:25]: perl /usr/bin/hmm-assembler.pl snap-trained funannotate_results/predict_misc/snaptrain
[04/18/21 01:37:25]: snap /home/wirenia/Desktop/2021-04-15_Hanleya_funannotate/funannotate_results/predict_misc/snap-trained.hmm /home/wirenia/Desktop/2021-04-15_Hanleya_funannotate/funannotate_results/predict_misc/genome.softmasked.fa
[04/18/21 05:38:53]: 0 predictions from SNAP
[04/18/21 05:38:53]: SNAP prediction failed, moving on without result
[04/18/21 05:38:53]: snap failed removing from training parameters

Let me know if you'd like to check out any of the input/output files.

Thanks! Kevin
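
One way to tell whether `out_of_bounds` reflects coordinates that really fall outside the scaffolds, or a fathom binary that is misreading good input, is to check the training files independently of SNAP. The sketch below is an illustration under stated assumptions: it assumes the ZFF file uses `>scaffold` header lines followed by whitespace-separated `label begin end group` lines, and the function names and toy data are hypothetical (they are not taken from the files discussed here). It is written for Python 3.

```python
def fasta_lengths(fasta_text):
    """Map each FASTA record name (first word of the header) to its length."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_bounds_features(zff_text, lengths):
    """Return ZFF features whose coordinates fall outside their scaffold."""
    problems = []
    scaffold = None
    for line in zff_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            scaffold = line[1:].split()[0]
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a 'label begin end group' line (assumed layout)
        label, begin, end, group = fields[0], int(fields[1]), int(fields[2]), fields[3]
        limit = lengths.get(scaffold, 0)  # unknown scaffolds count as length 0
        if min(begin, end) < 1 or max(begin, end) > limit:
            problems.append((scaffold, group, label, begin, end))
    return problems


# Toy data: a 10 bp scaffold and one exon that runs past its end.
toy_fasta = ">scaffold_1\nACGTACGTAC\n"
toy_zff = ">scaffold_1\nEinit 2 5 g_1-T1\nEterm 8 15 g_1-T1\n"

problems = out_of_bounds_features(toy_zff, fasta_lengths(toy_fasta))
print(problems)
```

If a check like this reports nothing for snap.training.zff against snap-training.scaffolds.fasta, the coordinates themselves are in range and the fathom build becomes the more likely suspect.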

nextgenusfs commented 3 years ago

If you are on Ubuntu/Debian, uninstall snap from bioconda, install it using apt-get, and see if you get the same results.

kmkocot commented 3 years ago

My hero! That fixed it! Sorry I neglected to provide the relevant details above, but you guessed correctly. Thank you very much.

spock commented 3 years ago

It looks like there is just a single version of SNAP on bioconda (version 2013_11_29), re-packaged/re-uploaded every now and then.

I wonder if a non-zero but still very low number of predictions is also indicative of a SNAP prediction problem?
I'm getting 16 predictions from 1228 gene models used for training.
Is there an easy way to check?

[05/10/21 19:06:46]: Running SNAP gene prediction, using training data: /my_sp/predict_misc/busco.final.gff3
[05/10/21 19:06:47]: 1228 gene models to train snap on 23 scaffolds
[05/10/21 19:06:47]: fathom /my_sp/predict_misc/snap.training.zff my_sp/predict_misc/snap-training.scaffolds.fasta -categorize 1000 -min-intron 10 -max-intron 3000
[05/10/21 19:06:48]: fathom uni.ann uni.dna -export 1000 -plus
[05/10/21 19:06:49]: forge export.ann export.dna
[05/10/21 19:06:50]: perl ~/.conda/envs/funannotate/bin/hmm-assembler.pl snap-trained my_sp/predict_misc/snaptrain
[05/10/21 19:06:50]: snap /my_sp/predict_misc/snap-trained.hmm /my_sp/predict_misc/genome.softmasked.fa
[05/10/21 19:08:23]: 16 predictions from SNAP
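
A rough way to put such a number in context is to compare the predictors against each other within the same run. The sketch below is only a heuristic under stated assumptions: it parses the `N predictions from X` lines seen in the logs in this thread and flags any predictor whose count is below an arbitrary fraction (10%, an illustrative choice and not a funannotate setting) of the largest count. The function names are hypothetical, and the sample entries are copied from the second test run earlier in this thread. It is written for Python 3.

```python
import re

# Matches e.g. "1,399 predictions from Augustus" and "2 predictions from SNAP".
PREDICTIONS = re.compile(r"([\d,]+) predictions from (\S+)")


def prediction_counts(log_text):
    """Map predictor name to the number of predictions reported in the log."""
    counts = {}
    for match in PREDICTIONS.finditer(log_text):
        counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


def low_outliers(counts, fraction=0.1):
    """Return predictors whose count is below `fraction` of the largest count."""
    if not counts:
        return []
    threshold = fraction * max(counts.values())
    return sorted(name for name, n in counts.items() if n < threshold)


sample = """\
[01:13 PM]: 1,399 predictions from Augustus
[01:13 PM]: 2 predictions from SNAP
[01:14 PM]: 1,773 predictions from GlimmerHMM
"""

counts = prediction_counts(sample)
suspects = low_outliers(counts)
print(counts)
print(suspects)
```

The log extract above only contains the SNAP count, so the Augustus and GlimmerHMM counts from the same run would be needed before a comparison like this says anything about the 16 predictions.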

funannotate test (when running with BUSCO-mediated training) succeeds but does have these lines in the output:

[May 25 08:58 AM]: Found 314 high quality predictions from Augustus (>90% exon evidence)
[May 25 08:58 AM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[May 25 08:59 AM]: 0 predictions from SNAP
[May 25 08:59 AM]: SNAP prediction failed, moving on without result
...
SUCCESS: `funannotate predict` BUSCO-mediated training test complete.
...
SUCCESS: `funannotate predict` using existing parameters test complete.

aberaslop commented 3 years ago

Hi Jon,

I have the same problem as @kmkocot: snap failed, giving 0 models. I am also on a Debian platform, so I uninstalled snap and installed it again manually, as per your suggestion (I could not use apt-get because I am not root). Although funannotate recognizes this new snap, fathom fails:

fathom /home/aberas2/datasets/Genome/pacbio/funannotate_50/funannotate_train/predict_misc/snap.training.zff /home/aberas2/datasets/Genome/pacbio/funannotate_50/funannotate_train/predict_misc/snap-training.scaffolds.fasta -categorize 1000 -min-intron 10 -max-intron 3000

/home/aberas2/miniconda3/envs/funannotate/bin/fathom: line 7: /home/aberas2/miniconda3/envs/funannotate/share/snap/bin/fathom: No such file or directory

The new snap installation is in /home/aberas2/miniconda3/envs/funannotate/share/SNAP/fathom, so it seems that funannotate is looking for the program in the wrong directory. To solve that, I tried editing the fathom file in funannotate/bin to point to the new path, but it still errored. I then copied the new fathom binary to funannotate/bin (as suggested here), and that seems to solve the problem. hmm-assembler.pl then failed with the same problem, and I solved it the same way. Will these changes break funannotate's installation or other steps in the pipeline in the future?

Thanks!

L.
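
The "No such file or directory" message above comes from the wrapper script in `bin/` calling a path under `share/snap/bin/`, while the manual install sits under `share/SNAP/`. A quick way to spot such stale targets is to scan a wrapper for absolute paths that do not exist. This is a hedged sketch for Python 3: `missing_wrapper_targets` is a hypothetical helper, the regular expression is a rough heuristic, and the demo builds a throwaway wrapper in a temporary directory rather than touching a real installation.

```python
import os
import re
import tempfile

# Heuristic: an absolute path starts with "/" that does not follow a word
# character, ":" or "/" (this skips URLs and $VAR/relative pieces).
ABS_PATH = re.compile(r"(?<![\w:/])/[\w./+-]+")


def missing_wrapper_targets(script_path):
    """List absolute paths named in a text wrapper script that do not exist."""
    try:
        with open(script_path, "r", encoding="utf-8") as handle:
            text = handle.read()
    except UnicodeDecodeError:
        return []  # probably a compiled binary rather than a wrapper script
    return [path for path in ABS_PATH.findall(text) if not os.path.exists(path)]


# Demo layout mimicking the situation described above: the wrapper points at
# snap/bin/fathom, but the binary actually lives in SNAP/fathom.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "SNAP"))
    installed = os.path.join(tmp, "SNAP", "fathom")
    open(installed, "w").close()
    stale = os.path.join(tmp, "snap", "bin", "fathom")
    wrapper = os.path.join(tmp, "fathom")
    with open(wrapper, "w", encoding="utf-8") as handle:
        handle.write('#!/bin/sh\nexec "%s" "$@"\n' % stale)
    missing = missing_wrapper_targets(wrapper)
    print(missing)
```

Pointing the same function at the real wrappers (for example the `fathom` file in the environment's `bin/` directory) would list the targets they expect, which shows where a manually installed binary needs to be placed or linked.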