nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

funannotate test failing at predict step #1047

Open pwkooij opened 2 weeks ago

pwkooij commented 2 weeks ago

Are you using the latest release? funannotate v1.8.17

Describe the bug: I'm trying to run the test after installation using funannotate test -t all --cpus 10, but it crashes at the funannotate predict step.

What command did you issue? funannotate test -t all --cpus 10

Logfiles: Below is the output up to the first error, which then seems to be repeated 45x. The full log file is attached in case it is needed.

funannotate test -t all --cpus 10
#########################################################
Running `funannotate clean` unit testing: minimap2 mediated assembly duplications
Downloading: https://osf.io/8pjbe/download?version=1 Bytes: 252076
CMD: funannotate clean -i test.clean.fa -o test.exhaustive.fa --exhaustive
#########################################################
minimap2 version=2.28-r1209 path=/home/pepijn/miniconda3/envs/funannotate/bin/minimap2
-----------------------------------------------
6 input contigs, 6 larger than 500 bp, N50 is 427,039 bp
Checking duplication of 6 contigs
-----------------------------------------------
scaffold_73 appears duplicated: 100% identity over 100% of the contig. contig length: 15153
scaffold_91 appears duplicated: 100% identity over 100% of the contig. contig length: 8858
scaffold_27 appears duplicated: 100% identity over 100% of the contig. contig length: 427039
-----------------------------------------------
6 input contigs; 6 larger than 500 bp; 3 duplicated; 3 written to file
#########################################################
SUCCESS: `funannotate clean` test complete.
#########################################################

#########################################################
Running `funannotate mask` unit testing: RepeatModeler --> RepeatMasker
Downloading: https://osf.io/hbryz/download?version=1 Bytes: 375687
CMD: funannotate mask -i test.fa -o test.masked.fa --cpus 10
#########################################################
-------------------------------------------------------
[Jun 10 02:21 PM]: OS: Ubuntu 22.04, 20 cores, ~ 132 GB RAM. Python: 3.9.19
[Jun 10 02:21 PM]: Running funanotate v1.8.17
[Jun 10 02:21 PM]: Soft-masking simple repeats with tantan
[Jun 10 02:21 PM]: Repeat soft-masking finished: 
Masked genome: /home/pepijn/miniconda3/envs/funannotate/bin/test-mask_6d6ef314-2287-415b-905a-b2ccdf6755b2/test.masked.fa
num scaffolds: 2
assembly size: 1,216,048 bp
masked repeats: 50,965 bp (4.19%)
-------------------------------------------------------
#########################################################
SUCCESS: `funannotate mask` test complete.
#########################################################

#########################################################
Running `funannotate predict` unit testing
Downloading: https://osf.io/te2pf/download?version=1 Bytes: 1489808
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 10 --species Awesome testicus
#########################################################
-------------------------------------------------------
[Jun 10 02:21 PM]: OS: Ubuntu 22.04, 20 cores, ~ 132 GB RAM. Python: 3.9.19
[Jun 10 02:21 PM]: Running funannotate v1.8.17
[Jun 10 02:21 PM]: Skipping CodingQuarry as no --rna_bam passed
[Jun 10 02:21 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained     
  genemark     selftraining   
  glimmerhmm   busco          
  snap         busco          
[Jun 10 02:21 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Jun 10 02:21 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
/home/pepijn/miniconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/aux_scripts/funannotate-p2g.py:14: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import parse_version
[Jun 10 02:21 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Jun 10 02:21 PM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
     Progress: 1505 complete, 0 failed, 0 remaining          
[Jun 10 02:21 PM]: Exonerate finished in 0:00:21: found 1,272 alignments
[Jun 10 02:21 PM]: Running GeneMark-ES on assembly
[Jun 10 02:24 PM]: 1,569 predictions from GeneMark
[Jun 10 02:24 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Jun 10 02:30 PM]: 370 valid BUSCO predictions found, validating protein sequences
[Jun 10 02:30 PM]: 192 BUSCO predictions validated
[Jun 10 02:30 PM]: Running Augustus gene prediction using saccharomyces parameters
     Progress: 11 complete, 0 failed, 0 remaining        
[Jun 10 02:32 PM]: 1,485 predictions from Augustus
[Jun 10 02:32 PM]: Pulling out high quality Augustus predictions
[Jun 10 02:32 PM]: Found 371 high quality predictions from Augustus (>90% exon evidence)
[Jun 10 02:32 PM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Jun 10 02:33 PM]: 0 predictions from SNAP
[Jun 10 02:33 PM]: SNAP prediction failed, moving on without result
[Jun 10 02:33 PM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Jun 10 02:33 PM]: 169 predictions from GlimmerHMM
[Jun 10 02:33 PM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        1325 
  Augustus HiQ   2        372  
  GeneMark       1        1569 
  GlimmerHMM     1        169  
  Total          -        3435 
[Jun 10 02:33 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
error: [Errno 2] No such file or directory: '/home/pepijn/miniconda3/envs/funannotate/opt/evidencemodeler-2.1.0/evidence_modeler.pl' run(*(['/home/pepijn/miniconda3/envs/funannotate/opt/evidencemodeler-2.1.0/evidence_modeler.pl', '-G', '/home/pepijn/miniconda3/envs/funannotate/bin/test-predict_6d6ef314-2287-415b-905a-b2ccdf6755b2/annotate/predict_misc/EVM/CP022972.1/CP022972.1_214443-281740/genome.softmasked.fa', '-g', '/home/pepijn/miniconda3/envs/funannotate/bin/test-predict_6d6ef314-2287-415b-905a-b2ccdf6755b2/annotate/predict_misc/EVM/CP022972.1/CP022972.1_214443-281740/gene_predictions.gff3', '-w', '/home/pepijn/miniconda3/envs/funannotate/bin/test-predict_6d6ef314-2287-415b-905a-b2ccdf6755b2/annotate/predict_misc/weights.evm.txt', '--min_intron_length', '10', '--exec_dir', '/home/pepijn/miniconda3/envs/funannotate/bin/test-predict_6d6ef314-2287-415b-905a-b2ccdf6755b2/annotate/predict_misc/EVM/CP022972.1/CP022972.1_214443-281740', '-p', '/home/pepijn/miniconda3/envs/funannotate/bin/test-predict_6d6ef314-2287-415b-905a-b2ccdf6755b2/annotate/predict_misc/EVM/CP022972.1/CP022972.1_214443-281740/protein_alignments.gff3', '/home/pepijn/miniconda3/envs/funannotate/bin/test-predict_6d6ef314-2287-415b-905a-b2ccdf6755b2/annotate/predict_misc/EVM/CP022972.1/CP022972.1_214443-281740/evm.out', '/home/pepijn/miniconda3/envs/funannotate/bin/test-predict_6d6ef314-2287-415b-905a-b2ccdf6755b2/annotate/predict_misc/EVM/CP022972.1/CP022972.1_214443-281740/evm.out.log'],), **{})
     Progress: 45 complete, 0 failed, 0 remaining        
[Jun 10 02:33 PM]: Converting to GFF3 and collecting all EVM results
[Jun 10 02:33 PM]: Evidence modeler has failed, exiting
#########################################################
Traceback (most recent call last):
  File "/home/pepijn/miniconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/home/pepijn/miniconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/funannotate.py", line 717, in main
    mod.main(arguments)
  File "/home/pepijn/miniconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/test.py", line 405, in main
    runPredictTest(args)
  File "/home/pepijn/miniconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/test.py", line 160, in runPredictTest
    assert 1500 <= countGFFgenes(os.path.join(
  File "/home/pepijn/miniconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/test.py", line 45, in countGFFgenes
    with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-predict_6d6ef314-2287-415b-905a-b2ccdf6755b2/annotate/predict_results/Awesome_testicus.gff3'

funannotate_test.log

OS/Install Information

You are running Perl v b'5.032001'. Now checking perl modules...
  Carp: 1.50
  Clone: 0.46
  DBD::SQLite: 1.72
  DBD::mysql: 4.050
  DBI: 1.643
  DB_File: 1.858
  Data::Dumper: 2.183
  File::Basename: 2.85
  File::Which: 1.24
  Getopt::Long: 2.54
  Hash::Merge: 0.302
  JSON: 4.10
  LWP::UserAgent: 6.67
  Logger::Simple: 2.0
  POSIX: 1.94
  Parallel::ForkManager: 2.02
  Pod::Usage: 1.69
  Scalar::Util::Numeric: 0.40
  Storable: 3.15
  Text::Soundex: 3.05
  Thread::Queue: 3.14
  Tie::File: 1.06
  URI::Escape: 5.17
  YAML: 1.30
  local::lib: 2.000029
  threads: 2.25
  threads::shared: 1.61
All 27 Perl modules installed

Checking Environmental Variables...
  $FUNANNOTATE_DB=/home/pepijn/funannotate_db
  $PASAHOME=/home/pepijn/miniconda3/envs/funannotate/opt/pasa-2.5.3
  $TRINITY_HOME=/home/pepijn/miniconda3/envs/funannotate/opt/trinity-2.15.1
  $EVM_HOME=/home/pepijn/miniconda3/envs/funannotate/opt/evidencemodeler-2.1.0
  $AUGUSTUS_CONFIG_PATH=/home/pepijn/miniconda3/envs/funannotate/config/
  $GENEMARK_PATH=/usr/local/bioinf/gmes_linux_64_4
All 6 environmental variables are set

Checking external dependencies...
  PASA: 2.5.3
  CodingQuarry: 2.0
  Trinity: 2.15.1
  augustus: 3.5.0
  bamtools: bamtools 2.5.2
  bedtools: bedtools v2.31.1
  blat: BLAT v37x1
  diamond: 2.1.9
  emapper.py: 2.1.12
  ete3: 3.1.3
  exonerate: exonerate 2.4.0
  fasta: 36.3.8g
  glimmerhmm: 3.0.4
  gmap: 2021-12-17
  gmes_petap.pl: 4.71_lic
  hisat2: 2.2.1
  hmmscan: HMMER 3.4 (Aug 2023)
  hmmsearch: HMMER 3.4 (Aug 2023)
  java: 22.0.1-internal
  kallisto: 0.46.1
  mafft: v7.526 (2024/Apr/26)
  makeblastdb: makeblastdb 2.15.0+
  minimap2: 2.28-r1209
  pigz: 2.4
  proteinortho: 6.3.1
  pslCDnaFilter: no way to determine
  salmon: salmon 1.10.3
  samtools: samtools 1.18
  signalp: 6.0
  snap: 2006-07-28
  stringtie: 2.2.1
  tRNAscan-SE: 2.0.12 (Nov 2022)
  tantan: tantan 49
  tbl2asn: 25.8
  tblastn: tblastn 2.15.0+
  trimal: trimAl v1.4.rev15 build[2013-12-17]
  trimmomatic: 0.39
All 37 external dependencies are installed

pwkooij commented 2 weeks ago

I've had a quick look at the error message (error: [Errno 2] No such file or directory: '/home/pepijn/miniconda3/envs/funannotate/opt/evidencemodeler-2.1.0/evidence_modeler.pl'), and indeed the Perl script is not there. However, it can be found in the following folder: /home/pepijn/miniconda3/envs/funannotate/opt/evidencemodeler-2.1.0/EvmUtils

So it seems to me that funannotate is simply looking in the wrong folder. I'm not sure yet how to correct this, but I will have a further look.
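One way to confirm where the script actually lives is to search the EVM install directory. This is a minimal sketch, assuming only that `$EVM_HOME` points at the EVM install as reported by the environment check above. The helper name `find_evm_script` is hypothetical and not part of funannotate.

```python
import os


def find_evm_script(evm_home, name="evidence_modeler.pl"):
    """Return every path under evm_home whose file name equals `name`.

    Hypothetical diagnostic helper; not part of funannotate.
    """
    matches = []
    for root, _dirs, files in os.walk(evm_home):
        if name in files:
            matches.append(os.path.join(root, name))
    return sorted(matches)


if __name__ == "__main__":
    evm_home = os.environ.get("EVM_HOME")
    if evm_home is None:
        print("EVM_HOME is not set")
    else:
        # This is the location named in the error message.
        expected = os.path.join(evm_home, "evidence_modeler.pl")
        print("top-level script exists:", os.path.isfile(expected))
        for path in find_evm_script(evm_home):
            print("found:", path)
```

If the top-level check prints False while a path under EvmUtils is listed, the install matches the situation described in this issue.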

jallen73 commented 2 weeks ago

Hello, I am having the same challenge. Have you found a solution?

jallen73 commented 2 weeks ago

Ok, I went ahead and created a softlink and that solved this problem. Here's the command I used:

ln -s /home/jessica/tools/miniconda3/envs/fun1817/opt/evidencemodeler-2.1.0/EvmUtils/evidence_modeler.pl /home/jessica/tools/miniconda3/envs/fun1817/opt/evidencemodeler-2.1.0/evidence_modeler.pl
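The same workaround can be written without hard-coded paths. This is a minimal sketch, assuming the EVM 2.1.0 layout reported in this thread (scripts inside an `EvmUtils` folder under `$EVM_HOME`). `link_evm_scripts` is a hypothetical helper, not part of funannotate, and it links every `.pl` file that is missing from the top level rather than only `evidence_modeler.pl`.

```python
import os


def link_evm_scripts(evm_home, subdir="EvmUtils"):
    """Symlink each *.pl file in evm_home/subdir into evm_home.

    Existing top-level files or links are left untouched.
    Returns the list of links that were created.
    Hypothetical helper; not part of funannotate.
    """
    src_dir = os.path.join(evm_home, subdir)
    created = []
    for fname in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, fname)
        if not fname.endswith(".pl") or not os.path.isfile(src):
            continue
        dst = os.path.join(evm_home, fname)
        if os.path.lexists(dst):
            # Never overwrite something that is already there.
            continue
        os.symlink(os.path.abspath(src), dst)
        created.append(dst)
    return created


# Example call (requires write access to the EVM directory):
# link_evm_scripts(os.environ["EVM_HOME"])
```

Running it a second time creates nothing, since existing links are skipped.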

pwkooij commented 2 weeks ago

> Hello, I am having the same challenge. Have you found a solution?

I copied all the .pl files from the lower-level EvmUtils folder up into the evidencemodeler-2.1.0 folder to make it work. But I feel this needs an adjustment in the funannotate code itself. I haven't figured out where, though...
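For illustration, a path lookup that tolerates both layouts might look like the following. This is only a sketch of the idea, not funannotate's actual code. The function name `resolve_evm_script` is hypothetical, and the two candidate locations are simply the path from the error message and the `EvmUtils` folder found above.

```python
import os


def resolve_evm_script(evm_home, name="evidence_modeler.pl"):
    """Return the first existing candidate path for an EVM script.

    Candidates are the top-level location (the path in the error
    message) and the EvmUtils subfolder reported for EVM 2.1.0.
    Raises FileNotFoundError if neither exists.
    Hypothetical helper; not part of funannotate.
    """
    candidates = [
        os.path.join(evm_home, name),
        os.path.join(evm_home, "EvmUtils", name),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(
        "{} not found; tried: {}".format(name, ", ".join(candidates))
    )
```

Checking the top-level path first means installs where the script sits directly in `$EVM_HOME` resolve exactly as before.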

nextgenusfs commented 2 weeks ago

funannotate was originally written for EVM v1.1, and EVM v2 re-organized the code. I thought I had it working, but maybe not. I don't have the time/bandwidth to keep up with all of the dependencies and their constant change. Hence, I started funannotate2 to remove as many dependencies as possible. There shouldn't be a performance difference between EVM v2 and v1, so you can safely just downgrade EVM and it should work.

pwkooij commented 2 weeks ago

Thanks Jon @nextgenusfs! For now I fixed it by copying the Perl scripts, and that seems to work, but this is good information for anyone else. Any chance you can add a quick note on that to the readthedocs?

And great to hear you're working on a v2, if you need any help testing, I seem to work with weird fungal genomes ;) (are you going to IMC?)