nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Another test data failure after fresh conda install #1073

Open amcomeau opened 2 weeks ago

amcomeau commented 2 weeks ago

I'm seeing an error similar to what appears in a lot of other posts, but this is with the most current version, installed into a fresh environment, so I'm not sure why not all dependencies are included in the conda package. I have had to download a few things so far to get other sections working, but now there seems to be a problem with AUGUSTUS, even though it is specifically installed as part of the conda environment.

Are you using the latest release?

Yes, directly installed from conda.

Describe the bug

Test run fails with AUGUSTUS intron errors.

What command did you issue?

funannotate test -t all --cpus 40

Logfiles

The Clean, Mask and Predict (unit testing) modules all complete...then we hit the error at the BUSCO training module:

Running funannotate predict BUSCO-mediated training unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --cpus 40 --species Awesome busco

[Oct 21 10:38 PM]: OS: Ubuntu 20.04, 48 cores, ~ 264 GB RAM. Python: 3.8.19
[Oct 21 10:38 PM]: Running funannotate v1.8.17
[Oct 21 10:38 PM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Oct 21 10:38 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 21 10:38 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     busco
  glimmerhmm   busco
  snap         busco
[Oct 21 10:38 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 21 10:38 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
/home/an351485/bin/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/aux_scripts/funannotate-p2g.py:14: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import parse_version
[Oct 21 10:38 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 21 10:38 PM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
     Progress: 1505 complete, 0 failed, 0 remaining
[Oct 21 10:38 PM]: Exonerate finished in 0:00:11: found 1,272 alignments
[Oct 21 10:38 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 21 10:40 PM]: 370 valid BUSCO predictions found, validating protein sequences
[Oct 21 10:41 PM]: 202 BUSCO predictions validated
[Oct 21 10:41 PM]: Training Augustus using BUSCO gene models
Error: In sequence CP022970.1_48453-52721: One CDS exon does not begin properly after the previous CDS exon.602 >= 600
GBProcessor::getGeneList(): Intron has non-positive length.
Encountered error after reading 0 annotations.

...this then continues for multiple instances of the above error...

augustus: ERROR No genbank sequences found.
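For readers unfamiliar with the Augustus message: "Intron has non-positive length" appears to mean that a CDS exon in a training gene model starts at or before the end of the previous exon, which is consistent with the `602 >= 600` in the log. The following is a minimal Python sketch of that kind of check. It is not Augustus or funannotate code; the function name `find_bad_introns`, the 1-based inclusive coordinate convention, and the example coordinates are assumptions made for illustration only.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def find_bad_introns(cds: List[Interval]) -> List[Tuple[Interval, Interval, int]]:
    """Return (previous_exon, next_exon, intron_length) for each non-positive intron.

    Coordinates are assumed to be 1-based and inclusive (an assumption,
    chosen to match GenBank/GFF3 conventions).
    """
    ordered = sorted(cds)
    bad = []
    for prev, nxt in zip(ordered, ordered[1:]):
        # Number of bases strictly between the two exons.
        intron_len = nxt[0] - prev[1] - 1
        if intron_len <= 0:
            bad.append((prev, nxt, intron_len))
    return bad


# Hypothetical coordinates, chosen only to mirror the "602 >= 600" in the log.
example = [(100, 602), (600, 900), (1000, 1200)]
for prev, nxt, length in find_bad_introns(example):
    print(f"exon {nxt} starts before exon {prev} ends (intron length {length})")
```

Running a check like this over the BUSCO-derived training models before they are handed to Augustus would show whether every model is affected or only a subset.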

Traceback (most recent call last):
  File "/home/an351485/bin/miniconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/home/an351485/bin/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 717, in main
    mod.main(arguments)
  File "/home/an351485/bin/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/predict.py", line 2094, in main
    lib.trainAugustus(
  File "/home/an351485/bin/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/library.py", line 10971, in trainAugustus
    train_results = getTrainResults(
  File "/home/an351485/bin/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/library.py", line 10708, in getTrainResults
    float(values1[1]),
UnboundLocalError: local variable 'values1' referenced before assignment
#########################################################
Traceback (most recent call last):
  File "/home/an351485/bin/miniconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/home/an351485/bin/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 717, in main
    mod.main(arguments)
  File "/home/an351485/bin/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 407, in main
    runBuscoTest(args)
  File "/home/an351485/bin/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 200, in runBuscoTest
    assert 1500 <= countGFFgenes(os.path.join(
  File "/home/an351485/bin/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 45, in countGFFgenes
    with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-busco_cc4bc53d-61d2-4825-b1bd-5dde79eb56b6/annotate/predict_results/Awesome_busco.gff3'
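The `UnboundLocalError` in the first traceback looks like a downstream symptom rather than the root cause: a variable that is only assigned when a matching line is found is read after the loop even though nothing matched, which would be expected if Augustus training produced no results. The sketch below reproduces that pattern in isolation. It is not the actual `getTrainResults` from `library.py`; the function names and the `nucleotide level | ... |` line format are assumptions made for illustration.

```python
def parse_train_results(lines):
    """Hypothetical parser: 'values1' is bound only if a matching line exists."""
    for line in lines:
        if line.startswith("nucleotide level"):
            values1 = line.split("|")
    # Raises UnboundLocalError when no line matched (e.g. empty training output).
    return float(values1[1])


def parse_train_results_safe(lines):
    """Same parsing, but fails with an explicit message when nothing matched."""
    values1 = None
    for line in lines:
        if line.startswith("nucleotide level"):
            values1 = [field.strip() for field in line.split("|")]
    if values1 is None:
        raise ValueError("no accuracy table found; training likely produced no output")
    return float(values1[1])


try:
    parse_train_results([])  # simulates a training run with no output
except UnboundLocalError as err:
    print("unguarded parser:", err)

try:
    parse_train_results_safe([])
except ValueError as err:
    print("guarded parser:", err)
```

The second traceback (`FileNotFoundError` for `Awesome_busco.gff3`) follows from the same failure: the predict step aborted, so the test harness has no output file to count genes in.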

OS/Install Information

Checking dependencies for 1.8.17

You are running Python v 3.8.19. Now checking python packages...
biopython: 1.76
goatools: 1.4.12
matplotlib: 3.7.3
natsort: 8.4.0
numpy: 1.24.4
pandas: 2.0.3
psutil: 5.7.0
requests: 2.32.3
scikit-learn: 1.3.2
scipy: 1.10.1
seaborn: 0.13.2
All 11 python packages installed

You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.50
Clone: 0.46
DBD::SQLite: 1.72
DBD::mysql: 4.050
DBI: 1.643
DB_File: 1.858
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.58
Hash::Merge: 0.302
JSON: 4.10
LWP::UserAgent: 6.67
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.03
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.14
Tie::File: 1.06
URI::Escape: 5.17
YAML: 1.30
local::lib: 2.000029
threads: 2.25
threads::shared: 1.61
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/home/an351485/bin/miniconda3/envs/funannotate/funannotate_db/
$PASAHOME=/home/an351485/bin/miniconda3/envs/funannotate/opt/pasa-2.5.3
$TRINITY_HOME=/home/an351485/bin/miniconda3/envs/funannotate/opt/trinity-2.15.2
$EVM_HOME=/home/an351485/bin/miniconda3/envs/funannotate/opt/evidencemodeler-2.1.0
$AUGUSTUS_CONFIG_PATH=/home/an351485/bin/miniconda3/envs/funannotate/config/
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies...
PASA: 2.5.3
CodingQuarry: 2.0
Trinity: 2.15.2
augustus: 3.5.0
bamtools: bamtools 2.5.2
bedtools: bedtools v2.31.1
blat: BLAT v39x1
diamond: 2.1.10
emapper.py: 2.1.12
ete3: 3.1.3
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2024-10-10
hisat2: 2.2.1
hmmscan: HMMER 3.4 (Aug 2023)
hmmsearch: HMMER 3.4 (Aug 2023)
java: 22.0.1-internal
kallisto: 0.46.1
mafft: v7.526 (2024/Apr/26)
makeblastdb: makeblastdb 2.16.0+
minimap2: 2.28-r1209
pigz: 2.8
proteinortho: 6.3.2
pslCDnaFilter: no way to determine
salmon: salmon 1.10.3
samtools: samtools 1.21
snap: 2006-07-28
stringtie: 2.2.3
tRNAscan-SE: 2.0.12 (Nov 2022)
tantan: tantan 50
tbl2asn: 25.8
tblastn: tblastn 2.16.0+
trimal: trimAl v1.5.rev0 build[2024-05-27]
trimmomatic: 0.39
ERROR: gmes_petap.pl not installed
ERROR: signalp not installed

Note that I'm not interested in using GeneMark-ES or SignalP for the moment, so I'm ignoring those two errors for the time being (the test should still complete without them).
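Since the report above shows AUGUSTUS as installed and `$AUGUSTUS_CONFIG_PATH` as set, a quick independent sanity check of those two facts can help rule out a broken environment before looking at the training data. The snippet below is a minimal sketch only; the function name `check_augustus_setup` is hypothetical, and it assumes the standard Augustus layout (a `species/` directory under the config path, with the `augustus` and `etraining` executables on `PATH`).

```python
import os
import shutil


def check_augustus_setup(env=None):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    env = os.environ if env is None else env
    problems = []

    config = env.get("AUGUSTUS_CONFIG_PATH")
    if not config:
        problems.append("AUGUSTUS_CONFIG_PATH is not set")
    elif not os.path.isdir(os.path.join(config, "species")):
        problems.append(f"no species/ directory under {config}")

    # Both executables are needed for training; look them up on the given PATH.
    for tool in ("augustus", "etraining"):
        if shutil.which(tool, path=env.get("PATH")) is None:
            problems.append(f"{tool} not found on PATH")

    return problems


if __name__ == "__main__":
    issues = check_augustus_setup()
    print("\n".join(issues) if issues else "basic Augustus checks passed")
```

If this reports no problems, the environment itself is probably fine and the failure is more likely tied to the BUSCO-derived gene models passed to training.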

ceneg commented 1 day ago

I did a fresh install with Funannotate as well. When doing a "funannotate test -t all --cpus 60" with both

GBProcessor::getGeneList(): Intron has non-positive length.
Encountered error after reading 0 annotations.

Has there been any development on this issue?

Update: this only happens in the "BUSCO-mediated training unit testing", right after Training Augustus using BUSCO gene models

The initial BUSCO prediction seems to run without problems:

[Nov 07 08:34 AM]: Running Augustus gene prediction using saccharomyces parameters
     Progress: 11 complete, 0 failed, 0 remaining        
[Nov 07 08:35 AM]: 1,485 predictions from Augustus
[Nov 07 08:35 AM]: Pulling out high quality Augustus predictions
[Nov 07 08:35 AM]: Found 371 high quality predictions from Augustus (>90% exon evidence)

and running only "funannotate test -t predict" finishes successfully.