nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Evidence modeler has failed, exiting #913

Closed mpalmada closed 1 year ago

mpalmada commented 1 year ago

Dear @nextgenusfs,

I am trying to run the pipeline, and it runs correctly until it reaches the EVM step. I have seen an issue reporting that an empty line in a GFF3 file can cause trouble for the EVM step (https://github.com/nextgenusfs/funannotate/pull/709). The GFF3 files in each partition contain several empty lines. I am not sure whether the pipeline runs a sanity check for this before starting EVM, or whether this is why the pipeline crashes at this step and does not continue. I am using the latest version; here is the output of funannotate check --show-versions:

Checking dependencies for 1.8.15

You are running Python v 3.8.15. Now checking python packages...
biopython: 1.76
goatools: 1.2.3
matplotlib: 3.4.3
natsort: 8.3.1
numpy: 1.24.3
pandas: 2.0.1
psutil: 5.9.5
requests: 2.29.0
scikit-learn: 1.2.2
scipy: 1.10.1
seaborn: 0.12.2
All 11 python packages installed

You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.50
Clone: 0.46
DBD::SQLite: 1.72
DBD::mysql: 4.050
DBI: 1.643
DB_File: 1.855
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.54
Hash::Merge: 0.302
JSON: 4.10
LWP::UserAgent: 6.67
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.14
Tie::File: 1.06
URI::Escape: 5.12
YAML: 1.30
local::lib: 2.000029
threads: 2.25
threads::shared: 1.61
All 27 Perl modules installed

Checking Environmental Variables...
$PASAHOME=/X/miniconda3/envs/funannotate/opt/pasa-2.5.2
$TRINITY_HOME=/X/miniconda3/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/X/miniconda3/envs/funannotate/opt/evidencemodeler-2.1.0
$AUGUSTUS_CONFIG_PATH=/X/miniconda3/envs/funannotate/config/
ERROR: FUNANNOTATE_DB not set. export FUNANNOTATE_DB=/path/to/dir
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies...
PASA: 2.5.2
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.5.0
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.0.8
emapper.py: 2.1.3
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2021-08-25
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 17.0.3-internal
kallisto: 0.46.1
mafft: v7.520 (2023/Mar/22)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.25-r1173
pigz: 2.6
proteinortho: 6.2.3
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.17
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.11 (Oct 2022)
tantan: tantan 40
tbl2asn: 25.8
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: gmes_petap.pl not installed
ERROR: signalp not installed

The error file says this:

[X AM]: Running SNAP gene prediction, using training data: X_funannotate/predict_misc/busco.final.gff3
[X AM]: 0 predictions from SNAP
[X AM]: SNAP prediction failed, moving on without result
[X AM]: Running GlimmerHMM gene prediction, using training data: X_funannotate/predict_misc/busco.final.gff3
[X AM]: 57,211 predictions from GlimmerHMM
[X AM]: Summary of gene models passed to EVM (weights):
Source         Weight   Count
Augustus       1        29380
Augustus HiQ   2        384
GlimmerHMM     1        57211
Total          -        86975
[X AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[X 04:00 AM]: Converting to GFF3 and collecting all EVM results
Progress: 1663 complete, 0 failed, 0 remaining
[X 04:00 AM]: Evidence modeler has failed, exiting

And the command is:

funannotate predict -i ${fastain} -o ${out} --species "${species}" --augustus_species "${sp_ref}"

Hope you can help me,

Thank you!

Sincerely,

Marc
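For anyone who wants to test the empty-line hypothesis before rerunning the whole pipeline, here is a minimal Python sketch that counts blank lines in a GFF3 file and writes a copy without them. The function names and file paths are hypothetical and not part of funannotate or EVM, and whether blank lines are actually the cause of the failure is an assumption, not something confirmed in this thread.

```python
def count_blank_lines(gff3_path):
    """Return the number of empty or whitespace-only lines in a GFF3 file."""
    with open(gff3_path) as handle:
        return sum(1 for line in handle if not line.strip())


def strip_blank_lines(gff3_path, out_path):
    """Copy gff3_path to out_path without blank lines; return the lines kept."""
    kept = 0
    with open(gff3_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():  # keep comments, directives, and feature lines
                dst.write(line)
                kept += 1
    return kept
```

Running count_blank_lines on one of the partition GFF3 files shows how many empty lines it contains; strip_blank_lines writes a cleaned copy to a new path so the original stays untouched.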

nextgenusfs commented 1 year ago

Please run funannotate test -t predict to determine whether this is an installation problem or an issue with your data.

mpalmada commented 1 year ago

Hi @nextgenusfs,

The test is not working:

funannotate test -t predict
#########################################################
Running funannotate predict unit testing
Downloading: https://osf.io/te2pf/download?version=1 Bytes: 1489808
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 2 --species Awesome testicus
#########################################################

[May 16 09:18 AM]: OS: CentOS Linux 7, 128 cores, ~ 2101 GB RAM. Python: 3.8.15
[May 16 09:18 AM]: Running funannotate v1.8.15
[May 16 09:18 AM]: Funannotate database not properly configured, run funannotate setup.
#########################################################
Traceback (most recent call last):
  File "/X/miniconda3/envs/funannotate/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/X/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/X/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 405, in main
    runPredictTest(args)
  File "/X/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 160, in runPredictTest
    assert 1500 <= countGFFgenes(os.path.join(
  File "/X/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 45, in countGFFgenes
    with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-predict_e259d2a3-be73-42cf-9fc8-5534fc74d022/annotate/predict_results/Awesome_testicus.gff3'

Any ideas?

Marc

nextgenusfs commented 1 year ago

[May 16 09:18 AM]: Funannotate database not properly configured, run funannotate setup.

mpalmada commented 1 year ago

Hi,

funannotate setup --update

[May 16 03:55 PM]: OS: CentOS Linux 7, 128 cores, ~ 1057 GB RAM. Python: 3.8.15
[May 16 03:55 PM]: Running 1.8.15
[May 16 03:55 PM]: Database location: /home/groups/compgen/mpalmada/funannotate_db
[May 16 03:55 PM]: Retrieving download links from GitHub Repo
[May 16 03:55 PM]: Checking for newer versions of database files
[May 16 03:55 PM]: Parsing Augustus pre-trained species and porting to funannotate
[May 16 03:55 PM]: merops database is current.
[May 16 03:55 PM]: MEROPS Database: version=12.0 date=2017-10-04 records=5,009
[May 16 03:55 PM]: uniprot-release database is current.
[May 16 03:55 PM]: UniProtKB Database: version=2023_02 date=2023-05-03 records=569,516
[May 16 03:55 PM]: dbCAN database is current.
[May 16 03:55 PM]: dbCAN Database: version=11.0 date=2022-08-09 records=699
[May 16 03:55 PM]: pfam-log database is current.
[May 16 03:55 PM]: Pfam Database: version=35.0 date=2021-11 records=19,632
[May 16 03:55 PM]: repeats database is current.
[May 16 03:55 PM]: Repeat Database: version=1.0 date=2023-05-16 records=11,950
[May 16 03:55 PM]: go-obo database is current.
[May 16 03:55 PM]: GO ontology version=2023-04-01 date=2023-04-01 records=47,497
[May 16 03:55 PM]: mibig database is current.
[May 16 03:55 PM]: MiBIG Database: version=1.4 date=2023-05-16 records=31,023
[May 16 03:56 PM]: interpro database is current.
[May 16 03:56 PM]: InterProScan XML: version=94.0 date=2023-05-10 records=38,816
[May 16 03:56 PM]: outgroups not found in database
[May 16 03:56 PM]: Downloading pre-computed BUSCO outgroups
[May 16 03:56 PM]: Downloading: https://osf.io/r9sne/download?version=1 Bytes: 2374032
[May 16 03:56 PM]: BUSCO outgroups: version=1.0 date=2023-05-16 records=8
[May 16 03:56 PM]: gene2product database is current.
[May 16 03:56 PM]: Gene2Product: version=1.88 date=2023-02-14 records=34,365
[May 16 03:56 PM]: Downloading busco models: dikarya

If I run the test again I get the same error output. Should I run another "funannotate setup" command?

Thanks,

Marc

nextgenusfs commented 1 year ago

It looks like it's there, but the error in predict says that files are missing from the database. Please make sure the FUNANNOTATE_DB environment variable points to the proper directory. It seems that it does, since you didn't specify the -d option for funannotate setup. If the error persists, I'd manually remove the whole FUNANNOTATE_DB directory and try running it again.
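A quick way to confirm what the shell actually exports is a small Python check like the sketch below. The helper name check_db_dir is hypothetical and not part of funannotate. It only verifies that the variable is set and points at a non-empty directory; it does not validate the database contents.

```python
import os


def check_db_dir(env=os.environ, var="FUNANNOTATE_DB"):
    """Return (ok, message) for an environment variable expected to hold a directory."""
    path = env.get(var)
    if not path:
        return False, f"{var} is not set"
    if not os.path.isdir(path):
        return False, f"{var}={path} is not a directory"
    if not os.listdir(path):
        return False, f"{var}={path} is empty"
    # Only existence and non-emptiness are checked here, not the database files.
    return True, f"{var}={path} looks usable"


if __name__ == "__main__":
    print(check_db_dir()[1])
```

Run it in the same shell (and the same conda environment) that launches funannotate, since a variable exported in one session is not visible in another.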

mpalmada commented 1 year ago

Hi! I replaced the EvidenceModeler Perl file with the one from its GitHub repository (which was a larger file), and then it worked. I don't know whether the script that came with the conda environment is truncated. Thanks!
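To check whether an installed script differs from a reference copy, comparing size and checksum is usually enough. The sketch below is only an illustration: the helper names are hypothetical, and you would pass in the path of the script inside your conda environment and the path of a copy downloaded from the upstream repository.

```python
import hashlib
import os


def file_fingerprint(path):
    """Return (size_in_bytes, sha256_hex_digest) for a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def same_file_content(installed_path, reference_path):
    """True if both files have the same size and SHA-256 digest."""
    return file_fingerprint(installed_path) == file_fingerprint(reference_path)
```

A mismatch does not prove truncation on its own, because the two copies may simply come from different releases, but a much smaller installed file is a strong hint.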

sunnycqcn commented 1 year ago

I ran into the same problem. I think the cause is the EvidenceModeler version: funannotate does not support 2.1.0.

lbwljj commented 5 months ago

I agree. I used evidencemodeler 1.1.1 and the problem was resolved.
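If you are unsure which EvidenceModeler version your environment uses, the $EVM_HOME path printed by funannotate check --show-versions usually carries it in the directory name (for example evidencemodeler-2.1.0 in the output at the top of this issue). Here is a minimal Python sketch that extracts it. The function name is hypothetical, and it assumes the directory follows the evidencemodeler-X.Y.Z naming seen in that output.

```python
import os
import re


def evm_version_from_home(evm_home):
    """Parse a version tuple from a path ending in 'evidencemodeler-X.Y.Z', else None."""
    match = re.search(r"evidencemodeler-(\d+(?:\.\d+)*)/?$", evm_home, re.IGNORECASE)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


if __name__ == "__main__":
    version = evm_version_from_home(os.environ.get("EVM_HOME", ""))
    print("EVM version from $EVM_HOME:", version)
```

Based on the reports in this thread, a result starting with 2 would be the case to look at first; that is an observation from the comments above, not a documented compatibility rule.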