nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Problems in funannotate predict with AUGUSTUS_CONFIG_PATH #899

Open DiegoSafian opened 1 year ago

DiegoSafian commented 1 year ago

Are you using the latest release? funannotate v1.8.13

Describe the bug: Training Augustus on the BUSCO gene models fails with an error that seems to be related to AUGUSTUS_CONFIG_PATH.

What command did you issue?

funannotate predict --species reticulata_predict \
    --input ./genome.fa \
    --out reticulata_predict \
    --transcript_evidence ./Trinity-GG.fasta \
    --rna_bam ./RNA_alignmentAligned.sortedByCoord.out.bam \
    --protein_evidence ./proteinsfishes.fasta \
    --other_gff ./file.gff3 \
    --busco_db actinopterygii \
    --organism other \
    --max_intronlen 10000 \
    --busco_seed_species zebrafish \
    --optimize_augustus \
    --repeats2evm \
    --cpus 16 \
    --AUGUSTUS_CONFIG_PATH=/path/d/bin/augustus_config_system \
    --GENEMARK_PATH=/path/bin/gmes_linux_64_mod \
    --EVM_HOME=/path/bin/EVidenceModeler-v2.1.0

Logfiles: fun_predic.txt, augustus.log

OS/Install Information

Checking dependencies for 1.8.13

You are running Python v 3.8.15. Now checking python packages...
biopython: 1.80
goatools: 1.2.3
matplotlib: 3.4.3
natsort: 8.2.0
numpy: 1.24.1
pandas: 1.5.3
psutil: 5.9.4
requests: 2.28.2
scikit-learn: 1.2.1
scipy: 1.10.0
seaborn: 0.12.2
All 11 python packages installed

You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.50
Clone: 0.46
DBD::SQLite: 1.72
DBD::mysql: 4.046
DBI: 1.643
DB_File: 1.855
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.54
Hash::Merge: 0.302
JSON: 4.10
LWP::UserAgent: 6.67
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.14
Tie::File: 1.06
URI::Escape: 5.12
YAML: 1.30
threads: 2.25
threads::shared: 1.61
ERROR: local::lib not installed, install with cpanm local::lib

Checking Environmental Variables...
$FUNANNOTATE_DB=/camp/home/safiand/home/users/safiand/funannotate_db
$PASAHOME=/camp/home/safiand/home/users/safiand/.conda/envs/funannotate/opt/pasa-2.5.2
$TRINITY_HOME=/camp/home/safiand/home/users/safiand/.conda/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/camp/home/safiand/home/users/safiand/.conda/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/camp/home/safiand/home/users/safiand/.conda/envs/funannotate/config/
$GENEMARK_PATH=/camp/home/safiand/home/users/safiand/bin/gmes_linux_64_mod

Checking external dependencies...
PASA: 2.5.2
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.5.0
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.0.15
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2021-08-25
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 17.0.3-internal
kallisto: 0.46.1
mafft: v7.515 (2023/Jan/15)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
proteinortho: 6.1.7
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.16.1
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.11 (Oct 2022)
tantan: tantan 40
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39

nextgenusfs commented 1 year ago

I think this was fixed; try updating to the latest release with pip in your current environment:

python -m pip install "funannotate==1.8.15" --upgrade --force --no-deps

DiegoSafian commented 1 year ago
DiegoSafian commented 1 year ago

Hi, after updating funannotate as you suggested, I still have the same issue:

UnboundLocalError: local variable 'AUGUSTUS_BASE' referenced before assignment

new_species.pl --AUGUSTUS_CONFIG_PATH=/nemo/lab/cardoso-moreiam/home/users/safiand/genome_annotation/reticulata/funannotate/reticulata_predict/predict_misc/ab_initio_parameters/augustus/ --species=BUSCO_reticulata_predict_2779326087
Could not locate command line parameters file: /nemo/lab/cardoso-moreiam/home/users/safiand/genome_annotation/reticulata/funannotate/reticulata_predict/predict_misc/ab_initio_parameters/augustus/parameters/aug_cmdln_parameters.json.

etraining --species=BUSCO_reticulata_predict_2779326087 /nemo/lab/cardoso-moreiam/home/users/safiand/genome_annotation/reticulata/funannotate/reticulata_predict/predict_misc/busco/run_reticulata_predict/augustus_output/training_set_reticulata_predict.txt

augustus: ERROR /nemo/lab/cardoso-moreiam/home/users/safiand/genome_annotation/reticulata/funannotate/reticulata_predict/predict_misc/ab_initio_parameters/augustus/topCodonExcludedFromCDS=False/ is not a directory. Could not locate directory AUGUSTUS_CONFIG_PATH.

It is a shame, because I really wanted to compare the BRAKER results with the funannotate pipeline, as BRAKER is giving me gene models that are too short (10 kb in a eukaryote).

nextgenusfs commented 1 year ago

So what is in your $AUGUSTUS_CONFIG_PATH? It is expecting the standard Augustus folder structure, i.e.:

$ ls -l $AUGUSTUS_CONFIG_PATH
total 0
drwxr-xr-x    9 jon  staff   288B Jul  4  2022 cgp/
drwxr-xr-x   14 jon  staff   448B Jul  4  2022 extrinsic/
drwxr-xr-x   27 jon  staff   864B Jul  4  2022 model/
drwxr-xr-x    4 jon  staff   128B Jul  4  2022 parameters/
drwxr-xr-x    5 jon  staff   160B Jul  4  2022 profile/
drwxr-xr-x  171 jon  staff   5.3K Sep  5  2022 species/
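To compare your own install against the listing above, you can check for each of those six subdirectories. This is a minimal sketch, assuming the names in the listing are what a standard Augustus config directory should contain; `missing_config_dirs` is a hypothetical helper, not part of funannotate or Augustus.

```python
import os
import sys

# Subdirectories shown in the standard Augustus config listing (assumed complete).
EXPECTED_SUBDIRS = ("cgp", "extrinsic", "model", "parameters", "profile", "species")


def missing_config_dirs(config_path):
    """Return the expected subdirectories that are absent from config_path (hypothetical helper)."""
    return [
        name
        for name in EXPECTED_SUBDIRS
        if not os.path.isdir(os.path.join(config_path, name))
    ]


if __name__ == "__main__":
    # Use the first argument if given, otherwise the AUGUSTUS_CONFIG_PATH environment variable.
    path = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("AUGUSTUS_CONFIG_PATH", "")
    if not path:
        print("AUGUSTUS_CONFIG_PATH is not set and no path was given")
    else:
        missing = missing_config_dirs(path)
        if missing:
            print("Missing from %s: %s" % (path, ", ".join(missing)))
        else:
            print("%s has all expected Augustus config subdirectories" % path)
```

Run it as `python check_config.py /path/to/config` or with no argument to test whatever `$AUGUSTUS_CONFIG_PATH` currently points to.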

It also expects that you haven't installed the scripts somewhere else, i.e. $AUGUSTUS_BASE refers to the directory one level up from $AUGUSTUS_CONFIG_PATH, where the scripts folder is located.
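In other words, the base directory is derived purely from the config path. A minimal sketch of that relationship, assuming the standard layout where `scripts/` sits next to `config/`; both helper names are hypothetical and this is not funannotate's actual code.

```python
import os


def augustus_base_from_config(config_path):
    """Return the directory one level above AUGUSTUS_CONFIG_PATH (hypothetical helper)."""
    # normpath strips a trailing slash, so ".../config/" and ".../config" give the same parent.
    return os.path.dirname(os.path.normpath(config_path))


def scripts_dir_from_config(config_path):
    """Return <base>/scripts if that directory exists, otherwise None (hypothetical helper)."""
    scripts = os.path.join(augustus_base_from_config(config_path), "scripts")
    return scripts if os.path.isdir(scripts) else None
```

For a config path such as `/opt/augustus/config/` the derived base is `/opt/augustus`; if no `scripts/` folder exists there, the scripts have to be reachable through `$PATH` instead.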

$ ls -l $AUGUSTUS_CONFIG_PATH/../
total 312
-rw-r--r--    1 jon  staff   1.7K Jul  4  2022 Dockerfile
-rw-r--r--@   1 jon  staff   2.4K Jul  4  2022 Makefile
-rw-r--r--    1 jon  staff   4.0K Jul  4  2022 README.md
-rw-r--r--    1 jon  staff   2.2K Jul  4  2022 Singularity.def
drwxr-xr-x   13 jon  staff   416B Jul  4  2022 auxprogs/
drwxr-xr-x   15 jon  staff   480B Oct 18 23:11 bin/
-rw-r--r--@   1 jon  staff   2.9K Jul  4  2022 common.mk
drwxr-xr-x    9 jon  staff   288B Sep  5  2022 config/
drwxr-xr-x   31 jon  staff   992B Oct 18 23:08 docs/
-rw-r--r--    1 jon  staff   107K Jul  4  2022 doxygen.conf
drwxr-xr-x   15 jon  staff   480B Jul  4  2022 examples/
drwxr-xr-x   55 jon  staff   1.7K Oct 18 23:08 include/
drwxr-xr-x   33 jon  staff   1.0K Jul  4  2022 mansrc/
-rw-r--r--    1 jon  staff    27K Jul  4  2022 retraining.html
drwxr-xr-x  113 jon  staff   3.5K Oct 18 23:08 scripts/
drwxr-xr-x  115 jon  staff   3.6K Oct 18 23:10 src/
drwxr-xr-x    5 jon  staff   160B Jul  4  2022 tests/

The only part that matters here is the scripts folder, because funannotate needs to be able to find some of those scripts if they are not in your $PATH. So you either need to make sure the Augustus scripts directory is in your $PATH, or set up your Augustus install as recommended by the Augustus developers.
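The lookup order described here (first `$PATH`, then the Augustus scripts folder) can be sketched as follows. This is an illustration under assumptions, not funannotate's implementation: `find_augustus_script` is a hypothetical name, and `new_species.pl` is used only because it appears in the log above.

```python
import os
import shutil


def find_augustus_script(name, scripts_dir=None, search_path=None):
    """Locate an Augustus helper script such as new_species.pl (hypothetical helper).

    Checks $PATH first (or search_path, if given) and then falls back to
    scripts_dir, e.g. the scripts/ folder of an Augustus install.
    """
    # shutil.which returns the full path of an executable found on the search path, or None.
    found = shutil.which(name, path=search_path)
    if found:
        return found
    # Fallback: a plain file in the scripts directory (executability is not checked here).
    if scripts_dir:
        candidate = os.path.join(scripts_dir, name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    hit = find_augustus_script("new_species.pl")
    print(hit or "new_species.pl not found; add the Augustus scripts directory to $PATH")
```

In bash, putting the scripts on `$PATH` amounts to something like `export PATH=/path/to/augustus/scripts:$PATH`, where `/path/to/augustus/scripts` is a placeholder for your own install.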

nextgenusfs commented 1 year ago

Also, EvidenceModeler v2.1.0 is not supported, so you'll need to downgrade to v1.1.1. I only noticed a few days ago that its command-line options changed in v2.1.0.

DiegoSafian commented 1 year ago

Hi, thanks a lot. After adding the Augustus scripts directory to my PATH, funannotate predict now runs fine and my issue seems to be resolved: Augustus is currently being trained. I hope I get longer gene models this time. Thanks again, Diego

DiegoSafian commented 1 year ago

Hi, I am loving this pipeline. It produces better gene models and a more complete annotation. Thanks!