nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Error in predict test run #457

Closed Hmariewilith closed 3 years ago

Hmariewilith commented 4 years ago

Hi,

I am trying to run funannotate predict on a Bipolaris species (Dothideomycetes, Ascomycota).

I have installed funannotate and set up all dependencies as below:

$ funannotate check --show-versions

Checking dependencies for 1.7.4

You are running Python v 2.7.15. Now checking python packages...
biopython: 1.68
goatools: 1.0.6
matplotlib: 2.2.5
natsort: 6.2.0
numpy: 1.16.5
pandas: 0.24.2
psutil: 5.7.0
requests: 2.24.0
scikit-learn: 0.20.3
scipy: 1.2.1
seaborn: 0.9.0
All 11 python packages installed

You are running Perl v 5.026002. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.852
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/home/meganmcdonald/funannotate_db
$PASAHOME=/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/opt/pasa-2.4.1
$TRINITY_HOME=/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/opt/trinity-2.8.5
$EVM_HOME=/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/config/
$GENEMARK_PATH=/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/genmark/gmes_linux_64/
All 6 environmental variables are set

Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.29.2
blat: BLAT v36
diamond: 0.9.21
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hmmscan: HMMER 3.3 (Nov 2019)
hmmsearch: HMMER 3.3 (Nov 2019)
java: 11.0.1-internal
kallisto: 0.46.2
mafft: v7.471 (2020/Jul/3)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.17-r941
proteinortho: 6.0.18
pslCDnaFilter: no way to determine
salmon: salmon 0.15.0
samtools: samtools 1.9
snap: 2006-07-28
stringtie: 2.1.2
tRNAscan-SE: 2.0.5 (October 2019)
tantan: tantan 13
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: ete3 not installed
ERROR: gmes_petap.pl not installed
ERROR: signalp not installed

I do not believe that the external dependencies which are missing are of concern in this instance as we are conducting train and predict only.
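For anyone who wants to re-check the flagged tools outside of funannotate, here is a minimal sketch that looks for each executable on PATH. It is not how funannotate check works internally; the helper name missing_tools is hypothetical, and shutil.which needs Python 3.3 or later (the environment in this report is Python 2.7). It only reports what is absent and says nothing about whether a given step needs the tool.

```python
import shutil


def missing_tools(names):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Tools reported as ERROR in the dependency check output above.
    flagged = ["emapper.py", "ete3", "gmes_petap.pl", "signalp"]
    for tool in missing_tools(flagged):
        print("not on PATH:", tool)
```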

I have run my funannotate train command in a Snakemake file, the exact command being:

"funannotate train -i WAI2411_mask.fasta " "-o WAI2411_fun_train_output " "-s All-CS10_unpaired.fastq.gz " "--max_intronlen 500 " "--species 'Bipolaris sorokinana' " "--isolate WAI2411 --no_trimmomatic " "--cpus 12 --memory 35G"

This has completed without error. These are the final lines of the train log:

[07/16/20 21:52:56]: Parsing expression value results. Keeping best transcript at each locus.
[07/16/20 21:54:23]: Wrote 10,318 PASA gene models
[07/16/20 21:54:23]: PASA database name: Bipolaris_sorokinana_WAI2411
[07/16/20 21:54:23]: Trinity/PASA has completed, you are now ready to run funanotate predict, for example:

funannotate predict -i WAI2411_mask.fasta \
    -o WAI2411_fun_train_output -s "Bipolaris sorokinana" --isolate WAI2411 --cpus 12

However, I keep receiving the following error when trying to run predict:

$ funannotate predict -i Fasta_files/WAI2411_mask.fasta -o WAI2411_fun_train_output -s 'Bipolaris sorokinana' --isolate WAI2411 --cpus 12 --busco_db ascomycota

[02:48 PM]: OS: linux2, 12 cores, ~ 37 GB RAM. Python: 2.7.15
[02:48 PM]: Running funannotate v1.7.4
[02:48 PM]: ERROR: ascomycota busco database is not found, install with funannotate setup -b ascomycota

I then tried to set up the ascomycota database and received this error:

$ funannotate setup -b ascomycota

[02:50 PM]: OS: linux2, 12 cores, ~ 37 GB RAM. Python: 2.7.15
[02:50 PM]: Running 1.7.4
[02:50 PM]: Database location: /home/meganmcdonald/funannotate_db
[02:50 PM]: Parsing Augustus pre-trained species and porting to funannotate
[02:50 PM]: MEROPS Database: version=12.0 date=2017-10-04 records=5,009
[02:50 PM]: UniProtKB Database: version=2020_03 date=2020-06-17 records=562,755
Traceback (most recent call last):
  File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/bin/funannotate", line 660, in <module>
    main()
  File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/bin/funannotate", line 650, in main
    mod.main(arguments)
  File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/lib/python2.7/site-packages/funannotate/setupDB.py", line 654, in main
    dbCANDB(DatabaseInfo, args.force, args=args)
  File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/lib/python2.7/site-packages/funannotate/setupDB.py", line 228, in dbCANDB
    type, name, version, date, records, checksum = info.get('dbCAN')
TypeError: 'NoneType' object is not iterable

If I point the command at the database directory using -d path/to/directory (the same location already set in $FUNANNOTATE_DB), I get the same error.
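The TypeError in the traceback above is what Python raises when the result of dict.get for a missing key (None) is unpacked into several names. The sketch below reproduces that pattern and shows one way to guard against it. It is an illustration only, not the code in setupDB.py: read_db_record is a hypothetical helper, and the sample record (including the name and checksum fields) is made up. The exact wording of the message differs between Python 2 and Python 3.

```python
def read_db_record(info, key):
    """Unpack a six-field record for `key`, or return None if it is absent."""
    record = info.get(key)  # dict.get returns None when the key is missing
    if record is None:
        return None
    db_type, name, version, date, records, checksum = record
    return {
        "type": db_type,
        "name": name,
        "version": version,
        "date": date,
        "records": records,
        "checksum": checksum,
    }


# Made-up example data; only the six-field layout mirrors the traceback.
info = {"pfam": ("hmmer3", "example-name", "33.1", "2020-04", 18259, "example-md5")}

try:
    # Unguarded version: there is no 'dbCAN' entry, so this unpacks None.
    db_type, name, version, date, records, checksum = info.get("dbCAN")
except TypeError as err:
    print("unguarded unpack failed:", err)

print(read_db_record(info, "dbCAN"))  # guarded version reports the absence instead
```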

I then used "-f" to redo my funannotate setup to ensure everything was installed correctly, see below output for "funannotate database":

Funannotate Databases currently installed:

Database         Type       Version     Date        Num_Records  Md5checksum
pfam             hmmer3     33.1        2020-04     18259        228db640818af54066a1c23404a3ba38
gene2product     text       1.62        2020-05-18  33289        04026bbca965378b2d7682c141a839f7
busco_outgroups  outgroups  1.0         2020-07-21  8            6795b1d4545850a4226829c7ae8ef058
merops           diamond    12.0        2017-10-04  5009         a6dd76907896708f3ca5335f58560356
mibig            diamond    1.4         2020-07-21  31023        118f2c11edde36c81bdea030a0228492
uniprot          diamond    2020_03     2020-06-17  562755       a4c8f55dab78451be23a3820fa2aed0d
go               text       2020-06-01  2020-06-01  47358        e5a9693191fea3019ce5f00dd1da4ae2
repeats          diamond    1.0         2020-07-21  11950        4e8cafc3eea47ec7ba505bb1e3465d21

The "interpro" database also fails to install, though with a different traceback from the one above:

$ funannotate setup -i interpro -b fungi -d /home/meganmcdonald/funannotate_db -f

[03:14 PM]: OS: linux2, 12 cores, ~ 37 GB RAM. Python: 2.7.15
[03:14 PM]: Running 1.7.4
[03:14 PM]: Database location: /home/meganmcdonald/funannotate_db
[03:14 PM]: Parsing Augustus pre-trained species and porting to funannotate
[03:14 PM]: Downloading InterProScan Mapping file
[03:14 PM]: Downloading: ftp://ftp.ebi.ac.uk/pub/databases/interpro/interpro.xml.gz Bytes: 26189725
Traceback (most recent call last):
  File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/bin/funannotate", line 660, in <module>
    main()
  File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/bin/funannotate", line 650, in main
    mod.main(arguments)
  File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/lib/python2.7/site-packages/funannotate/setupDB.py", line 662, in main
    interproDB(DatabaseInfo, args.force, args=args)
  File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/lib/python2.7/site-packages/funannotate/setupDB.py", line 432, in interproDB
    iprdate, "%d-%b-%y").strftime("%Y-%m-%d")
  File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/lib/python2.7/_strptime.py", line 335, in _strptime
    data_string[found.end():])
ValueError: unconverted data remains: 20

I also ran funannotate test -t all --cpus 8, which fails on the same missing BUSCO database:
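The ValueError above is consistent with a date that carries a four-digit year being parsed with the two-digit %y directive: %y consumes the first two digits of the year and the rest is left over as "unconverted data". The sketch below reproduces that and shows a tolerant parser that accepts either form. The date string "18-Jun-2020" is a hypothetical example (the actual value in the InterPro file is not shown in this thread), normalize_release_date is a made-up helper rather than funannotate's fix, and %b assumes English month abbreviations in the active locale.

```python
from datetime import datetime


def normalize_release_date(text):
    """Convert 'DD-Mon-YYYY' or 'DD-Mon-YY' to 'YYYY-MM-DD'."""
    # Try the four-digit year first; %Y will not match a two-digit year,
    # so the second format is only reached for the short form.
    for fmt in ("%d-%b-%Y", "%d-%b-%y"):
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError("unrecognised date format: %r" % text)


if __name__ == "__main__":
    try:
        # Same format string as in the traceback, applied to a four-digit year.
        datetime.strptime("18-Jun-2020", "%d-%b-%y")
    except ValueError as err:
        print("strict parse failed:", err)
    print(normalize_release_date("18-Jun-2020"))
    print(normalize_release_date("18-Jun-20"))
```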

#########################################################
Running funannotate predict unit testing
Downloading: https://osf.io/te2pf/download?version=1 Bytes: 1489808
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 8 --species Awesome testicus
#########################################################

[03:22 PM]: OS: linux2, 12 cores, ~ 37 GB RAM. Python: 2.7.15
[03:22 PM]: Running funannotate v1.7.4
[03:22 PM]: ERROR: dikarya busco database is not found, install with funannotate setup -b dikarya
#########################################################
Traceback (most recent call last):
  File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/bin/funannotate", line 660, in <module>
    main()
  File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/bin/funannotate", line 650, in main
    mod.main(arguments)
  File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/lib/python2.7/site-packages/funannotate/test.py", line 389, in main
    runPredictTest(args)
  File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/lib/python2.7/site-packages/funannotate/test.py", line 152, in runPredictTest
    tmpdir, 'annotate', 'predict_results', 'Awesome_testicus.gff3')) <= 1800
  File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/lib/python2.7/site-packages/funannotate/test.py", line 40, in countGFFgenes
    with open(input, 'rU') as f:
IOError: [Errno 2] No such file or directory: 'test-predict_4887/annotate/predict_results/Awesome_testicus.gff3'
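The IOError here looks like a knock-on effect: predict stopped at the missing BUSCO database, so the GFF3 file the test wanted to count was never written. A minimal sketch of a gene counter that reports a missing file instead of raising is shown below. It is not funannotate's countGFFgenes; count_gff_genes is a hypothetical helper that assumes a tab-separated GFF3 file with the feature type in the third column.

```python
import os


def count_gff_genes(path):
    """Count 'gene' features in a GFF3 file; return None if the file is missing."""
    if not os.path.isfile(path):
        return None  # upstream step probably failed before writing output
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header and comment lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) >= 3 and cols[2] == "gene":
                count += 1
    return count


if __name__ == "__main__":
    result = count_gff_genes("annotate/predict_results/Awesome_testicus.gff3")
    if result is None:
        print("no GFF3 output found; check the predict log for an earlier error")
    else:
        print("genes:", result)
```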

Can you please advise whether I have missed something that could explain these errors, and how to overcome them? For your reference, the two files that appear to be throwing the errors are attached.

setupDB.txt

funannotate.txt

nextgenusfs commented 4 years ago

InterPro changed the format of their XML file, which is causing the parsing error, so the parser had to be changed. It is fixed in the master branch, along with a lot of other changes (Python 3 support). You can install master using pip in your current environment and then re-run setup to update the databases. Hopefully that fixes the initial problem. If you encounter any errors in the run, please let me know and I will fix them -- the code underwent a lot of changes to support py3 while staying backwards compatible with py2, so there may be some other issues remaining, although the tests are passing.
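A minimal sketch of those two steps, scripted in Python, might look like the following. The repository URL is an assumption inferred from the project name at the top of this thread, the setup flags are the ones already used earlier in the thread (-b, -d, -f), and update_commands and run_update are hypothetical helper names. By default it only prints the commands so they can be reviewed before anything is run.

```python
import subprocess
import sys

# Assumption: URL inferred from the project name, not stated in this thread.
REPO = "git+https://github.com/nextgenusfs/funannotate.git"


def update_commands(db_dir, busco_db):
    """Build the pip install and database setup commands as argument lists."""
    pip_cmd = [sys.executable, "-m", "pip", "install", "--upgrade", REPO]
    setup_cmd = ["funannotate", "setup", "-b", busco_db, "-d", db_dir, "-f"]
    return [pip_cmd, setup_cmd]


def run_update(db_dir, busco_db, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in update_commands(db_dir, busco_db):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)


if __name__ == "__main__":
    run_update("/home/meganmcdonald/funannotate_db", "ascomycota")
```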

Hmariewilith commented 4 years ago

Thanks so much, and thanks for your quick response! Trying the new code (which fixes another issue I was having with python environments), so thanks heaps!

nextgenusfs commented 4 years ago

@Hmariewilith great! Please let me know if you run into any other problems -- also helpful to just drop a note and say that all worked great. I need to tag a new release sometime soon as this issue with the database setup is going to happen to everybody doing a new install.

Hmariewilith commented 4 years ago

@nextgenusfs just letting you know the update fixed the issue and we have managed to run the tool a couple times. Thanks for developing this, awesome tool for fungal genomics :)