Closed Hmariewilith closed 3 years ago
InterPro changed the format of their XML file, which is causing the parsing error, so the parser had to be changed. It is fixed in the master branch, along with a lot of other changes (Python 3 support). You can install master using pip in your current environment and then re-run setup to update the databases. Hopefully that fixes the initial problem. If you encounter any error in the run, please let me know and I will fix it. The code underwent a lot of changes to support py3 while staying backwards compatible with py2, so there may be some other issues remaining, although the tests are passing.
Thanks so much, and thanks for your quick response! Trying the new code (which fixes another issue I was having with python environments), so thanks heaps!
@Hmariewilith great! Please let me know if you run into any other problems -- also helpful to just drop a note and say that all worked great. I need to tag a new release sometime soon as this issue with the database setup is going to happen to everybody doing a new install.
@nextgenusfs just letting you know the update fixed the issue and we have managed to run the tool a couple times. Thanks for developing this, awesome tool for fungal genomics :)
Hi,
I am trying to run funannotate predict on a Bipolaris species (Dothideomycetes, Ascomycota).
I have installed funannotate and set up all dependencies as below:
$ funannotate check --show-versions
Checking dependencies for 1.7.4
You are running Python v 2.7.15. Now checking python packages...
biopython: 1.68
goatools: 1.0.6
matplotlib: 2.2.5
natsort: 6.2.0
numpy: 1.16.5
pandas: 0.24.2
psutil: 5.7.0
requests: 2.24.0
scikit-learn: 0.20.3
scipy: 1.2.1
seaborn: 0.9.0
All 11 python packages installed

You are running Perl v 5.026002. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.852
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/home/meganmcdonald/funannotate_db
$PASAHOME=/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/opt/pasa-2.4.1
$TRINITY_HOME=/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/opt/trinity-2.8.5
$EVM_HOME=/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/config/
$GENEMARK_PATH=/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/genmark/gmes_linux_64/
All 6 environmental variables are set

Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.29.2
blat: BLAT v36
diamond: 0.9.21
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hmmscan: HMMER 3.3 (Nov 2019)
hmmsearch: HMMER 3.3 (Nov 2019)
java: 11.0.1-internal
kallisto: 0.46.2
mafft: v7.471 (2020/Jul/3)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.17-r941
proteinortho: 6.0.18
pslCDnaFilter: no way to determine
salmon: salmon 0.15.0
samtools: samtools 1.9
snap: 2006-07-28
stringtie: 2.1.2
tRNAscan-SE: 2.0.5 (October 2019)
tantan: tantan 13
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: ete3 not installed
ERROR: gmes_petap.pl not installed
ERROR: signalp not installed
I do not believe that the external dependencies which are missing are of concern in this instance as we are conducting train and predict only.
I have run my funannotate train command in a Snakemake file, the exact command being:
"funannotate train -i WAI2411_mask.fasta " "-o WAI2411_fun_train_output " "-s All-CS10_unpaired.fastq.gz " "--max_intronlen 500 " "--species 'Bipolaris sorokinana' " "--isolate WAI2411 --no_trimmomatic " "--cpus 12 --memory 35G"
This completed without error. These are the final lines of the train log:
[07/16/20 21:52:56]: Parsing expression value results. Keeping best transcript at each locus.
[07/16/20 21:54:23]: Wrote 10,318 PASA gene models
[07/16/20 21:54:23]: PASA database name: Bipolaris_sorokinana_WAI2411
[07/16/20 21:54:23]: Trinity/PASA has completed, you are now ready to run funanotate predict, for example:
funannotate predict -i WAI2411_mask.fasta \
    -o WAI2411_fun_train_output -s "Bipolaris sorokinana" --isolate WAI2411 --cpus 12
However, I keep receiving the following error when trying to run predict:
$ funannotate predict -i Fasta_files/WAI2411_mask.fasta -o WAI2411_fun_train_output -s 'Bipolaris sorokinana' --isolate WAI2411 --cpus 12 --busco_db ascomycota
[02:48 PM]: OS: linux2, 12 cores, ~ 37 GB RAM. Python: 2.7.15
[02:48 PM]: Running funannotate v1.7.4
[02:48 PM]: ERROR: ascomycota busco database is not found, install with funannotate setup -b ascomycota
I then tried to set up the ascomycota BUSCO database and received this error:
$ funannotate setup -b ascomycota
[02:50 PM]: OS: linux2, 12 cores, ~ 37 GB RAM. Python: 2.7.15
[02:50 PM]: Running 1.7.4
[02:50 PM]: Database location: /home/meganmcdonald/funannotate_db
[02:50 PM]: Parsing Augustus pre-trained species and porting to funannotate
[02:50 PM]: MEROPS Database: version=12.0 date=2017-10-04 records=5,009
[02:50 PM]: UniProtKB Database: version=2020_03 date=2020-06-17 records=562,755
Traceback (most recent call last):
File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/bin/funannotate", line 660, in <module>
main()
File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/bin/funannotate", line 650, in main
mod.main(arguments)
File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/lib/python2.7/site-packages/funannotate/setupDB.py", line 654, in main
dbCANDB(DatabaseInfo, args.force, args=args)
File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/lib/python2.7/site-packages/funannotate/setupDB.py", line 228, in dbCANDB
type, name, version, date, records, checksum = info.get('dbCAN')
TypeError: 'NoneType' object is not iterable
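The `TypeError` in this traceback is what Python raises when a tuple-unpacking assignment is handed `None`, and `dict.get` returns `None` for a missing key, so the registry presumably had no `dbCAN` entry at that point. A minimal sketch of the mechanism follows; the `info` dictionary and `unpack_entry` helper are hypothetical stand-ins, not funannotate's actual data structure or code.

```python
# Hypothetical stand-in for the database registry. The field layout
# (type, name, version, date, records, checksum) mirrors the unpacking in
# the traceback; the real contents of DatabaseInfo are an assumption.
info = {
    "merops": ("diamond", "merops", "12.0", "2017-10-04", 5009,
               "a6dd76907896708f3ca5335f58560356"),
}

# Reproduce the failure: info.get returns None for a missing key, and
# unpacking None into six names raises TypeError.
try:
    db_type, name, version, date, records, checksum = info.get("dbCAN")
    raised = False
except TypeError:
    raised = True


def unpack_entry(registry, key):
    """Return the six fields for key, or None if the key is not registered."""
    entry = registry.get(key)
    if entry is None:
        return None  # caller can treat this as "not installed yet"
    db_type, name, version, date, records, checksum = entry
    return db_type, name, version, date, records, checksum
```

Checking for `None` before unpacking lets the caller treat a missing database as "not yet installed" instead of crashing.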
If I direct the command to the correct location using -d path/to/directory (which is already set up under $FUNANNOTATE_DB), I get the same error.
I then used "-f" to redo my funannotate setup to ensure everything was installed correctly, see below output for "funannotate database":
Funannotate Databases currently installed:
Database         Type       Version     Date        Num_Records  Md5checksum
pfam             hmmer3     33.1        2020-04     18259        228db640818af54066a1c23404a3ba38
gene2product     text       1.62        2020-05-18  33289        04026bbca965378b2d7682c141a839f7
busco_outgroups  outgroups  1.0         2020-07-21  8            6795b1d4545850a4226829c7ae8ef058
merops           diamond    12.0        2017-10-04  5009         a6dd76907896708f3ca5335f58560356
mibig            diamond    1.4         2020-07-21  31023        118f2c11edde36c81bdea030a0228492
uniprot          diamond    2020_03     2020-06-17  562755       a4c8f55dab78451be23a3820fa2aed0d
go               text       2020-06-01  2020-06-01  47358        e5a9693191fea3019ce5f00dd1da4ae2
repeats          diamond    1.0         2020-07-21  11950        4e8cafc3eea47ec7ba505bb1e3465d21
The "interpro" database will not install either, and setup fails with another traceback:
$ funannotate setup -i interpro -b fungi -d /home/meganmcdonald/funannotate_db -f
[03:14 PM]: OS: linux2, 12 cores, ~ 37 GB RAM. Python: 2.7.15
[03:14 PM]: Running 1.7.4
[03:14 PM]: Database location: /home/meganmcdonald/funannotate_db
[03:14 PM]: Parsing Augustus pre-trained species and porting to funannotate
[03:14 PM]: Downloading InterProScan Mapping file
[03:14 PM]: Downloading: ftp://ftp.ebi.ac.uk/pub/databases/interpro/interpro.xml.gz Bytes: 26189725
Traceback (most recent call last):
File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/bin/funannotate", line 660, in <module>
main()
File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/bin/funannotate", line 650, in main
mod.main(arguments)
File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/lib/python2.7/site-packages/funannotate/setupDB.py", line 662, in main
interproDB(DatabaseInfo, args.force, args=args)
File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/lib/python2.7/site-packages/funannotate/setupDB.py", line 432, in interproDB
iprdate, "%d-%b-%y").strftime("%Y-%m-%d")
File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/lib/python2.7/_strptime.py", line 335, in _strptime
data_string[found.end():])
ValueError: unconverted data remains: 20
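The message "unconverted data remains: 20" is consistent with a four-digit year being parsed with `%y`, which consumes only two digits and leaves the rest of the string unparsed. The sketch below reproduces that with a made-up date string (the actual value in interpro.xml is not shown in this thread) and shows one tolerant way to accept both year widths. `parse_release_date` is a hypothetical helper and is not claimed to be the fix applied in funannotate.

```python
from datetime import datetime

# Hypothetical release date with a four-digit year; the real string from
# interpro.xml is an assumption.
sample = "18-Jun-2020"

# Reproduce the failure: %y matches "20" and the trailing "20" is left over.
try:
    datetime.strptime(sample, "%d-%b-%y")
    failure = None
except ValueError as exc:
    failure = str(exc)


def parse_release_date(text):
    """Parse a day-month-year date with a 2- or 4-digit year into YYYY-MM-DD."""
    for fmt in ("%d-%b-%y", "%d-%b-%Y"):
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next candidate format
    raise ValueError("unrecognized date format: %r" % text)
```

Trying each format in turn keeps older two-digit-year files parseable while accepting the newer form.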
I checked funannotate test -t all --cpus 8 with the same error:
#########################################################
Running funannotate predict unit testing
Downloading: https://osf.io/te2pf/download?version=1 Bytes: 1489808
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 8 --species Awesome testicus
#########################################################
[03:22 PM]: OS: linux2, 12 cores, ~ 37 GB RAM. Python: 2.7.15
[03:22 PM]: Running funannotate v1.7.4
[03:22 PM]: ERROR: dikarya busco database is not found, install with funannotate setup -b dikarya
#########################################################
Traceback (most recent call last):
File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/bin/funannotate", line 660, in <module>
main()
File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/bin/funannotate", line 650, in main
mod.main(arguments)
File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/lib/python2.7/site-packages/funannotate/test.py", line 389, in main
runPredictTest(args)
File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/lib/python2.7/site-packages/funannotate/test.py", line 152, in runPredictTest
tmpdir, 'annotate', 'predict_results', 'Awesome_testicus.gff3')) <= 1800
File "/home/meganmcdonald/Desktop/Hannahsstuff/Funanno_train/.snakemake/conda/d7104746/lib/python2.7/site-packages/funannotate/test.py", line 40, in countGFFgenes
with open(input, 'rU') as f:
IOError: [Errno 2] No such file or directory: 'test-predict_4887/annotate/predict_results/Awesome_testicus.gff3'
Could you please advise whether I have missed something that would explain these errors, and how to overcome them? For your reference, the two files which appear to be throwing the errors are attached.
setupDB.txt
funannotate.txt