nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

BUSCO training of Augusus failed #229

Closed: aberaslop closed this issue 5 years ago

aberaslop commented 6 years ago

Hi, I have installed funannotate with the docker option, as I could not get the Homebrew installation to work. All databases were installed correctly with "funannotate setup" except for dbCAN, because the website was down. I ended up downloading it from here: http://cys.bios.niu.edu/dbCAN

Then I ran "funannotate predict", which ran smoothly for several hours until it encountered an error:

BUSCO training of Augusus failed, check busco logs, exiting

The log file (attached) says that BUSCO cannot read the dikarya database.

After double-checking the databases that got installed, I don't see any called dikarya. The BUSCO database unzipped into a folder called "outgroups". Inside there are a few files (*.dikarya_buscos.fa). I searched online for a BUSCO database called dikarya and found this one: http://busco.ezlab.org/v2/datasets/dikarya_odb9.tar.gz. I downloaded it, unzipped it and stored it in the database folder, but it is still not working.

Does anybody have an idea of what may be going on?

Thank you!

busco.log


Checking dependencies for funannotate v1.5.0

You are running Python v 2.7.14. Now checking python packages...
biopython: 1.68
goatools: 0.8.4
matplotlib: 2.2.2
natsort: 5.3.3
numpy: 1.15.0
pandas: 0.23.4
psutil: 5.4.6
requests: 2.19.1
scikit-learn: 0.19.2
scipy: 1.1.0
seaborn: 0.9.0
All 11 python packages installed

You are running Perl v 5.026002. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.39
DBD::SQLite: 1.58
DBD::mysql: 4.046
DBI: 1.641
DB_File: 1.835
Data::Dumper: 2.167
File::Basename: 2.85
File::Which: 1.20
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 2.97001
LWP::UserAgent: 6.35
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 1.20
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.11
Text::Soundex: 3.05
Thread::Queue: 3.13
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.26
threads: 2.21
threads::shared: 1.58
All 27 Perl modules installed

Checking external dependencies...
CodingQuarry: 2.0
RepeatMasker: RepeatMasker 4.0.7
RepeatModeler: RepeatModeler 1.0.11
Trinity: 2.6.6
augustus: 3.3.1
bamtools: bamtools 2.5.0
bedtools: bedtools v2.27.1
blat: BLAT v36
diamond: diamond 0.9.21
ete3: 3.1.1
exonerate: exonerate 2.4.0
fasta: no way to determine
gmap: 2017-06-20
gmes_petap.pl: 4.33
hisat2: 2.1.0
hmmscan: HMMER 3.2 (June 2018)
hmmsearch: HMMER 3.2 (June 2018)
java: 1.8.0_171
kallisto: 0.44.0
mafft: /home/linuxbrew/conda/bin/ete3_apps/bin/mafft: Cannot open --version.

MAFFT v6.861b (2011/09/24)
http://mafft.cbrc.jp/alignment/software/
NAR 30:3059-3066 (2002), Briefings in Bioinformatics 9:286-298 (2008)

High speed:
% mafft in > out
% mafft --retree 1 in > out (fast)

High accuracy (for <~200 sequences x <~2,000 aa/nt):
% mafft --maxiterate 1000 --localpair in > out (% linsi in > out is also ok)
% mafft --maxiterate 1000 --genafpair in > out (% einsi in > out)
% mafft --maxiterate 1000 --globalpair in > out (% ginsi in > out)

If unsure which option to use:
% mafft --auto in > out

--op # : Gap opening penalty, default: 1.53
--ep # : Offset (works like gap extension penalty), default: 0.0
--maxiterate # : Maximum number of iterative refinement, default: 0
--clustalout : Output: clustal format, default: fasta
--reorder : Outorder: aligned, default: input order
--quiet : Do not report progress
--thread # : Number of threads. (# must be <= number of physical cores - 1)
makeblastdb: makeblastdb 2.7.1+
minimap2: 2.12-r827
nucmer: 3.1
pslCDnaFilter: no way to determine
rmblastn: rmblastn 2.2.27+
samtools: samtools 1.9
stringtie: 1.3.4d
tRNAscan-SE: 2.0 (December 2017)
tbl2asn: unknown, likely 25.3
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
ERROR: emapper.py not installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/home/linuxbrew/data/databases
$PASAHOME=/home/linuxbrew/PASApipeline
$TRINITYHOME=/home/linuxbrew/Trinity
$EVM_HOME=/home/linuxbrew/evidencemodeler
$AUGUSTUS_CONFIG_PATH=/home/linuxbrew/augustus/config
$GENEMARK_PATH=/home/linuxbrew/gm_et_linux_64/gmes_petap
$BAMTOOLS_PATH=/usr/local/bin
All 7 environmental variables are set

nextgenusfs commented 6 years ago

what is in your database folder?

$ ls $FUNANNOTATE_DB
busco_outgroups.tar.gz  dbCAN.hmm.h3p                          go.obo               ncbi_cleaned_gene_products.txt  Pfam-A.hmm.h3p
dbCAN.changelog.txt     dikarya                                interpro.xml         outgroups                       Pfam.version
dbCAN-fam-HMMs.txt      dikarya.tar.gz                         merops.dmnd          Pfam-A.clans.tsv                repeats.dmnd
dbCAN.hmm               funannotate-db-info.txt                merops.formatted.fa  Pfam-A.hmm                      uniprot.dmnd
dbCAN.hmm.h3f           funannotate.repeat.proteins.fa         merops_scan.lib      Pfam-A.hmm.h3f                  uniprot.release-date.txt
dbCAN.hmm.h3i           funannotate.repeat.proteins.fa.tar.gz  mibig.dmnd           Pfam-A.hmm.h3i                  uniprot_sprot.fasta
dbCAN.hmm.h3m           funannotate.repeats.reformat.fa        mibig.fa             Pfam-A.hmm.h3m

And funannotate database should look something like this (although your dbCAN may not show up correctly?):

$ funannotate database

Funannotate Databases currently installed:

  Database          Type        Version      Date         Num_Records   Md5checksum
  pfam              hmmer3      31.0         2017-02            16712   3e47eec766a99b599cb24f28c4d393f8
  gene2product      text        1.17         2018-07-21         25940   3becb19708f4ba29878d4dce0c560c06
  interpro          xml         69.0         2018-06-21         34358   55f4f8d7c3d25eab20055a0fa0d2c3c3
  dbCAN             hmmer3      6.0          2017-09-12           585   3cb06f6f93c72a56c9fa12a6294b41d5
  busco_outgroups   outgroups   1.0          2018-08-15             8   6795b1d4545850a4226829c7ae8ef058
  merops            diamond     12.0         2017-10-04          4968   d923f0177c6d27c3d2886c705347adc0
  mibig             diamond     1.4          2018-08-15         31023   118f2c11edde36c81bdea030a0228492
  uniprot           diamond     2018_07      2018-07-18        557992   068c01f7ad636fb756eb4ffa09c85fbf
  go                text        2018-08-13   2018-08-13         47289   d0d96bed97a008c1e69acb785c335160
  repeats           diamond     1.0          2018-08-15         11950   4e8cafc3eea47ec7ba505bb1e3465d21

To update a database type:
        funannotate setup -i DBNAME -d /data/share/jon/funannotate_db --force

aberaslop commented 6 years ago

Hi nextgenusfs, thank you for your answer.

My database folder looks like this:

FamInfo.txt
Pfam-A.clans.tsv
Pfam-A.hmm
Pfam-A.hmm.h3f
Pfam-A.hmm.h3i
Pfam-A.hmm.h3m
Pfam-A.hmm.h3p
Pfam.version
aspergillus_nidulans.dikarya_buscos.fa
botrytis_cinerea.dikarya_buscos.fa
busco_outgroups.tar.gz
coprinopsis_cinerea.dikarya_buscos.fa
dbCAN-fam-HMMs.txt
dikarya
dikarya_odb9.tar.gz
funannotate-db-info.txt
funannotate.repeat.proteins.fa
funannotate.repeat.proteins.fa.tar.gz
funannotate.repeats.reformat.fa
go.obo
interpro.xml
laccaria_bicolor.dikarya_buscos.fa
merops.dmnd
merops.formatted.fa
merops_scan.lib
mibig.dmnd
mibig.fa
ncbi_cleaned_gene_products.txt
outgroups
readme.txt
repeats.dmnd
saccharomyces_cerevisiae.dikarya_buscos.fa
schizosaccharomyces_pombe.dikarya_buscos.fa
uniprot.dmnd
uniprot.release-date.txt
uniprot_sprot.fasta

And the funannotate database looks like this:

Funannotate Databases currently installed:

  Database          Type        Version      Date         Num_Records   Md5checksum
  pfam              hmmer3      32.0         2018-08            17929   de7496fad69c1040fd74db1cb5eef0fc
  gene2product      text        1.21         2018-10-06         26028   b38b123344ea3fd5027f5737e6f40f9c
  interpro          xml         70.0         2018-09-13         35020   06098dd955e4dafdbfdda2fdc33dd68a
  busco_outgroups   outgroups   1.0          2018-10-24             8   6795b1d4545850a4226829c7ae8ef058
  mibig             diamond     1.4          2018-10-23         31023   118f2c11edde36c81bdea030a0228492
  go                text        2018-10-19   2018-10-19         47334   4967967466691f12bdfbb18ac1599a4c
  repeats           diamond     1.0          2018-10-23         11950   4e8cafc3eea47ec7ba505bb1e3465d21

To update a database type:
        funannotate setup -i DBNAME -d /home/linuxbrew/data/databases --force

I just noticed that the uniprot database does not appear in this second message, but it is in the folder...
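A quick way to spot entries like this is to compare the listing against the database names you expect. The following is only a minimal sketch: `missing_dbs` is a hypothetical helper (not part of funannotate), and it assumes each installed database appears as the first whitespace-separated field on its own line, as in the table the maintainer posted above.

```shell
# Hypothetical helper, not part of funannotate.
# Reads a database listing on stdin and prints every expected name
# (given as arguments) that never appears as the first field of a line.
# ASSUMPTION: one database per line, name in the first column.
missing_dbs() {
    listing=$(cat)
    for name in "$@"; do
        if ! printf '%s\n' "$listing" |
            awk -v n="$name" '$1 == n { found = 1 } END { exit !found }'; then
            printf '%s\n' "$name"
        fi
    done
}

# Example (requires funannotate on the PATH, so it is left commented out):
# funannotate database | missing_dbs pfam uniprot merops dbCAN go
```

Any name it prints is present in the expected list but absent from the listing, which would flag uniprot, merops and dbCAN for the output pasted in this comment.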

Thanks!

nextgenusfs commented 6 years ago

Probably because the dbCAN download stalled out, the rest of the databases may not have installed correctly. Can you try removing the busco folder and then running setup as follows:

rm -r $FUNANNOTATE_DB/dikarya

#re-run setup
funannotate setup -f -i merops uniprot pfam repeats go mibig interpro busco_outgroups gene2product -b dikarya

This should force a fresh download and install of everything except dbCAN (which you installed manually).
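After re-running setup, it may be worth confirming that the lineage folder actually exists and is populated before starting another multi-hour predict run. This is a rough sketch only: `check_lineage` is a hypothetical helper, and the entries it looks for (`dataset.cfg` and `hmms`) are an assumption based on the usual layout of BUSCO v2 (odb9) lineage datasets, not something confirmed in this thread.

```shell
# Hypothetical helper, not part of funannotate or BUSCO.
# Returns 0 if the lineage folder exists and contains the expected entries,
# otherwise prints what is missing and returns 1.
# ASSUMPTION: dataset.cfg and hmms/ are expected inside an odb9 lineage folder.
check_lineage() {
    dir=$1
    status=0
    if [ ! -d "$dir" ]; then
        echo "missing folder: $dir"
        return 1
    fi
    for entry in dataset.cfg hmms; do
        if [ ! -e "$dir/$entry" ]; then
            echo "missing: $dir/$entry"
            status=1
        fi
    done
    return $status
}

# Example (left commented out because it depends on your environment):
# check_lineage "$FUNANNOTATE_DB/dikarya" && echo "dikarya lineage looks complete"
```

If the folder is reported missing or incomplete, the BUSCO download probably did not finish, and re-running the setup command above would be the first thing to try.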

nextgenusfs commented 6 years ago

V1.5.1 just released with updated dbCAN link -- hopefully a fresh install will solve this issue.

aberaslop commented 6 years ago

Thank you!
