nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Updating Busco DB #563

athulmenon closed this issue 3 years ago

athulmenon commented 3 years ago

Hi Jon,

I am trying to annotate a mite genome using Funannotate. I wanted to update the BUSCO database with "arthropoda", as it currently has only "dikarya". I ran the command below, but the database is still not being updated with the arthropoda group.

./funannotate-docker setup -b arthropoda
logname: no login name
logname: no login name

[Mar 02 11:10 AM]: OS: Debian GNU/Linux 10, 12 cores, ~ 74 GB RAM. Python: 3.7.9
[Mar 02 11:10 AM]: Running 1.8.4
[Mar 02 11:10 AM]: Database location: /opt/databases
[Mar 02 11:10 AM]: Retrieving download links from GitHub Repo
[Mar 02 11:10 AM]: Parsing Augustus pre-trained species and porting to funannotate
[Mar 02 11:10 AM]: MEROPS Database: version=12.0 date=2017-10-04 records=5,009
[Mar 02 11:10 AM]: UniProtKB Database: version=2020_06 date=2020-12-02 records=563,972
[Mar 02 11:10 AM]: dbCAN Database: version=9.0 date=2020-08-04 records=641
[Mar 02 11:10 AM]: Pfam Database: version=33.1 date=2020-04 records=18,259
[Mar 02 11:10 AM]: Repeat Database: version=1.0 date=2021-01-31 records=11,950
[Mar 02 11:10 AM]: GO ontology version=2021-01-01 date=2021-01-01 records=47,198
[Mar 02 11:10 AM]: MiBIG Database: version=1.4 date=2021-01-31 records=31,023
[Mar 02 11:10 AM]: InterProScan XML: version=83.0 date=2020-12-03 records=38,345
[Mar 02 11:10 AM]: BUSCO outgroups: version=1.0 date=2021-01-31 records=8
[Mar 02 11:10 AM]: Gene2Product: version=1.65 date=2020-10-05 records=33,749
[Mar 02 11:10 AM]: Downloading busco models: arthropoda
[Mar 02 11:10 AM]: Downloading: https://osf.io/w26ez/download?version=1 Bytes: 43933198

./funannotate-docker database --show-outgroups
logname: no login name
logname: no login name

BUSCO Outgroups:

saccharomyces_cerevisiae.dikarya coprinopsis_cinerea.dikarya aspergillus_nidulans.dikarya
botrytis_cinerea.dikarya laccaria_bicolor.dikarya schizosaccharomyces_pombe.dikarya

I even ran ./funannotate-docker setup -b arthropoda --update:

./funannotate-docker setup -b arthropoda --update
logname: no login name
logname: no login name

[Mar 02 11:10 AM]: OS: Debian GNU/Linux 10, 12 cores, ~ 74 GB RAM. Python: 3.7.9
[Mar 02 11:10 AM]: Running 1.8.4
[Mar 02 11:10 AM]: Database location: /opt/databases
[Mar 02 11:10 AM]: Retrieving download links from GitHub Repo
[Mar 02 11:10 AM]: Checking for newer versions of database files
[Mar 02 11:10 AM]: Parsing Augustus pre-trained species and porting to funannotate
[Mar 02 11:10 AM]: merops database is current.
[Mar 02 11:10 AM]: MEROPS Database: version=12.0 date=2017-10-04 records=5,009
[Mar 02 11:10 AM]: uniprot-release database is out of date, updating.
[Mar 02 11:10 AM]: Downloading UniProtKB/SwissProt database
[Mar 02 11:10 AM]: Downloading: ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz Bytes: 89862102
[Mar 02 11:11 AM]: Downloading: ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/reldate.txt Bytes: 151
[Mar 02 11:11 AM]: Building diamond database
[Mar 02 11:12 AM]: UniProtKB Database: version=2021_01 date=2021-02-10 records=564,277
[Mar 02 11:13 AM]: dbCAN database is current.
[Mar 02 11:13 AM]: dbCAN Database: version=9.0 date=2020-08-04 records=641
[Mar 02 11:13 AM]: pfam-log database is current.
[Mar 02 11:13 AM]: Pfam Database: version=33.1 date=2020-04 records=18,259
[Mar 02 11:13 AM]: repeats database is current.
[Mar 02 11:13 AM]: Repeat Database: version=1.0 date=2021-01-31 records=11,950
[Mar 02 11:13 AM]: go-obo database is out of date, updating.
[Mar 02 11:13 AM]: Downloading GO Ontology database
[Mar 02 11:13 AM]: Downloading: http://purl.obolibrary.org/obo/go.obo Bytes: 33808607
[Mar 02 11:14 AM]: GO ontology version=2021-02-01 date=2021-02-01 records=47,210
[Mar 02 11:14 AM]: mibig database is current.
[Mar 02 11:14 AM]: MiBIG Database: version=1.4 date=2021-01-31 records=31,023
[Mar 02 11:15 AM]: interpro database is out of date, updating.
[Mar 02 11:15 AM]: Downloading InterProScan Mapping file
[Mar 02 11:15 AM]: Downloading: ftp://ftp.ebi.ac.uk/pub/databases/interpro/interpro.xml.gz Bytes: 30915442
[Mar 02 11:15 AM]: Downloading: ftp://ftp.ebi.ac.uk/pub/databases/interpro/entry.list Bytes: 2080861
[Mar 02 11:15 AM]: InterProScan XML: version=84.0 date=2021-02-11 records=38,549
[Mar 02 11:15 AM]: outgroups not found in database
[Mar 02 11:15 AM]: Downloading pre-computed BUSCO outgroups
[Mar 02 11:15 AM]: Downloading: https://osf.io/r9sne/download?version=1 Bytes: 2374032
[Mar 02 11:15 AM]: BUSCO outgroups: version=1.0 date=2021-03-02 records=8
[Mar 02 11:15 AM]: gene2product database is current.
[Mar 02 11:15 AM]: Gene2Product: version=1.65 date=2020-10-05 records=33,749
[Mar 02 11:15 AM]: Downloading busco models: arthropoda
[Mar 02 11:16 AM]: Downloading: https://osf.io/w26ez/download?version=1 Bytes: 43933198

But the database is still not being updated. Can you please suggest a way to fix this?

Thanks, Athul

OS/Install Information

Checking dependencies for 1.8.4

You are running Python v 3.7.9. Now checking python packages...
biopython: 1.78
goatools: 1.0.15
matplotlib: 3.3.4
natsort: 7.1.1
numpy: 1.19.5
pandas: 1.2.1
psutil: 5.8.0
requests: 2.25.1
scikit-learn: 0.24.1
scipy: 1.5.3
seaborn: 0.11.1
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.15
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases
$PASAHOME=/venv/opt/pasa-2.4.1
$TRINITYHOME=/venv/opt/trinity-2.8.5
$EVM_HOME=/venv/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/venv/config
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies...
Traceback (most recent call last):
  File "/venv/bin/ete3", line 6, in <module>
    from ete3.tools.ete import main
  File "/venv/lib/python3.7/site-packages/ete3/tools/ete.py", line 55, in <module>
    from . import (ete_split, ete_expand, ete_annotate, ete_ncbiquery, ete_view,
  File "/venv/lib/python3.7/site-packages/ete3/tools/ete_view.py", line 48, in <module>
    from .. import (Tree, PhyloTree, TextFace, RectFace, faces, TreeStyle, CircleFace, AttrFace,
ImportError: cannot import name 'TextFace' from 'ete3' (/venv/lib/python3.7/site-packages/ete3/__init__.py)
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.6
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.1 (Jul 2020)
hmmsearch: HMMER 3.3.1 (Jul 2020)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.475 (2020/Nov/23)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.17-r941
proteinortho: 6.0.16
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.10
snap: 2006-07-28
stringtie: 2.1.4
tRNAscan-SE: 2.0.7 (Oct 2020)
tantan: tantan 13
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: ete3 not installed
ERROR: gmes_petap.pl not installed
ERROR: signalp not installed

nextgenusfs commented 3 years ago

So the issue here is Docker. When you run the funannotate-docker wrapper, it starts a container from the image, runs that one command, and then exits, so anything the command downloaded into the container (here, the arthropoda BUSCO models) is not saved. There are two options. One is for me to add all the databases to the build image, which is probably the right solution even though it will increase the image size. Otherwise, you would need to launch the image with docker yourself, keep the container running, and then run funannotate jobs on that container with docker exec.
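For anyone who wants to try the second option, here is a minimal sketch of the idea in Python. It only builds and prints the docker commands; it does not run them. The image name nextgenusfs/funannotate and the container name funannotate_session are assumptions for illustration, as is the use of "sleep infinity" to keep the container alive (that only works if the image does not define an entrypoint that overrides it).

```python
import shlex

# Assumptions (not confirmed in this thread): the image name and the
# container name are placeholders; substitute whatever you actually use.
IMAGE = "nextgenusfs/funannotate"
CONTAINER = "funannotate_session"


def start_container_cmd(image=IMAGE, name=CONTAINER):
    """Command that starts a detached, long-lived container.

    'sleep infinity' is just a way to keep the container from exiting;
    it assumes the image has no entrypoint that replaces this command.
    """
    return ["docker", "run", "-d", "--name", name, image, "sleep", "infinity"]


def exec_cmd(funannotate_args, name=CONTAINER):
    """Command that runs funannotate inside the already-running container."""
    return ["docker", "exec", name, "funannotate", *funannotate_args]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    commands = [
        start_container_cmd(),
        exec_cmd(["setup", "-b", "arthropoda"]),
        exec_cmd(["database", "--show-outgroups"]),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

Because every docker exec call goes to the same container, files written by setup are still there for later commands, until the container itself is removed.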

nextgenusfs commented 3 years ago

I added a few more BUSCO groups to the Docker image, including Arthropoda, so you should just need to run docker pull to get the updated image.

athulmenon commented 3 years ago

Hi Jon,

Sorry for the late reply.

Thanks for the update. I have a small query: can we get information on the transposable elements that are removed during prediction?

Regards, Athul

nextgenusfs commented 3 years ago

They are located in output_folder/predict_misc/bad_models.gff.
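To get a quick overview of that file, something like the sketch below might help. It assumes bad_models.gff is ordinary tab-separated, nine-column GFF3; the exact attributes funannotate writes for removed models are not described in this thread, so the parser just reports whatever is in column 9. The two sample lines are made up for illustration and are not real funannotate output.

```python
from collections import Counter
import io


def parse_attributes(column9):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def summarize_gff(handle):
    """Count feature types and collect basic coordinates per record.

    Assumes standard nine-column GFF3; comment, blank, and malformed
    lines are skipped.
    """
    counts = Counter()
    records = []
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        counts[cols[2]] += 1
        records.append({
            "seqid": cols[0],
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "attributes": parse_attributes(cols[8]),
        })
    return counts, records


if __name__ == "__main__":
    # Made-up sample data; replace with
    # open("output_folder/predict_misc/bad_models.gff") for a real run.
    sample = io.StringIO(
        "##gff-version 3\n"
        "scaffold_1\tfunannotate\tgene\t100\t900\t.\t+\t.\tID=gene_1;\n"
        "scaffold_1\tfunannotate\tmRNA\t100\t900\t.\t+\t.\tID=gene_1-T1;Parent=gene_1;\n"
    )
    counts, records = summarize_gff(sample)
    for feature_type, n in counts.items():
        print(feature_type, n)
```

From the records list you can then filter on whatever attribute in the real file marks a model as a transposable element.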