nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

error in funannotate annotate #669

liuyca1 closed this issue 2 years ago

liuyca1 commented 2 years ago

We finished gene prediction with "funannotate predict" and "funannotate iprscan", but when we ran "funannotate annotate", an error message appeared:

$funannotate annotate -i ./fun -d /data/liuyuanchao/funannotate_test/all_database/ --cpus 48

[Nov 19 11:29 AM]: OS: CentOS Linux 7, 160 cores, ~ 958 GB RAM. Python: 3.9.7
[Nov 19 11:29 AM]: Running 1.8.7
[Nov 19 11:29 AM]: Database files not found in /data/liuyuanchao/funannotate_test/all_database/, run funannotate database and/or funannotate setup

$funannotate database

Funannotate Databases currently installed:

Database         Type       Version     Date        Num_Records  Md5checksum
merops           diamond    12.0        2017-10-04  5009         a6dd76907896708f3ca5335f58560356
uniprot          diamond    2021_03     2021-06-02  565254       68ed1e475d13bb3d5574c53822d11cd3
dbCAN            hmmer3     9.0         2020-08-04  641          04696dfba1c3bb82ff9b72cfbb3e4a65
pfam             hmmer3     34.0        2021-03     19179        f83c0d00445257fd9c066ad3e9e10568
repeats          diamond    1.0         2021-11-08  11950        4e8cafc3eea47ec7ba505bb1e3465d21
go               text       2021-10-26  2021-10-26  47226        6757c819642e79e1406cad3ffcb6ea3d
mibig            diamond    1.4         2021-11-08  31023        118f2c11edde36c81bdea030a0228492
interpro         xml        86.0        2021-06-03  38913        0d8c575f88f397397b9491520b38db1e
busco_outgroups  outgroups  1.0         2021-11-08  8            6795b1d4545850a4226829c7ae8ef058
gene2product     text       1.72        2021-10-18  34111        d844fe60a5ab66e07f884da1cc08f16c

To update a database type: funannotate setup -i DBNAME -d /data/liuyuanchao/funannotate_test/all_database --force

To see install BUSCO outgroups type: funannotate database --show-outgroups

To see BUSCO tree type: funannotate database --show-buscos

$funannotate check --show-versions

Checking dependencies for 1.8.7

You are running Python v 3.9.7. Now checking python packages...
biopython: 1.79
goatools: 1.1.6
matplotlib: 3.4.3
natsort: 8.0.0
numpy: 1.21.4
pandas: 1.3.4
psutil: 5.8.0
requests: 2.26.0
scikit-learn: 1.0.1
scipy: 1.7.0
seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/data/liuyuanchao/funannotate_test/all_database
$PASAHOME=/opt/anaconda3/envs/funannotate/opt/pasa-2.4.1
$TRINITY_HOME=/opt/anaconda3/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/opt/anaconda3/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/opt/anaconda3/envs/funannotate/config/
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.8
emapper.py: 2.1.3
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2018-07-04
gmes_petap.pl: 4.68_lic
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.490 (2021/Oct/30)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.22-r1101
proteinortho: 6.0.31
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.10
signalp: 5.0b
snap: 2006-07-28
stringtie: 2.1.7
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
All 36 external dependencies are installed

nextgenusfs commented 2 years ago

What is the output of ls -l $FUNANNOTATE_DB? It seems that something during funannotate setup didn't properly create a database file needed by annotate.

liuyca1 commented 2 years ago


 $ls -l $FUNANNOTATE_DB
total 57018900
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 14  2017 actinopterygii
-rw-rw-r--   1 liuyuanchao liuyuanchao   220677450 Nov  6 11:01 actinopterygii.tar.gz
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Nov 18  2016 alveolata_stramenophiles
-rw-rw-r--   1 liuyuanchao liuyuanchao    10551644 Nov  6 10:42 alveolata_stramenophiles.tar.gz
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Nov  1  2016 arthropoda
-rw-rw-r--   1 liuyuanchao liuyuanchao    43933198 Nov  6 10:45 arthropoda.tar.gz
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 ascomycota
-rw-rw-r--   1 liuyuanchao liuyuanchao    67966037 Nov  6 10:27 ascomycota.tar.gz
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 aves
-rw-rw-r--   1 liuyuanchao liuyuanchao   137974970 Nov  6 11:08 aves.tar.gz
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 basidiomycota
-rw-rw-r--   1 liuyuanchao liuyuanchao    68863784 Nov  6 10:41 basidiomycota.tar.gz
drwxrwxr-x   3 liuyuanchao liuyuanchao        4096 Jul 26 18:36 busco_db
-rw-r--r--   1 liuyuanchao liuyuanchao     2374032 Nov  8 11:59 busco_outgroups.tar.gz
-rw-rw-r--   1 liuyuanchao liuyuanchao        8098 Nov 10 16:46 dbCAN.changelog.txt
-rw-rw-r--   1 liuyuanchao liuyuanchao       63489 Nov 10 16:46 dbCAN-fam-HMMs.txt
-rw-rw-r--   1 liuyuanchao liuyuanchao    94317882 Nov 10 16:46 dbCAN.hmm
-rw-r--r--   1 liuyuanchao liuyuanchao    17191104 Nov 10 16:46 dbCAN.hmm.h3f
-rw-r--r--   1 liuyuanchao liuyuanchao       29869 Nov 10 16:46 dbCAN.hmm.h3i
-rw-r--r--   1 liuyuanchao liuyuanchao    39279761 Nov 10 16:46 dbCAN.hmm.h3m
-rw-r--r--   1 liuyuanchao liuyuanchao    46144080 Nov 10 16:46 dbCAN.hmm.h3p
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Jul 23 16:08 dikarya
-rw-r--r--   1 liuyuanchao liuyuanchao    66199252 Jul 23 16:08 dikarya.tar.gz
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 diptera
-rw-rw-r--   1 liuyuanchao liuyuanchao   145735505 Nov  6 10:55 diptera.tar.gz
-rw-rw-r--   1 liuyuanchao liuyuanchao 41370988544 Mar  2  2021 eggnog.db
-rw-rw-r--   1 liuyuanchao liuyuanchao    32817522 Aug  3 15:31 eggnog.db.gz
-rw-rw-r--   1 liuyuanchao liuyuanchao  9285439161 Mar  2  2021 eggnog_proteins.dmnd
-rw-r--r--   1 liuyuanchao liuyuanchao   278003712 Nov 11  2020 eggnog.taxa.db
-rw-r--r--   1 liuyuanchao liuyuanchao     6628719 Nov 11  2020 eggnog.taxa.db.traverse.pkl
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 14  2017 embryophyta
-rw-rw-r--   1 liuyuanchao liuyuanchao    64919077 Nov  6 11:25 embryophyta.tar.gz
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 endopterygota
-rw-rw-r--   1 liuyuanchao liuyuanchao   118029754 Nov  6 10:48 endopterygota.tar.gz
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 euarchontoglires
-rw-rw-r--   1 liuyuanchao liuyuanchao   315719027 Nov  6 11:18 euarchontoglires.tar.gz
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Nov  2  2016 eukaryota
-rw-rw-r--   1 liuyuanchao liuyuanchao    13244593 Nov  6 10:42 eukaryota.tar.gz
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 eurotiomycetes
-rw-rw-r--   1 liuyuanchao liuyuanchao   210241744 Nov  6 10:34 eurotiomycetes.tar.gz
-rw-rw-r--   1 liuyuanchao liuyuanchao         473 Nov  9 08:14 funannotate-annotate.136510.log
-rw-rw-r--   1 liuyuanchao liuyuanchao        1900 Aug  4 20:23 funannotate-annotate.36075.log
-rw-rw-r--   1 liuyuanchao liuyuanchao         420 Nov  9 14:58 funannotate-annotate.65436.log
-rw-rw-r--   1 liuyuanchao liuyuanchao         473 Nov  9 11:54 funannotate-annotate.66655.log
-rw-rw-r--   1 liuyuanchao liuyuanchao         420 Nov  9 15:01 funannotate-annotate.68022.log
-rw-r--r--   1 liuyuanchao liuyuanchao        1191 Nov 19 09:01 funannotate-db-info.txt
-rw-r--r--   1 liuyuanchao liuyuanchao    11017103 Mar  2  2016 funannotate.repeat.proteins.fa
-rw-rw-r--   1 liuyuanchao liuyuanchao     6325661 Nov  8 11:53 funannotate.repeat.proteins.fa.tar.gz
-rw-rw-r--   1 liuyuanchao liuyuanchao    11017079 Nov  8 11:53 funannotate.repeats.reformat.fa
-rw-rw-r--   1 liuyuanchao liuyuanchao         598 Nov  9 15:06 funannotate-setup.log
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 fungi
-rw-rw-r--   1 liuyuanchao liuyuanchao    12673693 Nov  6 10:25 fungi.tar.gz
-rw-rw-r--   1 liuyuanchao liuyuanchao    33814749 Nov  9 10:11 go.obo
drwxrwxr-x   3 liuyuanchao liuyuanchao          30 Aug  4 17:33 hmmer
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 hymenoptera
-rw-rw-r--   1 liuyuanchao liuyuanchao   233690214 Nov  6 10:52 hymenoptera.tar.gz
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 insecta
-rw-rw-r--   1 liuyuanchao liuyuanchao    67256544 Nov  6 10:46 insecta.tar.gz
-rw-rw-r--   1 liuyuanchao liuyuanchao     2102658 Nov 10 17:47 interpro.tsv
-rw-rw-r--   1 liuyuanchao liuyuanchao   193864157 Nov 10 17:47 interpro.xml
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 laurasiatheria
-rw-rw-r--   1 liuyuanchao liuyuanchao   286089494 Nov  6 11:23 laurasiatheria.tar.gz
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 mammalia
-rw-rw-r--   1 liuyuanchao liuyuanchao   262985539 Nov  6 11:13 mammalia.tar.gz
-rw-rw-r--   1 liuyuanchao liuyuanchao     1445045 Nov 19 09:01 merops.dmnd
-rw-rw-r--   1 liuyuanchao liuyuanchao     1383413 Nov 19 09:01 merops.formatted.fa
-rw-rw-r--   1 liuyuanchao liuyuanchao     1957603 Nov 19 09:01 merops_scan.lib
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 metazoa
-rw-rw-r--   1 liuyuanchao liuyuanchao    39476850 Nov  6 10:43 metazoa.tar.gz
-rw-rw-r--   1 liuyuanchao liuyuanchao    21740858 Nov  8 11:54 mibig.dmnd
-rw-rw-r--   1 liuyuanchao liuyuanchao    21244378 Nov  8 11:54 mibig.fa
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 microsporidia
-rw-rw-r--   1 liuyuanchao liuyuanchao    21181462 Nov  6 10:26 microsporidia.tar.gz
drwxrwxr-x   2 liuyuanchao liuyuanchao        4096 Aug  4 17:32 mmseqs
-rw-rw-r--   1 liuyuanchao liuyuanchao     1363545 Nov  6 17:38 ncbi_cleaned_gene_products.txt
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 nematoda
-rw-rw-r--   1 liuyuanchao liuyuanchao    45483712 Nov  6 10:44 nematoda.tar.gz
drwxr-xr-x   2 liuyuanchao liuyuanchao        4096 Dec  5  2016 outgroups
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 pezizomycotina
-rw-rw-r--   1 liuyuanchao liuyuanchao   164636032 Nov  6 10:30 pezizomycotina.tar.gz
drwxrwxr-x   2 liuyuanchao liuyuanchao        4096 Aug  3 16:30 pfam
-rw-rw-r--   1 liuyuanchao liuyuanchao     1114025 Nov 10 17:46 Pfam-A.clans.tsv
-rw-rw-r--   1 liuyuanchao liuyuanchao  1538737879 Nov 10 17:46 Pfam-A.hmm
-rw-r--r--   1 liuyuanchao liuyuanchao   351982037 Nov 10 17:46 Pfam-A.hmm.h3f
-rw-r--r--   1 liuyuanchao liuyuanchao     1323456 Nov 10 17:46 Pfam-A.hmm.h3i
-rw-r--r--   1 liuyuanchao liuyuanchao   636978544 Nov 10 17:46 Pfam-A.hmm.h3m
-rw-r--r--   1 liuyuanchao liuyuanchao   749322833 Nov 10 17:46 Pfam-A.hmm.h3p
-rw-rw-r--   1 liuyuanchao liuyuanchao         111 Nov 10 17:46 Pfam.version
-rw-r--r--   1 liuyuanchao liuyuanchao      581185 Aug  7  2018 protein.evidence.fasta
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Nov 18  2016 protists
-rw-rw-r--   1 liuyuanchao liuyuanchao     9459518 Nov  6 10:42 protists.tar.gz
-rw-rw-r--   1 liuyuanchao liuyuanchao    11090231 Nov  8 11:53 repeats.dmnd
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 saccharomycetales
-rw-rw-r--   1 liuyuanchao liuyuanchao    72402218 Nov  6 10:40 saccharomycetales.tar.gz
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 saccharomycetes
-rw-rw-r--   1 liuyuanchao liuyuanchao    91857381 Nov  6 10:39 saccharomycetes.tar.gz
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 sordariomycetes
-rw-rw-r--   1 liuyuanchao liuyuanchao   192011468 Nov  6 10:37 sordariomycetes.tar.gz
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 tetrapoda
-rw-rw-r--   1 liuyuanchao liuyuanchao   209110833 Nov  6 11:05 tetrapoda.tar.gz
drwxr-xr-x 110 liuyuanchao liuyuanchao        4096 Jul 23 16:08 trained_species
-rw-rw-r--   1 liuyuanchao liuyuanchao   280362561 Nov  9 09:21 uniprot_sprot.fasta
drwxr-xr-x   5 liuyuanchao liuyuanchao        4096 Feb 13  2017 vertebrata
-rw-rw-r--   1 liuyuanchao liuyuanchao   137371990 Nov  6 10:57 vertebrata.tar.gz
-rw-rw-r--   1 liuyuanchao liuyuanchao     5255536 Aug  3 20:55 wget-log

We suspected it was a database problem, but we don't know how to solve it.

nextgenusfs commented 2 years ago

Looks like it's the uniprot database that is missing, or rather its diamond database is missing. You should be able to fix it with:

funannotate setup -i uniprot --force --wget

This is the check in the code: https://github.com/nextgenusfs/funannotate/blob/master/funannotate/annotate.py#L453-L454
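Before launching another long annotate run, the check linked above can be mirrored with a quick shell test. This is a minimal sketch, not funannotate code: check_fun_db is a hypothetical helper, and the four file names it looks for (Pfam-A.hmm.h3p, dbCAN.hmm.h3p, merops.dmnd, uniprot.dmnd) are an assumption inferred from the directory listing in this thread; confirm them against the linked lines of annotate.py.

```shell
# Hypothetical helper (not part of funannotate): report which database files
# `funannotate annotate` is ASSUMED to require. The file list is inferred from
# this thread and should be checked against annotate.py before relying on it.
check_fun_db() {
    db_dir="$1"
    missing=0
    for f in Pfam-A.hmm.h3p dbCAN.hmm.h3p merops.dmnd uniprot.dmnd; do
        if [ -f "$db_dir/$f" ]; then
            echo "found:   $f"
        else
            echo "MISSING: $f"
            missing=1
        fi
    done
    # Non-zero exit status means at least one assumed file is absent.
    return "$missing"
}
```

Run it as check_fun_db "$FUNANNOTATE_DB". Against the listing posted in this issue, it would flag only uniprot.dmnd as missing (uniprot_sprot.fasta is present, but no diamond database was built from it), which is consistent with the diagnosis above.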

liuyca1 commented 2 years ago

Thank you so much, amazing! It's working fine now.

$funannotate setup -i uniprot --force --wget

[Nov 19 11:51 AM]: OS: CentOS Linux 7, 160 cores, ~ 958 GB RAM. Python: 3.9.7
[Nov 19 11:51 AM]: Running 1.8.7
[Nov 19 11:51 AM]: Database location: /data/liuyuanchao/funannotate_test/all_database
[Nov 19 11:51 AM]: Retrieving download links from GitHub Repo
[Nov 19 11:56 AM]: Parsing Augustus pre-trained species and porting to funannotate
[Nov 19 11:56 AM]: Downloading UniProtKB/SwissProt database
--2021-11-19 11:56:14-- ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
=> ‘/data/liuyuanchao/funannotate_test/all_database/uniprot_sprot.fasta.gz’
Resolving ftp.ebi.ac.uk (ftp.ebi.ac.uk)... 193.62.197.74
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.197.74|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/databases/uniprot/current_release/knowledgebase/complete ... done.
==> SIZE uniprot_sprot.fasta.gz ... 90527596
==> PASV ... done. ==> RETR uniprot_sprot.fasta.gz ... done.
Length: 90527596 (86M) (unauthoritative)

uniprot_sprot.fasta.gz 100%[====================================================================================================================>] 86.33M 749KB/s in 6m 22s

2021-11-19 12:02:39 (231 KB/s) - ‘/data/liuyuanchao/funannotate_test/all_database/uniprot_sprot.fasta.gz’ saved [90527596]

--2021-11-19 12:02:41-- ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/reldate.txt
=> ‘/data/liuyuanchao/funannotate_test/all_database/uniprot.release-date.txt’
Resolving ftp.ebi.ac.uk (ftp.ebi.ac.uk)... 193.62.197.74
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.197.74|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/databases/uniprot/current_release/knowledgebase/complete ... done.
==> SIZE reldate.txt ... 151
==> PASV ... done. ==> RETR reldate.txt ... done.
Length: 151 (unauthoritative)

reldate.txt 100%[====================================================================================================================>] 151 --.-KB/s in 0.003s

2021-11-19 12:02:44 (53.3 KB/s) - ‘/data/liuyuanchao/funannotate_test/all_database/uniprot.release-date.txt’ saved [151]

[Nov 19 12:02 PM]: Building diamond database
[Nov 19 12:02 PM]: UniProtKB Database: version=2021_04 date=2021-11-17 records=565,928

$funannotate annotate -i ./fun -d /data/liuyuanchao/funannotate_test/all_database/ --cpus 48

[Nov 19 12:04 PM]: OS: CentOS Linux 7, 160 cores, ~ 958 GB RAM. Python: 3.9.7
[Nov 19 12:04 PM]: Running 1.8.7
[Nov 19 12:04 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Nov 19 12:04 PM]: Parsing input files
[Nov 19 12:04 PM]: Existing tbl found: ./fun/predict_results/Laccaria_bicolor.tbl
[Nov 19 12:05 PM]: Adding Functional Annotation to Laccaria bicolor, NCBI accession: None
[Nov 19 12:05 PM]: Annotation consists of: 14,640 gene models
[Nov 19 12:05 PM]: 14,295 protein records loaded
[Nov 19 12:05 PM]: Running HMMer search of PFAM version 34.0
[Nov 19 12:06 PM]: 10,314 annotations added
[Nov 19 12:06 PM]: Running Diamond blastp search of UniProt DB version 2021_04
[Nov 19 12:07 PM]: 453 valid gene/product annotations from 624 total
[Nov 19 12:07 PM]: Running Eggnog-mapper

(output continues...)