nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

error in funannotate annotate #672

Open liuyca1 opened 2 years ago

liuyca1 commented 2 years ago

When we ran funannotate annotate, we got the following error message:

(funannotate) [liuyuanchao@master01 /data/liuyuanchao/funannotate_test/Analysis/Volvariella_volvacea_V23] $funannotate annotate -i ./fun --cpus 48

[Nov 29 12:08 PM]: OS: CentOS Linux 7, 160 cores, ~ 958 GB RAM. Python: 3.9.7
[Nov 29 12:08 PM]: Running 1.8.7
[Nov 29 12:08 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Nov 29 12:08 PM]: Parsing input files
[Nov 29 12:08 PM]: Existing tbl found: ./fun/predict_results/Volvariella_volvacea.tbl
[Nov 29 12:08 PM]: Adding Functional Annotation to Volvariella volvacea, NCBI accession: None
[Nov 29 12:08 PM]: Annotation consists of: 9,563 gene models
[Nov 29 12:08 PM]: 9,407 protein records loaded
[Nov 29 12:08 PM]: Running HMMer search of PFAM version 34.0
[Nov 29 12:09 PM]: 8,380 annotations added
[Nov 29 12:09 PM]: Running Diamond blastp search of UniProt DB version 2021_04
[Nov 29 12:10 PM]: 413 valid gene/product annotations from 590 total
[Nov 29 12:10 PM]: Running Eggnog-mapper
[Nov 29 12:26 PM]: Parsing EggNog Annotations
[Nov 29 12:26 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.72
[Nov 29 12:26 PM]: 413 gene name and product description annotations added
[Nov 29 12:26 PM]: Running Diamond blastp search of MEROPS version 12.0
[Nov 29 12:26 PM]: 292 annotations added
[Nov 29 12:26 PM]: Annotating CAZYmes using HMMer search of dbCAN version 10.0
[Nov 29 12:26 PM]: 316 annotations added
[Nov 29 12:26 PM]: Annotating proteins with BUSCO dikarya models
[Nov 29 12:27 PM]: 996 annotations added
[Nov 29 12:27 PM]: Existing Phobius results found: ./fun/annotatemisc/phobius.results.txt
[Nov 29 12:27 PM]: Predicting secreted proteins with SignalP
[Nov 29 12:43 PM]: 570 secretome and 0 transmembane annotations added
[Nov 29 12:43 PM]: Parsing InterProScan5 XML file
[Nov 29 12:46 PM]: Now parsing antiSMASH v5 results, finding SM clusters
Traceback (most recent call last):
  File "/opt/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/library.py", line 6968, in ParseAntiSmash
    numericalContig = '{}{}'.format(baseName, int(record.id.rsplit('_', 1)[-1]))
ValueError: invalid literal for int() with base 10: 'V23s.1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/opt/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/funannotate.py", line 705, in main
    mod.main(arguments)
  File "/opt/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/annotate.py", line 1023, in main
    bbDomains, bbSubType, BackBone = lib.ParseAntiSmash(antismashinput,
  File "/opt/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/library.py", line 6971, in ParseAntiSmash
    numericalContig = '{}{}'.format(baseName, int(record.id.rsplit('.', 1)[0].rsplit('_', 1)[-1]))
ValueError: invalid literal for int() with base 10: 'V23s'

$funannotate check --show-versions

Checking dependencies for 1.8.7

You are running Python v 3.9.7. Now checking python packages...
biopython: 1.79
goatools: 1.1.6
matplotlib: 3.4.3
natsort: 8.0.0
numpy: 1.21.4
pandas: 1.3.4
psutil: 5.8.0
requests: 2.26.0
scikit-learn: 1.0.1
scipy: 1.7.0
seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/data/liuyuanchao/funannotate_test/all_database
$PASAHOME=/opt/anaconda3/envs/funannotate/opt/pasa-2.4.1
$TRINITY_HOME=/opt/anaconda3/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/opt/anaconda3/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/opt/anaconda3/envs/funannotate/config/
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.8
emapper.py: 2.1.3
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2018-07-04
gmes_petap.pl: 4.68_lic
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.490 (2021/Oct/30)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.22-r1101
proteinortho: 6.0.31
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.10
signalp: 5.0b
snap: 2006-07-28
stringtie: 2.1.7
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
All 36 external dependencies are installed

nextgenusfs commented 2 years ago

What are your contig names, V23s? That is quite atypical to have a string after the numerical portion. The script is trying to number the clusters it found from antiSMASH and naming them numerically.

liuyca1 commented 2 years ago

Yes. Because gene prediction restricts contig IDs to no more than 10 characters, we changed the contig IDs to

VVo_V23s.1, VVo_V23s.2, VVo_V23s.3, VVo_V23s.4, VVo_V23s.5, VVo_V23s.6, ...

and then used the following command to predict genes:

$funannotate predict -i Volvariella_volvacea.V23s_genomic_masked.fna --species "Volvariella volvacea" -o ./fun --busco_seed_species anidulans --busco_db dikarya --min_training_models 150 -d /data/liuyuanchao/funannotate_test/all_database --cpus 48 --name VVoV23s

Are there any other restrictions on ID names when running funannotate annotate?

nextgenusfs commented 2 years ago

Okay. You didn't do anything "wrong" per se, you just used a naming convention I apparently hadn't seen before or thought about. Here is where it died: https://github.com/nextgenusfs/funannotate/blob/master/funannotate/library.py#L7032-L7039. Because there is an underscore, it assumes you have a name like scaffold_1 or contig_19, etc. So I guess I need another check there for this.
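To make the failure concrete, here is a minimal sketch that replays the two parsing attempts visible in the traceback on an ID such as VVo_V23s.1, next to one possible more tolerant fallback. The function names `contig_number_strict` and `contig_number_tolerant` are hypothetical. This is not funannotate's code and not necessarily the check that was added upstream; taking the trailing run of digits is only an assumption about what a fallback could look like.

```python
import re


def contig_number_strict(record_id):
    """Replay the two attempts seen in the traceback (illustrative only)."""
    try:
        # First attempt: everything after the last underscore must be an integer.
        return int(record_id.rsplit('_', 1)[-1])
    except ValueError:
        # Second attempt: drop whatever follows the last '.', then retry.
        return int(record_id.rsplit('.', 1)[0].rsplit('_', 1)[-1])


def contig_number_tolerant(record_id):
    """Hypothetical fallback: use the trailing run of digits, if there is one."""
    match = re.search(r'(\d+)$', record_id)
    if match is None:
        raise ValueError('no trailing number in contig ID: {}'.format(record_id))
    return int(match.group(1))


for name in ('scaffold_1', 'contig_19', 'VVo_V23s.1'):
    try:
        strict = contig_number_strict(name)
    except ValueError as err:
        strict = 'ValueError: {}'.format(err)
    print(name, '| strict:', strict, '| tolerant:', contig_number_tolerant(name))
```

For VVo_V23s.1 the strict version first tries to convert 'V23s.1' and then 'V23s', which are exactly the two values reported in the error messages above.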

You can install the latest from GitHub and it should be fixed; just re-issue the same command. Install the latest with:

python -m pip install git+https://github.com/nextgenusfs/funannotate.git --upgrade --force --no-deps

nextgenusfs commented 2 years ago

You would generally save yourself some headaches in the future by using funannotate sort to rename your scaffolds/contigs.
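As a rough illustration of what such a renaming step involves, the sketch below sorts sequences longest-first and renames them to scaffold_1, scaffold_2, and so on, keeping a map from old to new IDs. It is a standalone approximation and not the implementation of funannotate sort. The helper names `read_fasta` and `rename_by_length`, the `scaffold` base name, and the toy sequences are all assumptions made for the example.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a list of (id, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            if header is not None:
                records.append((header, ''.join(chunks)))
            # Keep only the ID, i.e. the text up to the first whitespace.
            header, chunks = (line[1:].split() or [''])[0], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, ''.join(chunks)))
    return records


def rename_by_length(records, base='scaffold'):
    """Sort longest-first and rename to base_1, base_2, ...

    Returns the renamed records and a dict mapping old IDs to new IDs.
    """
    ordered = sorted(records, key=lambda rec: len(rec[1]), reverse=True)
    renamed, mapping = [], {}
    for index, (old_id, seq) in enumerate(ordered, start=1):
        new_id = '{}_{}'.format(base, index)
        mapping[old_id] = new_id
        renamed.append((new_id, seq))
    return renamed, mapping


# Toy input: three short made-up sequences using the IDs from this thread.
example = ['>VVo_V23s.1', 'ACGT',
           '>VVo_V23s.2', 'ACGTACGTAC', 'GT',
           '>VVo_V23s.3', 'ACGTAC']
renamed, mapping = rename_by_length(read_fasta(example))
for new_id, seq in renamed:
    print('>' + new_id, len(seq))
print(mapping)
```

Keeping the old-to-new map makes it possible to trace results back to the original contig names later.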

liuyca1 commented 2 years ago

Thank you so much. The speed of the software upgrade is amazing. Once I understood the problem, I modified the ID names in the files obtained from funannotate predict, and was then able to finish funannotate annotate.

[Nov 30 03:58 PM]: OS: CentOS Linux 7, 160 cores, ~ 958 GB RAM. Python: 3.9.7
[Nov 30 03:58 PM]: Running 1.8.7
[Nov 30 03:58 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Nov 30 03:58 PM]: Found existing output directory ./fun. Warning, will re-use any intermediate files found.
[Nov 30 03:58 PM]: Parsing input files
[Nov 30 03:58 PM]: Existing tbl found: ./fun/predict_results/Ganoderma_leucocontextum.tbl
[Nov 30 03:59 PM]: Adding Functional Annotation to Ganoderma leucocontextum, NCBI accession: None
[Nov 30 03:59 PM]: Annotation consists of: 17,001 gene models
[Nov 30 03:59 PM]: 16,791 protein records loaded
[Nov 30 03:59 PM]: Existing Pfam-A results found: ./fun/annotate_misc/annotations.pfam.txt
[Nov 30 03:59 PM]: 11,674 annotations added
[Nov 30 03:59 PM]: Running Diamond blastp search of UniProt DB version 2021_04
[Nov 30 03:59 PM]: 461 valid gene/product annotations from 647 total
[Nov 30 03:59 PM]: Existing Eggnog-mapper results found: ./fun/annotate_misc/eggnog.emapper.annotations
[Nov 30 03:59 PM]: Parsing EggNog Annotations
[Nov 30 03:59 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.72
[Nov 30 03:59 PM]: 461 gene name and product description annotations added
[Nov 30 03:59 PM]: Existing MEROPS results found: ./fun/annotate_misc/annotations.merops.txt
[Nov 30 03:59 PM]: 417 annotations added
[Nov 30 03:59 PM]: Existing CAZYme results found: ./fun/annotate_misc/annotations.dbCAN.txt
[Nov 30 03:59 PM]: 423 annotations added
[Nov 30 03:59 PM]: Existing BUSCO2 results found: ./fun/annotate_misc/annotations.busco.txt
[Nov 30 03:59 PM]: 1,195 annotations added
[Nov 30 03:59 PM]: Existing Phobius results found: ./fun/annotate_misc/phobius.results.txt
[Nov 30 03:59 PM]: Existing SignalP results found: ./fun/annotate_misc/signalp.results.txt
[Nov 30 03:59 PM]: 892 secretome and 0 transmembane annotations added
[Nov 30 03:59 PM]: Now parsing antiSMASH v5 results, finding SM clusters
[Nov 30 03:59 PM]: Found 32 clusters, 58 biosynthetic enyzmes, and 57 smCOGs predicted by antiSMASH
[Nov 30 03:59 PM]: Found 0 duplicated annotations, adding 58,625 valid annotations
[Nov 30 03:59 PM]: Converting to final Genbank format, good luck!
[Nov 30 04:02 PM]: Creating AGP file and corresponding contigs file
[Nov 30 04:02 PM]: Cross referencing SM cluster hits with MIBiG database version 1.4
[Nov 30 04:02 PM]: Creating tab-delimited SM cluster output
[Nov 30 04:02 PM]: Writing genome annotation table.
[Nov 30 04:06 PM]: Funannotate annotate has completed successfully!

funannotate predict limits the length of contig IDs (headers should not contain more characters than the max (16)), and I am currently analyzing macro-fungi. If a contig ID is built from the Latin scientific name and the strain number, it will generally exceed 16 characters, so I need to abbreviate it with a combination of letters and numbers.
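A quick way to spot IDs that would run into this limit before starting a run is to scan the FASTA headers. The sketch below is a standalone check, not part of funannotate. The function name `long_headers` is hypothetical, the default of 16 is taken from the message quoted above, and it assumes the limit applies to the ID, meaning the text up to the first whitespace. The second header is an invented example of a long name.

```python
def long_headers(fasta_lines, max_len=16):
    """Return FASTA IDs (text up to the first whitespace) longer than max_len."""
    too_long = []
    for line in fasta_lines:
        if line.startswith('>'):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ''
            if len(seq_id) > max_len:
                too_long.append(seq_id)
    return too_long


headers = [
    '>VVo_V23s.1',                                  # abbreviated ID from this thread
    '>Volvariella_volvacea_V23_contig_1 len=1000',  # invented long ID
]
print(long_headers(headers))
```

With real data the same function can be given an open file handle, since it only iterates over lines.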

Next time I will use funannotate sort to rename the contig IDs to avoid errors.

I am very interested in the new version. After the analysis of this batch of data is completed, I will install the update. Thank you very much.

liuyca1 commented 2 years ago

Hi, I tried to upgrade funannotate with the command you gave, but it failed several times without giving a specific reason. Is there any other way to upgrade?

(funannotate) [liuyuanchao@master01 /data/liuyuanchao] $python -m pip install git+https://github.com/nextgenusfs/funannotate.git --upgrade --force --no-deps
Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/nextgenusfs/funannotate.git
  Cloning https://github.com/nextgenusfs/funannotate.git to /tmp/pip-req-build-l5j_i2gv
  Running command git clone -q https://github.com/nextgenusfs/funannotate.git /tmp/pip-req-build-l5j_i2gv
  fatal: unable to access 'https://github.com/nextgenusfs/funannotate.git/': Encountered end of file
WARNING: Discarding git+https://github.com/nextgenusfs/funannotate.git. Command errored out with exit status 128: git clone -q https://github.com/nextgenusfs/funannotate.git /tmp/pip-req-build-l5j_i2gv Check the logs for full command output.
ERROR: Command errored out with exit status 128: git clone -q https://github.com/nextgenusfs/funannotate.git /tmp/pip-req-build-l5j_i2gv Check the logs for full command output.

(funannotate) [liuyuanchao@master01 /data/liuyuanchao] $funannotate check --show-versions

Checking dependencies for 1.8.7
