nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Failing tests (annotate being the most important one) #992

Open HerderB opened 6 months ago

HerderB commented 6 months ago

Are you using the latest release?

Using 1.8.15

Describe the bug

Several tests are failing (annotate, busco, rna-seq) with both the mamba installation and an installation from source. The clean, mask, compare and predict tests appear to work fine, and funannotate check reports nothing out of order.

Below are the logfiles for the annotate test. I can provide additional information on the other failing tests, but the annotate test is the most pressing issue for now.

What command did you issue?

funannotate test -t annotate --cpus 10 --debug

Logfiles

#########################################################
Running `funannotate annotate` unit testing
CMD: funannotate annotate --genbank Genome_one.gbk -o annotate --cpus 10 --iprscan genome_one.iprscan.xml --eggnog genome_one.emapper.annotations
#########################################################
-------------------------------------------------------
[Dec 28 02:21 PM]: OS: CentOS Linux 7, 128 cores, ~ 1057 GB RAM. Python: 3.9.6
[Dec 28 02:21 PM]: Running 1.8.15
[Dec 28 02:21 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Dec 28 02:21 PM]: Checking GenBank file for annotation
[Dec 28 02:21 PM]: Adding Functional Annotation to Genome one, NCBI accession: None
[Dec 28 02:21 PM]: Annotation consists of: 125 gene models
[Dec 28 02:21 PM]: 124 protein records loaded
[Dec 28 02:21 PM]: Running HMMer search of PFAM version 36.0
[Dec 28 02:21 PM]: 94 annotations added
[Dec 28 02:21 PM]: Running Diamond blastp search of UniProt DB version 2023_05
[Dec 28 02:22 PM]: 12 valid gene/product annotations from 14 total
[Dec 28 02:22 PM]: Existing Eggnog-mapper results found: annotate/annotate_misc/eggnog.emapper.annotations
[Dec 28 02:22 PM]: Parsing EggNog Annotations
[Dec 28 02:22 PM]: EggNog version parsed as 1.0.3
[Dec 28 02:22 PM]: 132  COG and EggNog annotations added
[Dec 28 02:22 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.92
[Dec 28 02:22 PM]: 21 gene name and product description annotations added
[Dec 28 02:22 PM]: Running Diamond blastp search of MEROPS version 12.0
[Dec 28 02:22 PM]: 0 annotations added
[Dec 28 02:22 PM]: Annotating CAZYmes using HMMer search of dbCAN version 12.0
[Dec 28 02:22 PM]: 3 annotations added
[Dec 28 02:22 PM]: Annotating proteins with BUSCO dikarya models
[Dec 28 02:22 PM]: 6 annotations added
[Dec 28 02:22 PM]: Skipping phobius predictions, try funannotate remote -m phobius
[Dec 28 02:22 PM]: Predicting secreted proteins with SignalP
[Dec 28 02:22 PM]: 9 secretome and 0 transmembane annotations added
[Dec 28 02:22 PM]: Parsing InterProScan5 XML file
[Dec 28 02:22 PM]: Found 0 duplicated annotations, adding 642 valid annotations
[Dec 28 02:22 PM]: Converting to final Genbank format, good luck!
Traceback (most recent call last):
  File "/opt/ohpc/pub/apps/funannotate/1.8.15-GCCcore-11.2.0/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/opt/ohpc/pub/apps/funannotate/1.8.15-GCCcore-11.2.0/lib/python3.9/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/opt/ohpc/pub/apps/funannotate/1.8.15-GCCcore-11.2.0/lib/python3.9/site-packages/funannotate/annotate.py", line 1720, in main
    BadProducts = lib.getFailedProductNames(discrep, Gene2ProdFinal)
  File "/opt/ohpc/pub/apps/funannotate/1.8.15-GCCcore-11.2.0/lib/python3.9/site-packages/funannotate/library.py", line 8783, in getFailedProductNames
    if "DiscRep_SUB:SUSPECT_PRODUCT_NAMES::" in block[0]:
IndexError: list index out of range
#########################################################
ERROR: `funannotate annotate` test failed - check logfiles
#########################################################

OS/Install Information

funannotate check --show-versions
-------------------------------------------------------
Checking dependencies for 1.8.15
-------------------------------------------------------
You are running Python v 3.9.6. Now checking python packages...
biopython: 1.79
goatools: 1.3.1
matplotlib: 3.5.2
natsort: 8.4.0
numpy: 1.21.3
pandas: 1.3.4
psutil: 5.9.4
requests: 2.31.0
scikit-learn: 1.0.2
scipy: 1.7.1
seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.034000'. Now checking perl modules...
Carp: 1.50
Clone: 0.45
DBD::SQLite: 1.70
DBD::mysql: 4.050
DBI: 1.643
DB_File: 1.857
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.27
Getopt::Long: 2.52
Hash::Merge: 0.302
JSON: 4.03
LWP::UserAgent: 6.55
Logger::Simple: 2.0
POSIX: 1.97
Parallel::ForkManager: 2.02
Pod::Usage: 2.01
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.13
Tie::File: 1.06
URI::Escape: 5.09
YAML: 1.30
local::lib: 2.000029
threads: 2.26
threads::shared: 1.62
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/nfs/databases/funannotate/current
$PASAHOME=/opt/ohpc/pub/apps/PASA/2.5.3
$TRINITY_HOME=/opt/ohpc/pub/apps/Trinity/2.15.1-foss-2021b/trinityrnaseq-v2.15.1
$EVM_HOME=/opt/ohpc/pub/apps/EVidenceModeler/2.1.0
$AUGUSTUS_CONFIG_PATH=/opt/ohpc/pub/apps/AUGUSTUS/3.4.0-foss-2021b/config
$GENEMARK_PATH=/opt/ohpc/pub/apps/GeneMark-ES/4.72-GCCcore-11.2.0
All 6 environmental variables are set
-------------------------------------------------------
Checking external dependencies...
PASA: 2.5.3
CodingQuarry: 2.0
Trinity: 2.15.1
augustus: 3.4.0
bamtools: bamtools 2.5.2
bedtools: bedtools v2.30.0
blat: BLAT v37x1
diamond: 2.0.13
emapper.py: 2.1.7
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: 36.3.8i
glimmerhmm: 3.0.4
gmap: 2021-12-17
gmes_petap.pl: 4.72_lic
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.2
kallisto: 0.48.0
mafft: v7.490 (2021/Oct/30)
makeblastdb: makeblastdb 2.12.0+
minimap2: 2.24-r1122
pigz: 2.3.4
proteinortho: 6.2.3
pslCDnaFilter: no way to determine
salmon: salmon 1.4.0
samtools: samtools 1.14
signalp: 4.1
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.12 (Nov 2022)
tantan: tantan 40
tbl2asn: 25.8
tblastn: tblastn 2.12.0+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
All 37 external dependencies are installed

hyphaltip commented 6 months ago

Did you run setup to download and install the external databases?

hyphaltip commented 6 months ago

But overall there seem to be very few annotations: only 93 Pfam hits?

HerderB commented 6 months ago

Did you run setup to download and install the external databases?

Yes, the databases were prepared with funannotate setup, which did not give any warnings or errors.

But overall there seem to be very few annotations: only 93 Pfam hits?

This is the test set for annotation. I assumed the test uses a smaller protein list so that the runtime stays short when only testing the tools. How many annotations would be expected from the test set?

nextgenusfs commented 6 months ago

It looks like tbl2asn is failing: the error comes from a function that parses the discrepancy report generated by tbl2asn. If you look at the log file, you should see the tbl2asn command that was issued. It might be helpful to run that command manually and see whether it reports any additional errors.
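
As a rough illustration of the two points above, here is a minimal Python sketch. It is not funannotate's code: `find_tbl2asn_lines` and `block_has_marker` are hypothetical helper names, the sample log text is invented, and the only string taken from this thread is the marker shown in the traceback. The first helper pulls out the log lines that mention tbl2asn so the command can be copied and rerun by hand. The second shows why indexing the first line of an empty block raises `IndexError`, and how a guard avoids it.

```python
MARKER = "DiscRep_SUB:SUSPECT_PRODUCT_NAMES::"  # string taken from the traceback


def find_tbl2asn_lines(log_text):
    """Return every log line that mentions tbl2asn, stripped of whitespace."""
    return [ln.strip() for ln in log_text.splitlines() if "tbl2asn" in ln.lower()]


def block_has_marker(block, marker=MARKER):
    """Check the first line of a block without failing on an empty block.

    An empty list has no element 0, so `marker in block[0]` raises
    IndexError; testing `block` first avoids that.
    """
    return bool(block) and marker in block[0]


if __name__ == "__main__":
    # Hypothetical log excerpt, invented for illustration only.
    sample_log = "step one finished\nCMD: tbl2asn <arguments elided>\nstep two finished\n"
    for line in find_tbl2asn_lines(sample_log):
        print(line)

    print(block_has_marker([]))                      # empty block: False, no exception
    print(block_has_marker([MARKER + "1 feature"]))  # marker on first line: True
```

If the discrepancy report is missing or empty because tbl2asn itself failed, a guard like this would only hide the symptom. Rerunning the logged tbl2asn command is still the way to find the underlying error.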