nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

SignalP 6.0 / Phobius parsing issue #716

xvazquezc closed this issue 2 years ago

xvazquezc commented 2 years ago

Are you using the latest release? Yes, 1.8.10 installed from master

Describe the bug: Apparent conflict between SignalP 6.0g and Phobius during parsing. Possibly related to a comment by @Dikaryotic.

What command did you issue?

funannotate annotate -i ${BASEDIR}/predict \
--eggnog ${ENOG} \
--antismash ${ANTISMASHGBK} \
--iprscan ${IPRS} \
--signalp ${SIGNALPOUT} \
--cpus $NCPUS \
--no-progress

Logfiles

[Apr 26 03:42 PM]: OS: CentOS Linux 7, 24 cores, ~ 131 GB RAM. Python: 3.8.12
[Apr 26 03:42 PM]: Running 1.8.10
[Apr 26 03:42 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Apr 26 03:42 PM]: Parsing input files
[Apr 26 03:42 PM]: Existing tbl found: /share/bioinfo/z3382651/frl/funannotate/cfod/predict/predict_results/Coniochaeta_fodinicola_FRL.tbl
[Apr 26 03:43 PM]: Adding Functional Annotation to Coniochaeta fodinicola, NCBI accession: None
[Apr 26 03:43 PM]: Annotation consists of: 9,407 gene models
[Apr 26 03:43 PM]: 9,204 protein records loaded
[Apr 26 03:43 PM]: Running HMMer search of PFAM version 35.0
[Apr 26 03:45 PM]: 10,693 annotations added
[Apr 26 03:45 PM]: Running Diamond blastp search of UniProt DB version 2022_01
[Apr 26 03:45 PM]: 878 valid gene/product annotations from 1,172 total
[Apr 26 03:45 PM]: Existing Eggnog-mapper results found: /share/bioinfo/z3382651/frl/funannotate/cfod/predict/annotate_misc/eggnog.emapper.annotations
[Apr 26 03:45 PM]: Parsing EggNog Annotations
[Apr 26 03:45 PM]: EggNog version parsed as 2.1.2
[Apr 26 03:45 PM]: 18,062 COG and EggNog annotations added
[Apr 26 03:45 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.76
[Apr 26 03:45 PM]: 2,699 gene name and product description annotations added
[Apr 26 03:45 PM]: Running Diamond blastp search of MEROPS version 12.0
[Apr 26 03:45 PM]: 289 annotations added
[Apr 26 03:45 PM]: Annotating CAZYmes using HMMer search of dbCAN version 10.0
[Apr 26 03:46 PM]: 393 annotations added
[Apr 26 03:46 PM]: Annotating proteins with BUSCO dikarya models
[Apr 26 03:50 PM]: 1,280 annotations added
[Apr 26 03:50 PM]: Predicting secreted and transmembrane proteins using Phobius
[Apr 26 03:59 PM]: Existing SignalP results found: /share/bioinfo/z3382651/frl/funannotate/cfod/predict/annotate_misc/signalp.results.txt
-------------------------------------------------------
Traceback (most recent call last):
  File "/home/z3382651/miniconda3/envs/funannotate-master/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/home/z3382651/miniconda3/envs/funannotate-master/lib/python3.8/site-packages/funannotate/funannotate.py", line 711, in main
    mod.main(arguments)
  File "/home/z3382651/miniconda3/envs/funannotate-master/lib/python3.8/site-packages/funannotate/annotate.py", line 1035, in main
    lib.parsePhobiusSignalP(
  File "/home/z3382651/miniconda3/envs/funannotate-master/lib/python3.8/site-packages/funannotate/library.py", line 5698, in parsePhobiusSignalP
    if col[9] == 'Y':  # then there is signal peptide
IndexError: list index out of range

OS/Install Information

-------------------------------------------------------
Checking dependencies for 1.8.10
-------------------------------------------------------
You are running Python v 3.8.12. Now checking python packages...
biopython: 1.77
goatools: 1.2.3
matplotlib: 3.4.3
natsort: 8.1.0
numpy: 1.22.3
pandas: 1.4.2
psutil: 5.9.0
requests: 2.27.1
scikit-learn: 1.0.2
scipy: 1.8.0
seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.15
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
local::lib: 2.000024
threads: 2.15
threads::shared: 1.56
All 28 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/share/bioinfo/z3382651/funannotate_db
$PASAHOME=/home/z3382651/miniconda3/envs/funannotate-master/opt/pasa-2.4.1
$TRINITY_HOME=/home/z3382651/miniconda3/envs/funannotate-master/opt/trinity-2.8.5
$EVM_HOME=/home/z3382651/miniconda3/envs/funannotate-master/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/home/z3382651/miniconda3/envs/funannotate-master/config/
    ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir
-------------------------------------------------------
Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.14
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.13
kallisto: 0.46.1
mafft: v7.505 (2022/Apr/10)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: pigz 2.6
proteinortho: 6.0.34
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.12
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 31
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
    ERROR: emapper.py not installed
    ERROR: gmes_petap.pl not installed
    ERROR: signalp not installed

PS: SignalP is installed, but its executable is named signalp6 by default, which is presumably why the check above reports signalp as not installed.
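For anyone hitting the same "not installed" message, here is a minimal sketch of how one could look for SignalP under either executable name. This is not how funannotate's dependency check works; the helper name `find_signalp` and the candidate list are assumptions made for illustration.

```python
import shutil


def find_signalp(candidates=("signalp", "signalp6")):
    """Return (name, path) for the first candidate found on PATH, else None.

    Hypothetical helper: the candidate names are an assumption based on the
    note above that SignalP 6 installs its executable as `signalp6`.
    """
    for name in candidates:
        path = shutil.which(name)  # None when the executable is not on PATH
        if path:
            return name, path
    return None


if __name__ == "__main__":
    found = find_signalp()
    if found is None:
        print("No SignalP executable found on PATH")
    else:
        print(f"Found {found[0]} at {found[1]}")
```

Since the results here were passed in with `--signalp`, the missing executable should not matter for this run; the sketch only shows why the check and the actual install can disagree.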

xvazquezc commented 2 years ago

The issue seems to arise from parsePhobiusSignalP in library.py: it does not recognise SignalP v6 output and instead parses it as if it came from a version older than 5, which is why indexing col[9] fails. I submitted PR #731 to fix this, more or less mimicking the fix already made in the SignalP parsing function.
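To illustrate the idea (this is not the code from the PR), here is a minimal sketch of version-aware parsing. It assumes that SignalP 5 and 6 result files start with a `# SignalP-<version>` header and are tab-separated with the prediction in the second column, while older short-format output is whitespace-separated with a Y/N flag in the tenth column. The function name `signal_peptide_ids` and the sample records are hypothetical.

```python
import re


def signal_peptide_ids(lines):
    """Return the set of protein IDs predicted to carry a signal peptide.

    Assumptions (not taken from funannotate itself):
      * a header such as "# SignalP-6.0 ..." announces the major version;
      * versions >= 5 are tab-separated, column 2 is "OTHER" when no
        signal peptide is predicted;
      * older output is whitespace-separated with "Y"/"N" in column 10.
    """
    version = 4  # fall back to the old layout if no version header is seen
    hits = set()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#"):
            match = re.match(r"# SignalP-(\d+)", line)
            if match:
                version = int(match.group(1))
            continue
        if not line.strip():
            continue
        if version >= 5:
            cols = line.split("\t")
            if len(cols) >= 2 and cols[1] != "OTHER":
                hits.add(cols[0])
        else:
            cols = line.split()
            # Length check avoids the IndexError seen in the traceback.
            if len(cols) > 9 and cols[9] == "Y":
                hits.add(cols[0])
    return hits


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    v6_example = [
        "# SignalP-6.0\tOrganism: Eukarya\tTimestamp: 20220426",
        "# ID\tPrediction\tOTHER\tSP(Sec/SPI)\tCS Position",
        "prot1\tSP\t0.0002\t0.9998\tCS pos: 19-20. Pr: 0.97",
        "prot2\tOTHER\t1.0000\t0.0000\t",
    ]
    print(sorted(signal_peptide_ids(v6_example)))
```

Reading the version from the header rather than guessing from the column count keeps the v6 rows away from the `col[9]` branch entirely, and the length check covers malformed lines in the old format.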