nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Error parsing iprscan.xml during funannotate annotate, after local iprscan run #684

Open Rcperez opened 2 years ago

Rcperez commented 2 years ago

Are you using the latest release?

Using the latest Docker image.

Describe the bug

Error parsing iprscan.xml while running the funannotate annotate command. The XML was generated by a local run of InterProScan5.

What command did you issue?

First I used:

~/funannotate-docker annotate -i fun/ --cpus 8

Then I used:

~/funannotate-docker annotate -i ~/fun/ --iprscan ~/fun/annotate_misc/iprscan.xml

Below was reported for both.

Logfiles

[Dec 30 11:46 PM]: OS: Debian GNU/Linux 10, 8 cores, ~ 33 GB RAM. Python: 3.8.12
[Dec 30 11:46 PM]: Running 1.8.10
[Dec 30 11:47 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Dec 30 11:47 PM]: Found existing output directory fun. Warning, will re-use any intermediate files found.
[Dec 30 11:47 PM]: Parsing input files
[Dec 30 11:47 PM]: Existing tbl found: fun/predict_results/fungus.tbl
[Dec 30 11:47 PM]: Adding Functional Annotation to fungus, NCBI accession: None
[Dec 30 11:47 PM]: Annotation consists of: 17,581 gene models
[Dec 30 11:47 PM]: 17,478 protein records loaded
[Dec 30 11:47 PM]: Existing Pfam-A results found: fun/annotate_misc/annotations.pfam.txt
[Dec 30 11:47 PM]: 16,481 annotations added
[Dec 30 11:47 PM]: Running Diamond blastp search of UniProt DB version 2021_04
[Dec 30 11:47 PM]: 1,112 valid gene/product annotations from 1,526 total
[Dec 30 11:47 PM]: Install eggnog-mapper or use webserver to improve functional annotation: https://github.com/jhcepas/eggnog-mapper
[Dec 30 11:47 PM]: No Eggnog-mapper results found.
[Dec 30 11:47 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.72
[Dec 30 11:47 PM]: 1,112 gene name and product description annotations added
[Dec 30 11:47 PM]: Existing MEROPS results found: fun/annotate_misc/annotations.merops.txt
[Dec 30 11:47 PM]: 489 annotations added
[Dec 30 11:47 PM]: Existing CAZYme results found: fun/annotate_misc/annotations.dbCAN.txt
[Dec 30 11:47 PM]: 373 annotations added
[Dec 30 11:47 PM]: Existing BUSCO2 results found: fun/annotate_misc/annotations.busco.txt
[Dec 30 11:47 PM]: 1,382 annotations added
[Dec 30 11:47 PM]: Existing Phobius results found: fun/annotate_misc/phobius.results.txt
[Dec 30 11:47 PM]: SignalP not installed, secretome prediction less accurate using only Phobius
[Dec 30 11:47 PM]: 1,050 secretome and 2,848 transmembane annotations added
[Dec 30 11:47 PM]: Parsing InterProScan5 XML file
[Dec 30 11:47 PM]: CMD ERROR: /venv/bin/python /venv/lib/python3.8/site-packages/funannotate/aux_scripts/iprscan2annotations.py fun/annotate_misc/iprscan.xml fun/annotate_misc/annotations.iprscan.txt
[Dec 30 11:47 PM]: Traceback (most recent call last):
  File "/venv/lib/python3.8/site-packages/funannotate/aux_scripts/iprscan2annotations.py", line 32, in <module>
    for _, elem in tree:
  File "/venv/lib/python3.8/xml/etree/ElementTree.py", line 1227, in iterator
    yield from pullparser.read_events()
  File "/venv/lib/python3.8/xml/etree/ElementTree.py", line 1302, in read_events
    raise event
  File "/venv/lib/python3.8/xml/etree/ElementTree.py", line 1274, in feed
    self._parser.feed(data)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 1

OS/Install Information

Checking dependencies for 1.8.10

You are running Python v 3.8.12. Now checking python packages...
biopython: 1.77
goatools: 1.1.6
matplotlib: 3.5.1
natsort: 8.0.1
numpy: 1.21.4
pandas: 1.3.5
psutil: 5.8.0
requests: 2.26.0
scikit-learn: 1.0.1
scipy: 1.5.3
seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
ERROR: Bio::Perl not installed, install with cpanm Bio::Perl
ERROR: local::lib not installed, install with cpanm local::lib

Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases
$PASAHOME=/venv/opt/pasa-2.4.1
$TRINITYHOME=/venv/opt/trinity-2.8.5
$EVM_HOME=/venv/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/venv/config
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies...
Traceback (most recent call last):
  File "/venv/bin/ete3", line 6, in <module>
    from ete3.tools.ete import main
  File "/venv/lib/python3.8/site-packages/ete3/tools/ete.py", line 55, in <module>
    from . import (ete_split, ete_expand, ete_annotate, ete_ncbiquery, ete_view,
  File "/venv/lib/python3.8/site-packages/ete3/tools/ete_view.py", line 48, in <module>
    from .. import (Tree, PhyloTree, TextFace, RectFace, faces, TreeStyle, CircleFace, AttrFace,
ImportError: cannot import name 'TextFace' from 'ete3' (/venv/lib/python3.8/site-packages/ete3/__init__.py)
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.13
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.9.1-internal
kallisto: 0.46.1
mafft: v7.490 (2021/Oct/30)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.23-r1111
proteinortho: 6.0.16
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.12
snap: 2006-07-28
stringtie: 2.1.7
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: ete3 not installed
ERROR: gmes_petap.pl not installed
ERROR: pigz not installed
ERROR: signalp not installed

nextgenusfs commented 2 years ago

What version of InterProScan? And how did you run it? The error suggests the parser is failing on the very first line of the file. Can you run head on the InterProScan XML file?

Rcperez commented 2 years ago

Thanks so much for your prompt reply. I opted to try funannotate annotate -m docker; it is still running.

version: InterProScan-5.53-87.0

iprscan.xml contains only one line, even though the command below returned no errors.

~/funannotate-docker iprscan -i fun/ -m local -c 6 --iprscan_path ~/my_interproscan/interproscan-5.53-87.0/interproscan.sh

iprscan.xml is located in annotate_misc.

head iprscan.xml returns:

< / protein-matches> (spaces inserted artificially to make text visible)
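
A file that contains only a closing tag is not well-formed XML, which is consistent with the ParseError at line 1 in the log above. As a minimal sketch (not part of funannotate), a pre-check along these lines could confirm that an InterProScan XML file parses before it is passed to funannotate annotate --iprscan. The function name check_iprscan_xml is hypothetical, and counting elements whose local name is protein is an assumption about the InterProScan XML layout.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def check_iprscan_xml(path):
    """Return (ok, detail) for an InterProScan XML file.

    ok is False when the file is empty or not well-formed XML.
    detail is the number of <protein> elements on success,
    or a short message on failure.
    """
    if os.path.getsize(path) == 0:
        return False, "file is empty"
    proteins = 0
    try:
        for _, elem in ET.iterparse(path, events=("end",)):
            # Drop any XML namespace prefix so the check does not
            # depend on a specific schema URL (assumption: records
            # are elements whose local name is "protein").
            if elem.tag.rsplit("}", 1)[-1] == "protein":
                proteins += 1
            elem.clear()  # keep memory use low on large files
    except ET.ParseError as err:
        return False, f"not well-formed: {err}"
    return True, proteins


# Demo: recreate the one-line file described in this thread.
with tempfile.TemporaryDirectory() as tmp:
    truncated = os.path.join(tmp, "iprscan.xml")
    with open(truncated, "w") as handle:
        handle.write("</protein-matches>\n")
    print(check_iprscan_xml(truncated))
```

A result of (True, 0) would also be worth investigating, since it means the file parsed but held no protein records.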

Rcperez commented 2 years ago

Command using local InterProScan:

~/funannotate-docker iprscan -i fun/ -m local -c 6 --iprscan_path ~/my_interproscan/interproscan-5.53-87.0/interproscan.sh

Output after ~6 hours on 8 cores with 32 GB RAM, Debian 4.19.208-1 x86_64:

Running InterProScan5 on 17478 proteins
Important: you need to manually configure your interproscan.properties file for embedded workers.
Will try to launch 6 interproscan processes, adjust -c,--cpus for your system
InterProScan5 search has completed successfully!
Results are here: fun/annotate_misc/iprscan.xml

nextgenusfs commented 2 years ago

I'm not sure the iprscan script will work in the Docker image, as InterProScan is not installed in the container and the container won't have access to the rest of your system. You likely need to run InterProScan manually.
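
As a rough sketch of what running it manually could look like, the snippet below only assembles the two command lines: InterProScan on the host, then funannotate annotate with the resulting XML passed via --iprscan, as in the original report. The helper name build_manual_commands and all paths in the demo are hypothetical. In particular, the location of the protein FASTA is an assumption and should be replaced with the actual file from your predict results.

```python
import shlex


def build_manual_commands(interproscan_sh, proteins_fasta, xml_out,
                          funannotate_dir, funannotate_exe="funannotate", cpus=6):
    """Build (but do not run) the InterProScan and funannotate commands."""
    # InterProScan 5 options: -i input FASTA, -f output format,
    # -o output file, -cpu number of cores.
    iprscan_cmd = [interproscan_sh, "-i", proteins_fasta,
                   "-f", "XML", "-o", xml_out, "-cpu", str(cpus)]
    # funannotate annotate options as used earlier in this thread.
    annotate_cmd = [funannotate_exe, "annotate", "-i", funannotate_dir,
                    "--iprscan", xml_out, "--cpus", str(cpus)]
    return iprscan_cmd, annotate_cmd


# Demo with placeholder paths (assumptions, adjust to your system).
iprscan_cmd, annotate_cmd = build_manual_commands(
    interproscan_sh="my_interproscan/interproscan-5.53-87.0/interproscan.sh",
    proteins_fasta="proteins.fa",   # hypothetical protein FASTA
    xml_out="iprscan.xml",
    funannotate_dir="fun/",
)
print(shlex.join(iprscan_cmd))
print(shlex.join(annotate_cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere. Each list could be handed to subprocess.run once the paths are confirmed.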

Rcperez commented 2 years ago

Understood, thank you.

VDaric commented 2 years ago

I have the same issue.

I have installed a local copy of InterProScan and set the $INTERPROSH shell variable to the local interproscan.sh path.

If I understand you correctly, the local InterProScan still won't work despite these settings?