nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Issue parsing iprscan XML file #1002

Open Cat4Lion opened 4 months ago

Cat4Lion commented 4 months ago

Are you using the latest release?

1.8.10, Singularity install. I can try another install, but I'm just checking that this isn't a simple version mismatch issue.

Describe the bug

Trouble parsing the iprscan XML file.

What command did you issue?

Running funannotate [02/20/24 21:39:51]: /venv/bin/funannotate annotate --input predict_results --iprscan Fv_Wa1orig_interpro.xml --eggnog Fv_WA1orig.eggnog.annotations --antismash Fusarium_virguliforme_WA1orig.antismash.gbk --phobius Fv_WA1orig_phobious_out.txt --rename SAY83 --sbt Fv_template.sbt --species Fusarium virguliforme --isolate WA1 --cpus 8 --busco_db dikarya

Logfiles

Runs through all other files; the relevant CMD error comes from parsing the XML from interproscan-5.60-92.0:

[02/20/24 21:48:57]: Parsing InterProScan5 XML file
[02/20/24 21:48:57]: /venv/bin/python /venv/lib/python3.8/site-packages/funannotate/aux_scripts/iprscan2annotations.py /local/workdir/keb45/funannotate/Fv_Wa1/annotate_misc/iprscan.xml /local/workdir/keb45/funannotate/Fv_Wa1/annotate_misc/annotations.iprscan.txt
[02/20/24 21:52:51]: CMD ERROR: /venv/bin/python /venv/lib/python3.8/site-packages/funannotate/aux_scripts/iprscan2annotations.py /local/workdir/keb45/funannotate/Fv_Wa1/annotate_misc/iprscan.xml /local/workdir/keb45/funannotate/Fv_Wa1/annotate_misc/annotations.iprscan.txt
[02/20/24 21:52:51]: Error parsing XML GO terms: None is not a valid term

OS/Install Information

You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
local::lib: 2.000024
threads: 2.15
threads::shared: 1.56
ERROR: Bio::Perl not installed, install with cpanm Bio::Perl

Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases
$PASAHOME=/venv/opt/pasa-2.4.1
$TRINITYHOME=/venv/opt/trinity-2.8.5
$EVM_HOME=/venv/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/venv/config
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.15
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.9.1-internal
kallisto: 0.46.1
mafft: v7.505 (2022/Apr/10)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: pigz 2.6
proteinortho: 6.0.16
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.12
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 31
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: gmes_petap.pl not installed
ERROR: signalp not installed

hyphaltip commented 4 months ago

It would help to have a copy of the XML file causing the error.

Cat4Lion commented 4 months ago

It's way too big to attach, but here are the annotations for the first gene, so you can at least check that it's not a file format conflict. Thanks.

iprscan_sample.txt
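For anyone else who needs to share a small sample from a very large InterProScan XML, one way to pull out just the first protein record without loading the whole file is sketched below. It is only a sketch: the function names `local_name` and `first_protein_xml` are made up for illustration, and it assumes each record is a `<protein>` element (with or without an XML namespace), which is how InterProScan 5 output is typically laid out.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix, e.g. '{uri}protein' -> 'protein'."""
    return tag.rsplit("}", 1)[-1]


def first_protein_xml(source):
    """Return the first <protein> element as a string, or None if absent.

    `source` is a file path or a binary file object. iterparse streams the
    input, so parsing stops as soon as the first record has been read.
    (Assumption: records are elements whose local name is 'protein'.)
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == "protein":
            return ET.tostring(elem, encoding="unicode")
    return None
```

Calling `first_protein_xml("Fv_Wa1orig_interpro.xml")` and writing the returned string to a text file would give a sample small enough to attach.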

nextgenusfs commented 4 months ago

Hi @Cat4Lion, it seems you are running a development version, i.e. 1.8.10; the even-numbered versions are not releases but intermediate development steps. The error isn't actually in your XML file. It comes from parsing the GO OBO file, which must contain some ontology term that the parser was not expecting. Looking at our current code, this error should no longer be raised; an unrecognized term should instead default to the GO terminology go_unknown. I'm going to tag a new bug fix release shortly, but my suggestion would be to try the latest codebase and let us know if you are still having issues. I'm not sure how the Singularity image was built, but if it was based on the Docker image, the latest tag should be current.
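To illustrate the fallback behaviour described above, here is a minimal sketch of looking up a GO term's namespace from an OBO file and defaulting to go_unknown when the term or its namespace is not recognized. This is not the funannotate implementation: `parse_obo_namespaces`, `go_label`, and the `NAMESPACE_LABELS` mapping (including the `go_process`, `go_function`, and `go_component` labels) are assumptions made for the example. Only the go_unknown default comes from the comment above.

```python
# Hypothetical mapping from OBO namespaces to annotation labels.
NAMESPACE_LABELS = {
    "biological_process": "go_process",
    "molecular_function": "go_function",
    "cellular_component": "go_component",
}


def parse_obo_namespaces(lines):
    """Map GO id -> namespace using only the [Term] stanzas of an OBO file."""
    namespaces = {}
    current_id = None
    in_term = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and line.startswith("namespace: ") and current_id:
            namespaces[current_id] = line[len("namespace: "):]
    return namespaces


def go_label(go_id, namespaces):
    """Return the label for a GO id, falling back to 'go_unknown'.

    A missing id yields None from the first lookup, and None (like any
    unexpected namespace) is not a key in NAMESPACE_LABELS, so the default
    is returned instead of an exception being raised.
    """
    return NAMESPACE_LABELS.get(namespaces.get(go_id), "go_unknown")
```

With a lookup like this, an obsolete or unexpected term degrades to a generic label rather than stopping the annotation run.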