nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Funannotate annotate (eggNog) ValueError: too many values to unpack (expected 2) #676

Open drabe004 opened 2 years ago

drabe004 commented 2 years ago

version funannotate/1.8.5

running annotate with the command funannotate annotate -i /panfs/roc/groups/14/mcgaughs/drabe004/Funnannote_Full/MBRI_FINAL --force --busco_db actinopterygii --iprscan /panfs/roc/groups/14/mcgaughs/drabe004/Interprotscan/MBRIoutput.iprscan --cpus 8

Error and logfile: I am running this on output from predict --- I know previous issues were raised with long locus tags but mine are just fun_00001 etc...

[Dec 10 11:30 AM]: Running Eggnog-mapper
[Dec 10 12:35 PM]: Parsing EggNog Annotations

Traceback (most recent call last):
  File "/panfs/roc/msisoft/funannotate/1.8.5/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/panfs/roc/msisoft/funannotate/1.8.5/lib/python3.7/site-packages/funannotate/funannotate.py", line 705, in main
    mod.main(arguments)
  File "/panfs/roc/msisoft/funannotate/1.8.5/lib/python3.7/site-packages/funannotate/annotate.py", line 749, in main
    EggNog = parseEggNoggMapper(eggnog_result, eggnog_out, GeneProducts)
  File "/panfs/roc/msisoft/funannotate/1.8.5/lib/python3.7/site-packages/funannotate/annotate.py", line 241, in parseEggNoggMapper
    NOG, DB = cols[OGi].split('@')
ValueError: too many values to unpack (expected 2)

OS/Install Information: this is being run on the UMN MSI cluster

You are running Perl v b'5.026002'. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.41
DBD::SQLite: 1.58
DBD::mysql: 4.046
DBI: 1.634
DB_File: 1.842
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.22
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 2.97001
LWP::UserAgent: 6.34
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 1.19
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.11
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.26
threads: 2.21
threads::shared: 1.58
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/panfs/roc/msisoft/funannotate/common_databases-10.20.20
$PASAHOME=/panfs/roc/msisoft/pasa/2.3.3
$TRINITYHOME=/panfs/roc/msisoft/funannotate/1.8.5/opt/trinity-2.8.5
$EVM_HOME=/panfs/roc/msisoft/evidencemodeler/1.1.1
$AUGUSTUS_CONFIG_PATH=/panfs/roc/msisoft/funannotate/1.8.5/config
$GENEMARK_PATH=/panfs/roc/msisoft/genemark/4.32
All 6 environmental variables are set

Checking external dependencies...
ERROR: gmap found but error running gmap
PASA: 2.3.3
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.7
emapper.py: 2.1.1-1
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmes_petap.pl: 4.61_lic
hisat2: 2.1.0
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.475 (2020/Nov/23)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.17-r941
proteinortho: 6.0.29
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.10
snap: 2006-07-28
stringtie: 2.1.5
tRNAscan-SE: 2.0.7 (Oct 2020)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: gmap not installed
ERROR: signalp not installed

nextgenusfs commented 2 years ago

Eggnog mapper has changed their format several times, so this is an error in this older version of funannotate in parsing an unexpected eggnog format. I think it is all working with the most recent changes. So in short you would need to use eggnog v1.0.3 or update funannotate to latest version to deal with the change in eggnog formats.
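
The first traceback comes from unpacking the result of `split('@')` into exactly two names. A minimal sketch of how that fails, assuming (the thread does not confirm this) that newer eggnog-mapper output packs several `ID@taxon` entries into one comma-separated field. The sample values and the `parse_ogs` helper are hypothetical and are not funannotate's code:

```python
def parse_ogs(field):
    """Split a comma-separated orthologous-group field into (id, taxon) pairs.

    Hypothetical helper: the field layout is an assumption for illustration.
    """
    pairs = []
    for entry in field.split(','):
        entry = entry.strip()
        if not entry:
            continue
        # partition() always returns three parts, so it never raises even
        # when '@' is missing or appears more than once.
        og_id, _, taxon = entry.partition('@')
        pairs.append((og_id, taxon))
    return pairs


old_style = 'ENOG410XXXX@NOG'                         # made-up single entry
new_style = 'KOG0001@1|root,KOG0001@2759|Eukaryota'   # made-up multi entry

nog, db = old_style.split('@')        # works: exactly one '@'
try:
    nog, db = new_style.split('@')    # three pieces cannot unpack into two names
except ValueError as err:
    print('unpack failed:', err)

print(parse_ogs(new_style))
```

Using `partition` per entry keeps the parser tolerant of both layouts, at the cost of silently accepting entries that have no `@` at all.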

drabe004 commented 2 years ago

Hi! We ran it with the current version and we got this error.

[Dec 13 12:14 PM]: OS: CentOS Linux 7, 128 cores, ~ 528 GB RAM. Python: 3.7.11
[Dec 13 12:14 PM]: Running 1.8.9
[Dec 13 12:14 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Dec 13 12:14 PM]: Parsing input files
[Dec 13 12:14 PM]: Existing tbl found: /panfs/roc/groups/14/mcgaughs/drabe004/Funnannote_Full/MBRI_FINAL/predict_results/Mastacembelus_brichardi.tbl
[Dec 13 12:17 PM]: Adding Functional Annotation to Mastacembelus brichardi, NCBI accession: None
[Dec 13 12:17 PM]: Annotation consists of: 31,337 gene models
[Dec 13 12:17 PM]: 27,871 protein records loaded
[Dec 13 12:17 PM]: Running HMMer search of PFAM version 33.1
[Dec 13 12:28 PM]: 33,323 annotations added
[Dec 13 12:28 PM]: Running Diamond blastp search of UniProt DB version 2020_05
[Dec 13 12:29 PM]: 7,143 valid gene/product annotations from 9,131 total
[Dec 13 12:29 PM]: Running Eggnog-mapper
[Dec 13 01:10 PM]: Parsing EggNog Annotations
[Dec 13 01:10 PM]: EggNog version parsed as 2.1.1-1

Traceback (most recent call last):
  File "/panfs/roc/msisoft/funannotate/1.8.9/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/panfs/roc/msisoft/funannotate/1.8.9/lib/python3.7/site-packages/funannotate/funannotate.py", line 705, in main
    mod.main(arguments)
  File "/panfs/roc/msisoft/funannotate/1.8.9/lib/python3.7/site-packages/funannotate/annotate.py", line 792, in main
    EggNog = parseEggNoggMapper(eggnog_result, eggnog_out, GeneProducts)
  File "/panfs/roc/msisoft/funannotate/1.8.9/lib/python3.7/site-packages/funannotate/annotate.py", line 268, in parseEggNoggMapper
    OGs = cols[DBi].split(',')
TypeError: list indices must be integers or slices, not NoneType

nextgenusfs commented 2 years ago

Can you try the version in master, where I think the changes for more eggnog parsing are located (they haven't made their way into a release yet)? You can install it with pip from that environment:

python -m pip install git+https://github.com/nextgenusfs/funannotate.git --upgrade --force --no-deps

drabe004 commented 2 years ago

Hi! We did the pip install as noted above, re-ran it and again got an eggnog issue.

See here:

[Dec 19 11:50 AM]: OS: CentOS Linux 7, 128 cores, ~ 264 GB RAM. Python: 3.7.11
[Dec 19 11:50 AM]: Running 1.8.10
[Dec 19 11:50 AM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Dec 19 11:50 AM]: Parsing input files
[Dec 19 11:50 AM]: Existing tbl found: /panfs/roc/groups/14/mcgaughs/drabe004/Funnannote_Full/RHLA_done/output_fun_RHLA/predict_results/Rhamdia_laluchensis.tbl
[Dec 19 11:53 AM]: Adding Functional Annotation to Rhamdia laluchensis, NCBI accession: None
[Dec 19 11:53 AM]: Annotation consists of: 65,383 gene models
[Dec 19 11:53 AM]: 62,635 protein records loaded
[Dec 19 11:53 AM]: Running HMMer search of PFAM version 33.1
[Dec 19 12:01 PM]: 33,593 annotations added
[Dec 19 12:01 PM]: Running Diamond blastp search of UniProt DB version 2020_05
[Dec 19 12:02 PM]: 13,300 valid gene/product annotations from 17,164 total
[Dec 19 12:02 PM]: Running Eggnog-mapper
[Dec 19 12:30 PM]: Parsing EggNog Annotations
[Dec 19 12:30 PM]: EggNog version parsed as 2.1.1-1

Traceback (most recent call last):
  File "/panfs/roc/msisoft/funannotate/1.8.9/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/panfs/roc/msisoft/funannotate/1.8.9/lib/python3.7/site-packages/funannotate/funannotate.py", line 710, in main
    mod.main(arguments)
  File "/panfs/roc/msisoft/funannotate/1.8.9/lib/python3.7/site-packages/funannotate/annotate.py", line 826, in main
    EggNog = parseEggNoggMapper(eggnog_result, eggnog_out, GeneProducts)
  File "/panfs/roc/msisoft/funannotate/1.8.9/lib/python3.7/site-packages/funannotate/annotate.py", line 270, in parseEggNoggMapper
    OGs = cols[DBi].split(',')
TypeError: list indices must be integers or slices, not NoneType

drabe004 commented 2 years ago

Does it matter that the genome annotation was done with the previous version (1.8.5)?

drabe004 commented 2 years ago

by annotation I mean funannotate predict

nextgenusfs commented 2 years ago

No, it is choking on the eggnog annotation. I will try to find some time to figure out why. I don't see the version that it detected in the emapper releases, so it is likely a slight variation in format in that version. I'm unlikely to fix it since it is an older version of eggnog. You could upgrade eggnog mapper, and if it fails with the latest version I will look at it in more detail.

drabe004 commented 2 years ago

Hey Jon, Sure, it detected v2.1.1:

[12/19/21 12:30:53]: Parsing EggNog Annotations
[12/19/21 12:30:53]: EggNog version parsed as 2.1.1-1
[12/19/21 12:30:53]: EggNog annotation detected as emapper v2.1.1-1 and DB prefix ENOG50

I'm wondering if a way to get around this is to install eggnog 2.1.6 locally, run it myself, and pass the results with the --eggnog flag. Will that prevent the pipeline from running eggnog if it is installed locally (but the wrong version)? Or will it run eggnog anyway? I'm working on a cluster where I don't have admin control, so I'm trying to figure out a way to speed things up so I don't have to wait on module updates.

nextgenusfs commented 2 years ago

If the results file is there it won't rerun eggnog. That's the problem, apparently v2.1.1 has some different output format that funannotate cannot parse. It is pulling the version from the results header and trying its best to figure out what the columns are, etc. I didn't have time to test every version of eggnog mapper.....
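
The second traceback (`list indices must be integers or slices, not NoneType`) is what Python raises when a column index that was supposed to come from the header lookup is still `None`. A minimal sketch of that pattern and of a guard that fails with a readable message instead. The helper names, the header line, and the column names are hypothetical, not taken from funannotate or from a specific emapper release:

```python
def find_column(header, wanted):
    """Return the index of the first name in `wanted` present in `header`, else None."""
    for name in wanted:
        if name in header:
            return header.index(name)
    return None


def require_column(header, wanted):
    """Like find_column, but raise a readable error instead of returning None."""
    idx = find_column(header, wanted)
    if idx is None:
        raise ValueError(
            'none of the expected columns %s found in header %s' % (wanted, header)
        )
    return idx


# Made-up header line; real emapper headers differ between versions.
header = '#query\tseed_ortholog\tevalue\tscore\tsome_other_column'.lstrip('#').split('\t')
cols = ['FUN_000001-T1', 'x', '1e-10', '100', 'y']

db_index = find_column(header, ['eggNOG_OGs', 'eggNOG OGs'])
try:
    cols[db_index].split(',')          # db_index is None here
except TypeError as err:
    print('unguarded lookup failed:', err)

try:
    require_column(header, ['eggNOG_OGs', 'eggNOG OGs'])
except ValueError as err:
    print('guarded lookup failed:', err)
```

The guarded version does not make an unknown format parseable; it only reports which columns were missing, which is usually easier to act on than a `TypeError` deep in the parser.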

drabe004 commented 2 years ago

Hi Jon, I downloaded a local and current version of eggnog, ran eggnog and used the --eggnog flag to run annotate. This time I got an error with SignalP:

[Dec 22 05:31 PM]: Predicting secreted proteins with SignalP

Traceback (most recent call last):
  File "/panfs/roc/msisoft/funannotate/1.8.9/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/panfs/roc/msisoft/funannotate/1.8.9/lib/python3.7/site-packages/funannotate/funannotate.py", line 710, in main
    mod.main(arguments)
  File "/panfs/roc/msisoft/funannotate/1.8.9/lib/python3.7/site-packages/funannotate/annotate.py", line 1028, in main
    outputdir, 'annotate_misc'), signalp_out)
  File "/panfs/roc/msisoft/funannotate/1.8.9/lib/python3.7/site-packages/funannotate/library.py", line 5622, in signalP
    if '.' in version:
TypeError: argument of type 'bool' is not iterable

DDrabeck commented 2 years ago

Hi, just checking in here as I haven't heard back about this SignalP error.

nextgenusfs commented 2 years ago

Looks like it can't figure out which signalP version is installed. What is the output of signalP in that environment?

DDrabeck commented 2 years ago

I don't see any output for signalP at all. I re-ran it with a shell script that specified module load signalp/5.0. The only log file it wrote was for phobius.

This is the last thing in the global log file:

INFO BUSCO analysis done. Total running time: 488.72091722488403 seconds
INFO Results written in /panfs/roc/groups/14/mcgaughs/drabe004/Funnannote_Full/RHLA_done/output_fun_RHLA/annotate_misc/run_busco/

[01/10/22 15:07:44]: 2,271 annotations added
[01/10/22 15:07:44]: Predicting secreted and transmembrane proteins using Phobius
[01/10/22 15:22:01]: Predicting secreted proteins with SignalP

nextgenusfs commented 2 years ago

So the executable 'signalp' is not in your PATH?

nextgenusfs commented 2 years ago

There isn't any output for signalp because I think it's perhaps not installed properly or it is a version that is outputting in a way the script is not parsing. So if you can just post the output to what happens when you type signalp that should be helpful.
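
A quick way to answer the PATH question from inside the same environment, and to see why the traceback mentions a `bool`: a minimal sketch, assuming the version probe returned `False` when it could not run `signalp` (the traceback suggests this but the thread does not show the probe itself). Both helper names are hypothetical:

```python
import shutil


def describe_tool(name):
    """Report where an executable resolves on PATH, or say that it does not."""
    path = shutil.which(name)
    if path is None:
        return '%s: not found on PATH' % name
    return '%s: %s' % (name, path)


def safe_major_version(version):
    """Return the part before the first '.', or None if there is no usable string.

    A probe that returns False on failure makes `'.' in version` raise
    TypeError, so the type is checked first.
    """
    if not isinstance(version, str) or '.' not in version:
        return None
    return version.split('.')[0]


print(describe_tool('signalp'))

try:
    _ = '.' in False   # mirrors the failing check when the probe returned a bool
except TypeError as err:
    print('unguarded check failed:', err)

print(safe_major_version(False), safe_major_version('5.0b'))
```

`shutil.which` only returns files the current user is allowed to execute, so a module that is installed but restricted to a group the user is not in would also show up as not found.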

DDrabeck commented 2 years ago

Gah! this made me realize there was a private group for the signalP module that I was not added to and was preventing me from using it/loading it properly! Sorry about that and thank you!

DDrabeck commented 2 years ago

Hi Jon,

We're getting another error, and I'm guessing this is yet another version issue but I wanted to check and see in case it was not.

[Jan 12 09:54 AM]: Parsing InterProScan5 XML file
[Jan 12 09:54 AM]: CMD ERROR: /panfs/roc/msisoft/funannotate/1.8.9/bin/python /panfs/roc/msisoft/funannotate/1.8.9/lib/python3.7/site-packages/funannotate/aux_scripts/iprscan2annotations.py /panfs/roc/groups/14/mcgaughs/drabe004/Funnannote_Full/RHLA_done/output_fun_RHLA/annotate_misc/iprscan.xml /panfs/roc/groups/14/mcgaughs/drabe004/Funnannote_Full/RHLA_done/output_fun_RHLA/annotate_misc/annotations.iprscan.txt
[Jan 12 09:54 AM]: Traceback (most recent call last):
  File "/panfs/roc/msisoft/funannotate/1.8.9/lib/python3.7/site-packages/funannotate/aux_scripts/iprscan2annotations.py", line 32, in <module>
    for _, elem in tree:
  File "/panfs/roc/msisoft/funannotate/1.8.9/lib/python3.7/xml/etree/ElementTree.py", line 1222, in iterator
    yield from pullparser.read_events()
  File "/panfs/roc/msisoft/funannotate/1.8.9/lib/python3.7/xml/etree/ElementTree.py", line 1297, in read_events
    raise event
  File "/panfs/roc/msisoft/funannotate/1.8.9/lib/python3.7/xml/etree/ElementTree.py", line 1269, in feed
    self._parser.feed(data)
xml.etree.ElementTree.ParseError: syntax error: line 1, column 0

nextgenusfs commented 2 years ago

Can you run head on the annotate_misc/iprscan.xml file? Seems like it might be empty.

DDrabeck commented 2 years ago

So I don't have an iprscan.xml file in my annotate_misc folder, I think because I ran it separately:

funannotate annotate -i /panfs/roc/groups/14/mcgaughs/drabe004/Funnannote_Full/MBRI_FINAL --force --busco_db actinopterygii --iprscan /panfs/roc/groups/14/mcgaughs/drabe004/Interprotscan/MBRIoutput.iprscan --cpus 8

InterProScan shell script:

module load interproscan/testing_5.23-62.0

interproscan.sh -appl pfam -dp -f TSV -goterms -iprlookup -pa -t p -i /panfs/roc/groups/14/mcgaughs/drabe004/Funnannote_Full/output_fun_MBRI_FINAL/predict_results/Mastacembelus_brichardi.proteins.fa -o MBRIoutput.iprscan

My output files from InterProScan are .iprscan, and head on those looks like this:

FUN_009377-T1 bf4bab9713e71d665951a85b773aff16 703 Pfam PF16661 Metallo-beta-lactamase superfamily domain 36 213 5.4E-13 T 06-12-2021 IPR001279 Metallo-beta-lactamase
FUN_009377-T1 bf4bab9713e71d665951a85b773aff16 703 Pfam PF11718 Pre-mRNA 3'-end-processing endonuclease polyadenylation factor C-term 499 701 1.9E-50 T 06-12-2021 IPR021718 Pre-mRNA 3'-end-processing endonuclease polyadenylation factor C-term Reactome: R-HSA-109688|Reactome: R-HSA-159231|Reactome: R-HSA-72163|Reactome: R-HSA-72187|Reactome: R-HSA-77595
FUN_009377-T1 bf4bab9713e71d665951a85b773aff16 703 Pfam PF10996 Beta-Casp domain 266 387 9.1E-34 T 06-12-2021 IPR022712 Beta-Casp domain
FUN_009377-T1 bf4bab9713e71d665951a85b773aff16 703 Pfam PF07521 Zn-dependent metallo-hydrolase RNA specificity domain 403 468 5.9E-21 T 06-12-2021 IPR011108 Zn-dependent metallo-hydrolase, RNA specificity domain
FUN_018214-T1 eab4fde36b5078147aa26e158b142401 3279 Pfam PF00063 Myosin head (motor domain) 956 1539 3.5E-170 T 06-12-2021 IPR001609 Myosin head, motor domain GO:0003774|GO:0005524|GO:0016459
FUN_018214-T1 eab4fde36b5078147aa26e158b142401 3279 Pfam PF00373 FERM central domain 3053 3168 2.1E-7 T 06-12-2021 IPR019748 FERM central domain
FUN_018214-T1 eab4fde36b5078147aa26e158b142401 3279 Pfam PF07653 Variant SH3 domain 2612 2691 4.1E-10 T 06-12-2021 IPR011511 Variant SH3 domain
FUN_018214-T1 eab4fde36b5078147aa26e158b142401 3279 Pfam PF00784 MyTH4 domain 2843 2951 5.9E-22 T 06-12-2021 IPR000857 MyTH4 domain GO:0005856
FUN_018214-T1 eab4fde36b5078147aa26e158b142401 3279 Pfam PF00784 MyTH4 domain 1754 1862 6.7E-23 T 06-12-2021 IPR000857 MyTH4 domain GO:0005856

nextgenusfs commented 2 years ago

You need to output in XML format instead of TSV.... Unfortunately you can't convert TSV to XML (you can convert XML to any of the other formats after you have those results), so you'll need to run it again but use -f XML.
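
Before handing a file to `--iprscan`, it can be checked cheaply whether it is XML at all. A minimal sketch, assuming that looking at the first non-blank line is a good enough test. The `looks_like_xml` helper and the sample row are hypothetical, and this is not how funannotate itself validates input:

```python
import xml.etree.ElementTree as ET


def looks_like_xml(path):
    """Return True if the first non-blank line of the file starts with '<'."""
    with open(path, 'r') as handle:
        for line in handle:
            stripped = line.strip()
            if stripped:
                return stripped.startswith('<')
    return False  # an empty file is not usable XML either


# Made-up TSV row in the spirit of the .iprscan output shown earlier.
tsv_row = ('FUN_000001-T1\tabc123\t703\tPfam\tPF00001\tExample domain'
           '\t1\t50\t1e-5\tT\t06-12-2021')

try:
    ET.fromstring(tsv_row)   # TSV text is not well-formed XML
except ET.ParseError as err:
    print('not XML:', err)
```

Calling `looks_like_xml('MBRIoutput.iprscan')` on a TSV file would return `False`, which is a clearer signal than the `ParseError` raised later inside the pipeline.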

nextgenusfs commented 2 years ago

Oh and you probably want to add -goterms to the command as that is where funannotate gets the gene ontology annotations.
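
To confirm that a run actually carried GO terms, the TSV output shown earlier can be inspected directly. A minimal sketch, assuming the standard InterProScan TSV layout in which GO annotations sit in the 14th tab-separated column when `-goterms` is given. The `go_terms_by_protein` helper is hypothetical, and the two sample rows are rebuilt from the thread's output with tabs restored:

```python
from collections import defaultdict

GO_COLUMN = 13  # zero-based index of the 14th column (assumed layout)


def go_terms_by_protein(lines):
    """Collect the set of GO terms seen for each protein accession."""
    terms = defaultdict(set)
    for line in lines:
        cols = line.rstrip('\n').split('\t')
        # Rows without a GO column, or with an empty one, are skipped.
        if len(cols) <= GO_COLUMN or not cols[GO_COLUMN].startswith('GO:'):
            continue
        terms[cols[0]].update(cols[GO_COLUMN].split('|'))
    return dict(terms)


rows = [
    '\t'.join([
        'FUN_018214-T1', 'eab4fde36b5078147aa26e158b142401', '3279', 'Pfam',
        'PF00063', 'Myosin head (motor domain)', '956', '1539', '3.5E-170',
        'T', '06-12-2021', 'IPR001609', 'Myosin head, motor domain',
        'GO:0003774|GO:0005524|GO:0016459',
    ]),
    '\t'.join([
        'FUN_018214-T1', 'eab4fde36b5078147aa26e158b142401', '3279', 'Pfam',
        'PF00373', 'FERM central domain', '3053', '3168', '2.1E-7',
        'T', '06-12-2021', 'IPR019748', 'FERM central domain',
    ]),
]

for protein, gos in go_terms_by_protein(rows).items():
    print(protein, sorted(gos))
```

This only inspects the TSV; funannotate still needs the XML output, as noted above.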

DDrabeck commented 2 years ago

Ok great! Thank you!

IanDMedeiros commented 2 years ago

> Gah! this made me realize there was a private group for the signalP module that I was not added to and was preventing me from using it/loading it properly! Sorry about that and thank you!

Hi @DDrabeck, could you elaborate on how you solved this? I am seeing the same error with SignalP in funannotate but I'm afraid I don't understand what you mean above.

DDrabeck commented 2 years ago

Hi! You have to ask MSI to add you to the signalP group in order to use the signalP module within this pipeline.


-- Danielle H Drabeck PhD

Postdoctoral fellow, Department of Ecology, Evolution, and Behavior, University of Minnesota

She/Her/Hers