nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

eggnog v2 output is not being parsed correctly #566

Closed fmobegi closed 3 years ago

fmobegi commented 3 years ago

I am getting the following error when running annotate on my isolates. The same problem occurs even when providing pre-processed results using the --antismash and --eggnog flags.


-------------------------------------------------------
[05:15 PM]: OS: Ubuntu 20.04, 12 cores, ~ 33 GB RAM. Python: 3.8.5
[05:15 PM]: Running 1.8.4
[05:15 PM]: Found existing output directory funannotate_output/isolate1. Warning, will re-use any intermediate files found.
[05:15 PM]: Parsing input files
[05:15 PM]: Existing tbl found: funannotate_output/isolate1/predict_results/isolate1.tbl
[05:15 PM]: Adding Functional Annotation to Ascochyta rabiei, NCBI accession: None
[05:15 PM]: Annotation consists of: 10,616 gene models
[05:15 PM]: 10,364 protein records loaded
[05:15 PM]: Existing Pfam-A results found: funannotate_output/isolate1/annotate_misc/annotations.pfam.txt
[05:15 PM]: 11,652 annotations added
[05:15 PM]: Running Diamond blastp search of UniProt DB version 2021_01
[05:15 PM]: 725 valid gene/product annotations from 1,061 total
[05:15 PM]: Existing Eggnog-mapper results found: funannotate_output/isolate1/annotate_misc/eggnog.emapper.annotations
[05:15 PM]: Parsing EggNog Annotations
[05:15 PM]: 0 COG and EggNog annotations added
[05:15 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.65
[05:15 PM]: 725 gene name and product description annotations added
[05:15 PM]: Existing MEROPS results found: funannotate_output/isolate1/annotate_misc/annotations.merops.txt
[05:15 PM]: 361 annotations added
[05:15 PM]: Existing CAZYme results found: funannotate_output/isolate1/annotate_misc/annotations.dbCAN.txt
[05:15 PM]: 511 annotations added
[05:15 PM]: Existing BUSCO2 results found: funannotate_output/isolate1/annotate_misc/annotations.busco.txt
[05:15 PM]: 1,279 annotations added
[05:15 PM]: Skipping phobius predictions, try funannotate remote -m phobius
[05:15 PM]: Existing SignalP results found: funannotate_output/isolate1/annotate_misc/signalp.results.txt
[05:15 PM]: 1,060 secretome and 0 transmembane annotations added
[05:15 PM]: Parsing InterProScan5 XML file
[05:15 PM]: Now parsing antiSMASH v6 results, finding SM clusters
[05:15 PM]: Found 0 clusters, 0 biosynthetic enyzmes, and 0 smCOGs predicted by antiSMASH
[05:15 PM]: Found 0 duplicated annotations, adding 44,471 valid annotations
[05:15 PM]: Converting to final Genbank format, good luck!
[05:16 PM]: Creating AGP file and corresponding contigs file
[05:16 PM]: Cross referencing SM cluster hits with MIBiG database version 1.4
[05:16 PM]: CMD ERROR: diamond blastp --sensitive --query funannotate_output/isolate1/annotate_misc/antismash/smcluster.proteins.fasta --threads 12 --out funannotate_output/isolate1/annotate_misc/antismash/smcluster.MIBiG.blast.txt --db /home/fredrick/funannotate_db/mibig.dmnd --max-hsps 1 --evalue 0.001 --max-target-seqs 1 --outfmt 6
b'diamond v0.9.26.127 | by Benjamin Buchfink <buchfink@gmail.com>\nLicensed under the GNU GPL <https://www.gnu.org/licenses/gpl.txt>\nCheck http://github.com/bbuchfink/diamond for updates.\n\n#CPU threads: 12\nScoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)\nTemporary directory: funannotate_output/isolate1/annotate_misc/antismash\nOpening the database...  [0.000575s]\n#Target sequences to report alignments for: 1\nOpening the input file...  [5.7e-05s]\nError: Error detecting input file format. First line seems to be blank.\n'
fmobegi commented 3 years ago

Just noticed that the annotate.py script uses the v1 eggnog parser by default. I changed line 222 to IDi, DBi, OGi, Genei, COGi, Desci = getEggNogHeadersv2(input)

nextgenusfs commented 3 years ago

Yes I never finished the v2 parser because the format was changing and they still haven’t tagged a stable v2 release as far as I know.

fmobegi commented 3 years ago

I see, it causes problems down the line

[Mar 09 12:02 PM]: OS: Ubuntu 20.10, 12 cores, ~ 33 GB RAM. Python: 3.8.6
[Mar 09 12:02 PM]: Running 1.8.4
[Mar 09 12:02 PM]: Found existing output directory funannotate_output/15CUR002. Warning, will re-use any intermediate files found.
[Mar 09 12:02 PM]: Parsing input files
[Mar 09 12:02 PM]: Existing tbl found: funannotate_output/15CUR002/predict_results/Ascochyta_rabiei_15CUR002.tbl
[Mar 09 12:03 PM]: Adding Functional Annotation to Ascochyta rabiei, NCBI accession: None
[Mar 09 12:03 PM]: Annotation consists of: 9,878 gene models
[Mar 09 12:03 PM]: 9,747 protein records loaded
[Mar 09 12:03 PM]: Existing Pfam-A results found: funannotate_output/15CUR002/annotate_misc/annotations.pfam.txt
[Mar 09 12:03 PM]: 11,115 annotations added
[Mar 09 12:03 PM]: Running Diamond blastp search of UniProt DB version 2021_01
[Mar 09 12:03 PM]: 678 valid gene/product annotations from 984 total
[Mar 09 12:03 PM]: Existing Eggnog-mapper results found: funannotate_output/15CUR002/annotate_misc/eggnog.emapper.annotations
[Mar 09 12:03 PM]: Parsing EggNog Annotations
[Mar 09 12:03 PM]: 16,902 COG and EggNog annotations added
[Mar 09 12:03 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.66
[Mar 09 12:03 PM]: 678 gene name and product description annotations added
[Mar 09 12:03 PM]: Existing MEROPS results found: funannotate_output/15CUR002/annotate_misc/annotations.merops.txt
[Mar 09 12:03 PM]: 331 annotations added
[Mar 09 12:03 PM]: Existing CAZYme results found: funannotate_output/15CUR002/annotate_misc/annotations.dbCAN.txt
[Mar 09 12:03 PM]: 476 annotations added
[Mar 09 12:03 PM]: Existing BUSCO2 results found: funannotate_output/15CUR002/annotate_misc/annotations.busco.txt
[Mar 09 12:03 PM]: 1,243 annotations added
[Mar 09 12:03 PM]: Existing Phobius results found: funannotate_output/15CUR002/annotate_misc/phobius.results.txt
[Mar 09 12:03 PM]: Existing SignalP results found: funannotate_output/15CUR002/annotate_misc/signalp.results.txt
[Mar 09 12:03 PM]: 1,007 secretome and 2,186 transmembane annotations added
[Mar 09 12:03 PM]: Parsing InterProScan5 XML file
[Mar 09 12:03 PM]: Now parsing antiSMASH v6 results, finding SM clusters
[Mar 09 12:03 PM]: Found 28 clusters, 68 biosynthetic enyzmes, and 84 smCOGs predicted by antiSMASH
[Mar 09 12:03 PM]: Found 2,165 duplicated annotations, adding 61,726 valid annotations
[Mar 09 12:03 PM]: Converting to final Genbank format, good luck!
[Mar 09 12:04 PM]: Creating AGP file and corresponding contigs file
[Mar 09 12:04 PM]: Cross referencing SM cluster hits with MIBiG database version 1.4
[Mar 09 12:05 PM]: Creating tab-delimited SM cluster output
[Mar 09 12:05 PM]: Writing genome annotation table.
Traceback (most recent call last):
  File "/home/fredrick/.local/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/home/fredrick/.local/lib/python3.8/site-packages/funannotate/funannotate.py", line 705, in main
    mod.main(arguments)
  File "/home/fredrick/.local/lib/python3.8/site-packages/funannotate/annotate.py", line 1495, in main
    lib.annotationtable(final_gbk, FUNDB, NoteHeaders, INTERPRO,
  File "/home/fredrick/.local/lib/python3.8/site-packages/funannotate/library.py", line 7543, in annotationtable
    desc = x + ':'+ resources.COGS.get(x)
TypeError: can only concatenate str (not "NoneType") to str
fmobegi commented 3 years ago

Quick fix: change line 7543 to desc = x + ':'+ str(resources.COGS.get(x)). It works for now and the results look reasonable.
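The TypeError in the traceback comes from `dict.get` returning `None` for a key that is not in the table. Wrapping the lookup in `str()` avoids the crash but appends the literal text "None" to the description. A minimal sketch of an alternative is below; the `COGS` dictionary and the `describe_cog` helper are hypothetical stand-ins for illustration, not funannotate code.

```python
# Hypothetical stand-in for resources.COGS: COG category letter -> description.
COGS = {
    "U": "Intracellular trafficking, secretion, and vesicular transport",
    "S": "Function unknown",
}


def describe_cog(letter, table=COGS):
    """Return 'letter:description', or just the letter if it is unknown."""
    desc = table.get(letter)  # None when the letter is not in the table
    if desc is None:
        return letter
    return letter + ":" + desc


print(describe_cog("U"))
print(describe_cog("-"))  # unknown category: no 'None' text is appended
```

Whether an unknown category should be kept, skipped, or logged is a design choice; the point is only that the missing-key case is handled explicitly.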

nextgenusfs commented 3 years ago

Looks like eggnog devs finally tagged a v2.x release -- I'll get it installed locally and take a look at output files and then should be able to update the parser to work with either v1 or v2 eggnog (hopefully).

fmobegi commented 3 years ago

I believe these are the two files generated by running emapper-2.0.8-2 in funannotate.

eggnog.emapper.annotations.seed_orthologs.txt eggnog.emapper.annotations.txt

nextgenusfs commented 3 years ago

Hi @fmobegi, thanks for the reference files. I think I have it working. I'm going to reopen the issue, as the modifications you made above might have kept the pipeline from crashing but didn't fix the underlying differences between the v1 and v2 formats. I also added EC_number parsing from eggnog v2 data.

If you are able to test, that would be great; you can install from master with pip. If all is working, I'll tag a new release in the next few days. Thanks.

fmobegi commented 3 years ago

Thanks @nextgenusfs . Everything works okay now.

Here is the progress from a few of the isolates I was analysing

-------------------------------------------------------
[Mar 12 01:03 PM]: OS: Ubuntu 20.10, 12 cores, ~ 33 GB RAM. Python: 3.8.6
[Mar 12 01:03 PM]: Running 1.8.4
[Mar 12 01:03 PM]: Found existing output directory funannotate_output/TR6417. Warning, will re-use any intermediate files found.
[Mar 12 01:03 PM]: Parsing input files
[Mar 12 01:03 PM]: Existing tbl found: funannotate_output/TR6417/predict_results/Ascochyta_rabiei_TR6417.tbl
[Mar 12 01:03 PM]: Adding Functional Annotation to Ascochyta rabiei, NCBI accession: None
[Mar 12 01:03 PM]: Annotation consists of: 9,766 gene models
[Mar 12 01:03 PM]: 9,633 protein records loaded
[Mar 12 01:03 PM]: Existing Pfam-A results found: funannotate_output/TR6417/annotate_misc/annotations.pfam.txt
[Mar 12 01:03 PM]: 10,843 annotations added
[Mar 12 01:03 PM]: Running Diamond blastp search of UniProt DB version 2021_01
[Mar 12 01:03 PM]: 683 valid gene/product annotations from 984 total
[Mar 12 01:03 PM]: Existing Eggnog-mapper results found: funannotate_output/TR6417/annotate_misc/eggnog.emapper.annotations
[Mar 12 01:03 PM]: Parsing EggNog Annotations
[Mar 12 01:03 PM]: 18,165 COG and EggNog annotations added
[Mar 12 01:03 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.66
[Mar 12 01:03 PM]: 2,451 gene name and product description annotations added
[Mar 12 01:03 PM]: Existing MEROPS results found: funannotate_output/TR6417/annotate_misc/annotations.merops.txt
[Mar 12 01:03 PM]: 331 annotations added
[Mar 12 01:03 PM]: Existing CAZYme results found: funannotate_output/TR6417/annotate_misc/annotations.dbCAN.txt
[Mar 12 01:03 PM]: 465 annotations added
[Mar 12 01:03 PM]: Existing BUSCO2 results found: funannotate_output/TR6417/annotate_misc/annotations.busco.txt
[Mar 12 01:03 PM]: 1,231 annotations added
[Mar 12 01:03 PM]: Existing Phobius results found: funannotate_output/TR6417/annotate_misc/phobius.results.txt
[Mar 12 01:03 PM]: Existing SignalP results found: funannotate_output/TR6417/annotate_misc/signalp.results.txt
[Mar 12 01:03 PM]: 973 secretome and 2,129 transmembane annotations added
[Mar 12 01:03 PM]: Parsing InterProScan5 XML file
[Mar 12 01:03 PM]: Now parsing antiSMASH v6 results, finding SM clusters
[Mar 12 01:03 PM]: Found 25 clusters, 60 biosynthetic enyzmes, and 73 smCOGs predicted by antiSMASH
[Mar 12 01:03 PM]: Found 0 duplicated annotations, adding 65,558 valid annotations
[Mar 12 01:03 PM]: Converting to final Genbank format, good luck!
[Mar 12 01:04 PM]: Creating AGP file and corresponding contigs file
[Mar 12 01:04 PM]: Cross referencing SM cluster hits with MIBiG database version 1.4
[Mar 12 01:04 PM]: Creating tab-delimited SM cluster output
[Mar 12 01:04 PM]: Writing genome annotation table.
[Mar 12 01:05 PM]: Funannotate annotate has completed successfully!

        We need YOUR help to improve gene names/product descriptions:
           0 gene/products names MUST be fixed, see funannotate_output/TR6417/annotate_results/Gene2Products.must-fix.txt
           2 gene/product names need to be curated, see funannotate_output/TR6417/annotate_results/Gene2Products.need-curating.txt
           100 gene/product names passed but are not in Database, see funannotate_output/TR6417/annotate_results/Gene2Products.new-names-passed.txt

        Please consider contributing a PR at https://github.com/nextgenusfs/gene2product

-------------------------------------------------------
-------------------------------------------------------
[Mar 12 01:05 PM]: OS: Ubuntu 20.10, 12 cores, ~ 33 GB RAM. Python: 3.8.6
[Mar 12 01:05 PM]: Running 1.8.4
[Mar 12 01:05 PM]: Found existing output directory funannotate_output/TR9544. Warning, will re-use any intermediate files found.
[Mar 12 01:05 PM]: Parsing input files
[Mar 12 01:05 PM]: Existing tbl found: funannotate_output/TR9544/predict_results/Ascochyta_rabiei_TR9544.tbl
[Mar 12 01:05 PM]: Adding Functional Annotation to Ascochyta rabiei, NCBI accession: None
[Mar 12 01:05 PM]: Annotation consists of: 10,408 gene models
[Mar 12 01:05 PM]: 10,262 protein records loaded
[Mar 12 01:05 PM]: Existing Pfam-A results found: funannotate_output/TR9544/annotate_misc/annotations.pfam.txt
[Mar 12 01:05 PM]: 11,616 annotations added
[Mar 12 01:05 PM]: Running Diamond blastp search of UniProt DB version 2021_01
[Mar 12 01:05 PM]: 718 valid gene/product annotations from 1,047 total
[Mar 12 01:05 PM]: Existing Eggnog-mapper results found: funannotate_output/TR9544/annotate_misc/eggnog.emapper.annotations
[Mar 12 01:05 PM]: Parsing EggNog Annotations
[Mar 12 01:05 PM]: 19,382 COG and EggNog annotations added
[Mar 12 01:05 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.66
[Mar 12 01:05 PM]: 2,589 gene name and product description annotations added
[Mar 12 01:05 PM]: Existing MEROPS results found: funannotate_output/TR9544/annotate_misc/annotations.merops.txt
[Mar 12 01:05 PM]: 344 annotations added
[Mar 12 01:05 PM]: Existing CAZYme results found: funannotate_output/TR9544/annotate_misc/annotations.dbCAN.txt
[Mar 12 01:05 PM]: 496 annotations added
[Mar 12 01:05 PM]: Existing BUSCO2 results found: funannotate_output/TR9544/annotate_misc/annotations.busco.txt
[Mar 12 01:05 PM]: 1,265 annotations added
[Mar 12 01:05 PM]: Existing Phobius results found: funannotate_output/TR9544/annotate_misc/phobius.results.txt
[Mar 12 01:05 PM]: Existing SignalP results found: funannotate_output/TR9544/annotate_misc/signalp.results.txt
[Mar 12 01:05 PM]: 1,065 secretome and 2,300 transmembane annotations added
[Mar 12 01:05 PM]: Parsing InterProScan5 XML file
[Mar 12 01:05 PM]: Now parsing antiSMASH v6 results, finding SM clusters
[Mar 12 01:05 PM]: Found 30 clusters, 73 biosynthetic enyzmes, and 83 smCOGs predicted by antiSMASH
[Mar 12 01:05 PM]: Found 0 duplicated annotations, adding 69,929 valid annotations
[Mar 12 01:05 PM]: Converting to final Genbank format, good luck!
[Mar 12 01:06 PM]: Creating AGP file and corresponding contigs file
[Mar 12 01:06 PM]: Cross referencing SM cluster hits with MIBiG database version 1.4
[Mar 12 01:06 PM]: Creating tab-delimited SM cluster output
[Mar 12 01:06 PM]: Writing genome annotation table.
[Mar 12 01:06 PM]: Funannotate annotate has completed successfully!

        We need YOUR help to improve gene names/product descriptions:
           0 gene/products names MUST be fixed, see funannotate_output/TR9544/annotate_results/Gene2Products.must-fix.txt
           3 gene/product names need to be curated, see funannotate_output/TR9544/annotate_results/Gene2Products.need-curating.txt
           97 gene/product names passed but are not in Database, see funannotate_output/TR9544/annotate_results/Gene2Products.new-names-passed.txt

        Please consider contributing a PR at https://github.com/nextgenusfs/gene2product

-------------------------------------------------------
-------------------------------------------------------
[Mar 12 01:06 PM]: OS: Ubuntu 20.10, 12 cores, ~ 33 GB RAM. Python: 3.8.6
[Mar 12 01:06 PM]: Running 1.8.4
[Mar 12 01:06 PM]: Found existing output directory funannotate_output/TR9571. Warning, will re-use any intermediate files found.
[Mar 12 01:06 PM]: Parsing input files
[Mar 12 01:06 PM]: Existing tbl found: funannotate_output/TR9571/predict_results/Ascochyta_rabiei_TR9571.tbl
[Mar 12 01:07 PM]: Adding Functional Annotation to Ascochyta rabiei, NCBI accession: None
[Mar 12 01:07 PM]: Annotation consists of: 11,226 gene models
[Mar 12 01:07 PM]: 10,946 protein records loaded
[Mar 12 01:07 PM]: Existing Pfam-A results found: funannotate_output/TR9571/annotate_misc/annotations.pfam.txt
[Mar 12 01:07 PM]: 12,652 annotations added
[Mar 12 01:07 PM]: Running Diamond blastp search of UniProt DB version 2021_01
[Mar 12 01:07 PM]: 774 valid gene/product annotations from 1,265 total
[Mar 12 01:07 PM]: Existing Eggnog-mapper results found: funannotate_output/TR9571/annotate_misc/eggnog.emapper.annotations
[Mar 12 01:07 PM]: Parsing EggNog Annotations
[Mar 12 01:07 PM]: 21,724 COG and EggNog annotations added
[Mar 12 01:07 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.66
[Mar 12 01:07 PM]: 2,778 gene name and product description annotations added
[Mar 12 01:07 PM]: Existing MEROPS results found: funannotate_output/TR9571/annotate_misc/annotations.merops.txt
[Mar 12 01:07 PM]: 386 annotations added
[Mar 12 01:07 PM]: Existing CAZYme results found: funannotate_output/TR9571/annotate_misc/annotations.dbCAN.txt
[Mar 12 01:07 PM]: 516 annotations added
[Mar 12 01:07 PM]: Existing BUSCO2 results found: funannotate_output/TR9571/annotate_misc/annotations.busco.txt
[Mar 12 01:07 PM]: 1,336 annotations added
[Mar 12 01:07 PM]: Existing Phobius results found: funannotate_output/TR9571/annotate_misc/phobius.results.txt
[Mar 12 01:07 PM]: Existing SignalP results found: funannotate_output/TR9571/annotate_misc/signalp.results.txt
[Mar 12 01:07 PM]: 1,087 secretome and 2,377 transmembane annotations added
[Mar 12 01:07 PM]: Parsing InterProScan5 XML file
[Mar 12 01:07 PM]: Now parsing antiSMASH v6 results, finding SM clusters
[Mar 12 01:07 PM]: Found 27 clusters, 67 biosynthetic enyzmes, and 83 smCOGs predicted by antiSMASH
[Mar 12 01:07 PM]: Found 11,678 duplicated annotations, adding 76,672 valid annotations
[Mar 12 01:07 PM]: Converting to final Genbank format, good luck!
[Mar 12 01:08 PM]: Creating AGP file and corresponding contigs file
[Mar 12 01:08 PM]: Cross referencing SM cluster hits with MIBiG database version 1.4
[Mar 12 01:08 PM]: Creating tab-delimited SM cluster output
[Mar 12 01:08 PM]: Writing genome annotation table.
[Mar 12 01:09 PM]: Funannotate annotate has completed successfully!

        We need YOUR help to improve gene names/product descriptions:
           0 gene/products names MUST be fixed, see funannotate_output/TR9571/annotate_results/Gene2Products.must-fix.txt
           3 gene/product names need to be curated, see funannotate_output/TR9571/annotate_results/Gene2Products.need-curating.txt
           103 gene/product names passed but are not in Database, see funannotate_output/TR9571/annotate_results/Gene2Products.new-names-passed.txt

        Please consider contributing a PR at https://github.com/nextgenusfs/gene2product

With this update funannotate compare also runs well without any hiccups.

xvazquezc commented 3 years ago

Hi there, I just ran funannotate annotate (v1.8.7) with EggNOG mapper results generated externally (emapper.py v2.1.2), and funannotate fails to parse every single annotation. Apparently some changes were made to the emapper output files in the last release (https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.2#v212).

spock commented 3 years ago

I believe I'm seeing the same issue as @xvazquezc : eggnog.emapper.annotations has 6307 lines, but annotations.eggnog.txt has zero.
Using funannotate v1.8.7, with built-in emapper annotation.
I had a look at your previous fixing commit (https://github.com/nextgenusfs/funannotate/commit/bde15b399870cce096a15a98b699b7712e4a8726) hoping for a quick fix, but it looks like this will need more than the 5 minutes I have right now 🙂

nextgenusfs commented 3 years ago

Can one of you send me the first 10 or so lines of your emapper.annotations file so I can see the headers, names, etc.?

spock commented 3 years ago
## Fri May 14 17:34:25 2021
## emapper-e6ac7f2
## funannotate/bin/emapper.py -m diamond -i genome.proteins.fasta -o eggnog --cpu 8
##
#query  seed_ortholog   evalue  score   eggNOG_OGs  max_annot_lvl   COG_category    Description Preferred_name  GOs EC  KEGG_ko KEGG_Pathway    KEGG_Module KEGG_Reaction   KEGG_rclass BRITE   KEGG_TC CAZy    BiGG_Reaction   PFAMs
FUN_000001-T1   64363.EME43602  3.5e-244    685.0   KOG0254@1|root,KOG0254@2759|Eukaryota,39TDK@33154|Opisthokonta,3NWTD@4751|Fungi,3QM8J@4890|Ascomycota,1ZZI9@147541|Dothideomycetes,3MH72@451867|Dothideomycetidae   4751|Fungi  U   Major Facilitator Superfamily   -   GO:0003674,GO:0005215,GO:0005575,GO:0005623,GO:0005886,GO:0006810,GO:0006811,GO:0006812,GO:0008150,GO:0008324,GO:0008519,GO:0015075,GO:0015696,GO:0016020,GO:0016021,GO:0022857,GO:0031224,GO:0034220,GO:0044425,GO:0044464,GO:0051179,GO:0051234,GO:0055085,GO:0071705,GO:0071944,GO:0072488,GO:0098655    -   -   -   -   -   -   -   -   -   -   MFS_1,Pkinase
FUN_000002-T1   64363.EME43601  3.48e-279   812.0   COG1020@1|root,KOG1178@2759|Eukaryota,39REN@33154|Opisthokonta,3NTWI@4751|Fungi,3QJNI@4890|Ascomycota,1ZZPI@147541|Dothideomycetes,3MGA9@451867|Dothideomycetidae   4751|Fungi  I   AMP-binding enzyme  -   -   -   -   -   -   -   -   -   -   -   -   AMP-binding,DIT1_PvcA,NAD_binding_4,PP-binding,Transferase
FUN_000003-T1   101852.XP_008088089.1   8.04e-125   388.0   COG0580@1|root,KOG0224@2759|Eukaryota,392UK@33154|Opisthokonta,3NZK7@4751|Fungi,3RJQE@4890|Ascomycota,210YQ@147548|Leotiomycetes    4751|Fungi  G   Major intrinsic protein -   GO:0003674,GO:0005215,GO:0005372,GO:0005575,GO:0005623,GO:0006810,GO:0006833,GO:0008150,GO:0008643,GO:0015144,GO:0015166,GO:0015168,GO:0015250,GO:0015267,GO:0015318,GO:0015791,GO:0015793,GO:0015850,GO:0016020,GO:0016021,GO:0022803,GO:0022838,GO:0022857,GO:0031224,GO:0034219,GO:0042044,GO:0044425,GO:0044464,GO:0051179,GO:0051234,GO:0055085,GO:0071702,GO:0071944,GO:1901618   -   ko:K03441   -   -   -   -   ko00000,ko02000 1.A.8   -   -   MIP,Mis12
FUN_000004-T1   101852.XP_008081284.1   9.97e-67    236.0   2AXX5@1|root,2S01X@2759|Eukaryota,393E9@33154|Opisthokonta,3NZWW@4751|Fungi,3QRKY@4890|Ascomycota,20Z21@147548|Leotiomycetes    4751|Fungi  S   DNA replication regulator SLD3  -   GO:0000228,GO:0000278,GO:0000785,GO:0000790,GO:0005575,GO:0005622,GO:0005623,GO:0005634,GO:0005654,GO:0005694,GO:0006139,GO:0006259,GO:0006260,GO:0006261,GO:0006270,GO:0006725,GO:0006807,GO:0007049,GO:0008150,GO:0008152,GO:0009058,GO:0009059,GO:0009987,GO:0022402,GO:0031261,GO:0031974,GO:0031981,GO:0032991,GO:0032993,GO:0033260,GO:0034641,GO:0034645,GO:0043170,GO:0043226,GO:0043227,GO:0043228,GO:0043229,GO:0043231,GO:0043232,GO:0043233,GO:0044237,GO:0044238,GO:0044249,GO:0044260,GO:0044422,GO:0044424,GO:0044427,GO:0044428,GO:0044446,GO:0044451,GO:0044454,GO:0044464,GO:0044786,GO:0046483,GO:0070013,GO:0071704,GO:0090304,GO:1901360,GO:1901576,GO:1902292,GO:1902315,GO:1902969,GO:1902975,GO:1903047 -   ko:K10731   -   -   -   -   ko00000,ko03032 -   -   -   APH,SLD3
FUN_000006-T1   140110.NechaP101753 2.75e-169   482.0   KOG1339@1|root,KOG1339@2759|Eukaryota,39ZWM@33154|Opisthokonta,3NXD8@4751|Fungi,3QMYU@4890|Ascomycota,213QU@147550|Sordariomycetes,3TH3H@5125|Hypocreales,1FXHE@110618|Nectriaceae  4751|Fungi  O   Belongs to the peptidase A1 family  -   -   3.4.23.1,3.4.23.34  ko:K01382,ko:K06002 ko04142,ko04974,map04142,map04974   -   -   -   ko00000,ko00001,ko01000,ko01002 -   -   -   Asp
FUN_000007-T1   698440.XP_007293743.1   1.47e-24    110.0   2ETTM@1|root,2SW2Y@2759|Eukaryota,3A3JK@33154|Opisthokonta,3P8ZI@4751|Fungi,3QXHI@4890|Ascomycota,210KG@147548|Leotiomycetes    4751|Fungi  -   -   -   -   -   -   -   -   -   -   -   -   -   -   -
FUN_000008-T1   101852.XP_008081286.1   3.68e-97    290.0   KOG1534@1|root,KOG1534@2759|Eukaryota,38BEG@33154|Opisthokonta,3NW4B@4751|Fungi,3QM4U@4890|Ascomycota,20WXW@147548|Leotiomycetes    4751|Fungi  K   Conserved hypothetical ATP binding protein  -   GO:0000070,GO:0000278,GO:0000280,GO:0000819,GO:0003674,GO:0003824,GO:0003924,GO:0005048,GO:0005488,GO:0006606,GO:0006810,GO:0006886,GO:0006913,GO:0006996,GO:0007049,GO:0007059,GO:0007062,GO:0007064,GO:0008104,GO:0008150,GO:0009987,GO:0015031,GO:0015833,GO:0016043,GO:0016462,GO:0016787,GO:0016817,GO:0016818,GO:0017038,GO:0017111,GO:0022402,GO:0033036,GO:0033218,GO:0033365,GO:0034504,GO:0034613,GO:0042277,GO:0042886,GO:0045184,GO:0046907,GO:0048285,GO:0051169,GO:0051170,GO:0051179,GO:0051234,GO:0051276,GO:0051641,GO:0051649,GO:0070727,GO:0071702,GO:0071705,GO:0071840,GO:0072594,GO:0098813,GO:0140014,GO:1903047 -   ko:K06883   -   -   -   -   ko00000 -   -   -   ATP_bind_1,FAA_hydrolase,SRPRB
FUN_000009-T1   655981.L8G9P6   7.09e-257   707.0   COG0192@1|root,KOG1506@2759|Eukaryota,38DWH@33154|Opisthokonta,3NUD5@4751|Fungi,3QM99@4890|Ascomycota,20WIM@147548|Leotiomycetes    4751|Fungi  H   Catalyzes the formation of S-adenosylmethionine from methionine and ATP SAM2    GO:0000096,GO:0000166,GO:0003674,GO:0003824,GO:0004478,GO:0005488,GO:0005524,GO:0005575,GO:0005622,GO:0005623,GO:0005737,GO:0005829,GO:0006082,GO:0006520,GO:0006555,GO:0006556,GO:0006732,GO:0006790,GO:0006807,GO:0008144,GO:0008150,GO:0008152,GO:0009058,GO:0009066,GO:0009108,GO:0009987,GO:0010494,GO:0016740,GO:0016765,GO:0017076,GO:0017144,GO:0019752,GO:0030554,GO:0032553,GO:0032555,GO:0032559,GO:0032991,GO:0035639,GO:0035770,GO:0036094,GO:0036464,GO:0043167,GO:0043168,GO:0043226,GO:0043228,GO:0043229,GO:0043232,GO:0043436,GO:0044237,GO:0044238,GO:0044249,GO:0044272,GO:0044281,GO:0044424,GO:0044444,GO:0044464,GO:0046500,GO:0051186,GO:0051188,GO:0071704,GO:0097159,GO:0097367,GO:1901265,GO:1901363,GO:1901564,GO:1901576,GO:1901605,GO:1990904 2.5.1.6 ko:K00789   ko00270,ko01100,ko01110,ko01230,map00270,map01100,map01110,map01230 M00034,M00035,M00368,M00609 R00177,R04771   RC00021,RC01211 ko00000,ko00001,ko00002,ko01000 -   -   iMM904.YLR180W,iND750.YLR180W   S-AdoMet_synt_C,S-AdoMet_synt_M,S-AdoMet_synt_N
nextgenusfs commented 3 years ago

So is emapper-e6ac7f2 their latest in master? Perhaps that's actually the issue as I have the parser setup to look for a version number to parse.

https://github.com/nextgenusfs/funannotate/blob/master/funannotate/annotate.py#L280

And what does it say in the log file about what version it detected?
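For reference, the version string in question sits in the `## emapper-...` comment line at the top of the annotations file. The sketch below shows one way to pull it out; `detect_emapper_version` is a hypothetical helper written for this illustration and is not the code at the link above.

```python
import re


def detect_emapper_version(lines):
    """Return the text after 'emapper-' in the '##' header block, or None."""
    for line in lines:
        if not line.startswith("##"):
            break  # the '##' comments come first; stop at the column header or data
        match = re.match(r"##\s*emapper-(\S+)", line)
        if match:
            return match.group(1)
    return None


header = [
    "## Fri May 14 17:34:25 2021",
    "## emapper-e6ac7f2",
    "##",
    "#query\tseed_ortholog\tevalue",
]
print(detect_emapper_version(header))
```

With the header posted above, this returns the commit-style string rather than a dotted version number, which is the case the parser has to cope with.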

nextgenusfs commented 3 years ago

Okay, well this max_annot_lvl is definitely a new header. Really sucks this keeps changing..... not sure I have the patience to constantly update this......
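One way to be less sensitive to added or reordered columns is to look fields up by the names in the `#query ...` header line instead of by position. This is only a sketch under the assumption that the file is tab-delimited with `##` metadata lines and a single `#`-prefixed column header; `read_annotations` and the shortened sample are hypothetical, not funannotate's parser.

```python
import io


def read_annotations(handle):
    """Yield one dict per annotation row, keyed by the file's own column names."""
    fieldnames = None
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##") or not line.strip():
            continue  # run metadata and blank lines
        if line.startswith("#"):
            fieldnames = line.lstrip("#").split("\t")  # column header line
            continue
        if fieldnames is None:
            raise ValueError("data row seen before the '#query' header line")
        yield dict(zip(fieldnames, line.split("\t")))


# Shortened sample with only a few of the real columns.
sample = (
    "## emapper-2.1.2\n"
    "#query\tseed_ortholog\tmax_annot_lvl\tCOG_category\tDescription\n"
    "FUN_000001-T1\t64363.EME43602\t4751|Fungi\tU\tMajor Facilitator Superfamily\n"
    "## 1 queries scanned\n"
)
rows = list(read_annotations(io.StringIO(sample)))
print(rows[0]["COG_category"], rows[0]["Description"])
```

A renamed column would still need handling (for example with `row.get(...)` and a fallback name), but a newly inserted column such as `max_annot_lvl` no longer shifts the others.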

spock commented 3 years ago

e6ac7f2 looks like a commit hash, but I don't see it on their master branch.

within the conda env (installed with mamba as recommended), pip list | grep eggnog yields eggnog-mapper 2.1.2.

emapper.py --version (within the env) yields

$ emapper.py --version
emapper-e6ac7f2 / Expected eggNOG DB version: 5.0.2 / Installed eggNOG DB version: 5.0.2 / Local diamond version: diamond version 2.0.4 / Local MMseqs2 version: 113e3212c137d026e297c7540e1fcd039f6812b1

Here's a relevant log fragment:

[05/14/21 19:52:20]: Parsing EggNog Annotations
[05/14/21 19:52:20]: EggNog annotation detected as emapper ve6ac7f2 and DB prefix ENOG50
[05/14/21 19:52:20]: EggNog Parse ERROR: FUN_000001-T1  64363.EME43602  3.5e-244    685.0   KOG0254@1|root,KOG0254@2759|Eukaryota,39TDK@33154|Opisthokonta,3NWTD@4751|Fungi,3QM8J@4890|Ascomycota,1ZZI9@147541|Dothideomycetes,3MH72@451867|Dothideomycetidae   4751|Fungi  U   Major Facilitator Superfamily   -   GO:0003674,GO:0005215,GO:0005575,GO:0005623,GO:0005886,GO:0006810,GO:0006811,GO:0006812,GO:0008150,GO:0008324,GO:0008519,GO:0015075,GO:0015696,GO:0016020,GO:0016021,GO:0022857,GO:0031224,GO:0034220,GO:0044425,GO:0044464,GO:0051179,GO:0051234,GO:0055085,GO:0071705,GO:0071944,GO:0072488,GO:0098655    -   -   -   -   -   -   -   -   -   -   MFS_1,Pkinase

(and so on, it probably reports an error for each line)

spock commented 3 years ago

oh, I was too slow with my logs, sorry 🙂

nextgenusfs commented 3 years ago

Okay, well that's not great for several reasons, but it did at least select the v2 parser. The problem is that the hash is always going to evaluate as greater than the version string, even if it were a hash from a version that is not there. But the real problem here is the altered headers again. So I'll try to fix it for this one, but I'll open an issue: I do not have the time to update this every few weeks.

>>> vers = 've6ac7f2'
>>> vers < ('2.0.0')
False
>>> vers > ('2.0.0')
True
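The session above compares strings character by character, which is why any string starting with a letter sorts after '2.0.0'. A hedged sketch of a numeric comparison that also flags unparseable strings is shown below; `parse_version` is a hypothetical helper for illustration, not part of funannotate.

```python
import re


def parse_version(text):
    """Return a tuple of ints for '2.1.2' or 'v2.1.4-2-6-g05f27b0', else None."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None  # e.g. a bare commit hash such as 'e6ac7f2'
    return tuple(int(part) for part in match.groups())


print(parse_version("2.1.2") >= (2, 1, 2))
print(parse_version("ve6ac7f2"))
# Tuples compare numerically, unlike strings:
print("2.10.0" > "2.9.0", parse_version("2.10.0") > parse_version("2.9.0"))
```

Returning `None` for a hash leaves the caller to decide on a fallback, such as inspecting the column header, rather than silently treating the hash as a very new release.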
spock commented 3 years ago

eggnog-mapper was installed as a funannotate dependency with mamba/conda; as a quick issue work-around, I have tried installing eggnog-mapper 2.1.1 into this environment (instead of 2.1.2), but there is no such version on bioconda anymore - I guess they replaced the older one.

nextgenusfs commented 3 years ago

Perhaps it's on pip?

spock commented 3 years ago

Good thinking - it is!

spock commented 3 years ago

Giving this a try, should work 🙂

$ emapper.py --version
emapper-2.1.1 / Expected eggNOG DB version: 5.0.2 / Installed eggNOG DB version: 5.0.2 / Local diamond version: diamond version 2.0.4 / Local MMseqs2 version: 113e3212c137d026e297c7540e1fcd039f6812b1
spock commented 3 years ago

Yes, this workaround works.

Assuming funannotate is installed into the same-named conda environment:

conda activate funannotate
pip install eggnog-mapper==2.1.1
xvazquezc commented 3 years ago

In case it helps: I installed eggnog-mapper and funannotate in the same conda environment, both with conda. The emapper.py version shows as 2.1.2 and the version is shown correctly in the header of the .annotations file, which should avoid the hash that @spock has in the version number (?):

$ emapper.py --version
emapper-2.1.2

Just header and tail of the .annotations file (+one result):

## Thu May  6 11:24:23 2021
## emapper-2.1.2
## /home/z3382651/miniconda3/envs/funannotate/bin/emapper.py --data_dir /share/bioinfo/z3382651/emapper_db/ --cpu 24 --scratch_dir /scratch/pbs.14720.clive.ramaciotti.unsw.edu.au/emapper --output emapper --dbmem -i /share/bioinfo/z3382651/ferrari/annotation/predict/predict_results/Penicillium_winsconsinense.proteins.fa
##
#query  seed_ortholog   evalue  score   eggNOG_OGs  max_annot_lvl   COG_category    Description Preferred_name  GOs EC  KEGG_ko KEGG_Pathway    KEGG_Module KEGG_Reaction   KEGG_rclass BRITE   KEGG_TC CAZy    BiGG_Reaction   PFAMs
PWIN_000002-T1  36630.CADNFIAP00003069  2.25e-238       662.0   COG1063@1|root,KOG0024@2759|Eukaryota,39UET@33154|Opisthokonta,3NUTE@4751|Fungi,3QQHE@4890|Ascomycota,20B2U@147545|Eurotiomycetes,3S2X3@5042|Eurotiales 4751|Fungi      Q       Alcohol dehydrogenase GroES-like domain -       GO:0006950,GO:0008150,GO:0009636,GO:0009987,GO:0033554,GO:0042221,GO:0050896,GO:0051409,GO:0051410,GO:0051716,GO:0070458,GO:0070887,GO:0071500,GO:0097237,GO:0098754,GO:1901698,GO:1990748      -       -       -       -       -       -       -       -       -       -       ADH_N,ADH_N_assoc,ADH_zinc_N,Sacchrp_dh_NADP
.....
## 9382 queries scanned
## Total time (seconds): 200.2429313659668
## Rate: 46.85 q/s

These are the exact versions and builds of both emapper and funannotate:

$ mamba list |grep "eggnog\|funannotate"
# packages in environment at /home/z3382651/miniconda3/envs/funannotate:
eggnog-mapper             2.1.2              pyhdfd78af_0    bioconda
funannotate               1.8.7              pyh5e36f6f_0    bioconda
nextgenusfs commented 3 years ago

Would somebody be able to test this latest commit for me on actual data? It should work for emapper v2.1.2 and greater. I don't have it installed locally, so it would be great if somebody else could confirm that this detects the right version and outputs the proper data. Thanks.

fmobegi commented 3 years ago

I am currently using emapper-2.1.4-2-6 with funannotate 1.8.7 and I don't have any problems.

(funannotate_env) fredrick@cicer:~$ funannotate check --show-versions
-------------------------------------------------------
Checking dependencies for 1.8.7
-------------------------------------------------------
You are running Python v 3.9.5. Now checking python packages...
biopython: 1.79
goatools: 1.1.6
matplotlib: 3.4.2
natsort: 7.1.1
numpy: 1.20.3
pandas: 1.2.4
psutil: 5.8.0
requests: 2.25.1
scikit-learn: 0.24.2
scipy: 1.6.3
seaborn: 0.11.1
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/ppgdata/fredrick/funannotate_db
$PASAHOME=/opt/miniconda3/envs/funannotate_env/opt/pasa-2.4.1
$TRINITY_HOME=/opt/miniconda3/envs/funannotate_env/opt/trinity-2.8.5
$EVM_HOME=/opt/miniconda3/envs/funannotate_env/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/opt/miniconda3/envs/funannotate_env/config/
$GENEMARK_PATH=/opt/gmes_linux_64
All 6 environmental variables are set
-------------------------------------------------------
Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.8
emapper.py: 2.1.4-2-6-g05f27b0
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.480 (2021/May/21)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.20-r1061
proteinortho: 6.0.31
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.10
signalp: 5.0b
snap: 2006-07-28
stringtie: 2.1.7
tRNAscan-SE: 2.0.7 (Oct 2020)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39

nextgenusfs commented 3 years ago

Hmm, well, v1.8.7 may not be erroring, but I don't think it would be parsing the output properly. What sort of results are in annotate_misc/annotations.eggnog.txt when you run it?

fmobegi commented 3 years ago

I've updated the script and everything works well.

(funannotate_env) fredrick@cicer:/ppgdata/fredrick/assembly_data/ascochyta/2021-06-01_Ascochyta_lentis_reannotation/A_lentis_Kewell$ funannotate annotate -i funannotate_out --cpus 10 --sbt template.sbt --iprscan Ascochyta_lentis_Kewell.interproscan.xml --antismash funannotate_out/fungi-984e0060-ebed-4ff9-8d73-97c7948355c2/Ascochyta_lentis_Kewell.gbk --phobius phobius.txt --signalp Ascochyta_lentis_Kewell.signalp5
-------------------------------------------------------
[Aug 02 09:36 AM]: OS: Ubuntu 20.04, 12 cores, ~ 33 GB RAM. Python: 3.8.10
[Aug 02 09:36 AM]: Running 1.8.8
[Aug 02 09:36 AM]: Found existing output directory funannotate_out. Warning, will re-use any intermediate files found.
[Aug 02 09:36 AM]: Parsing input files
[Aug 02 09:36 AM]: Existing tbl found: funannotate_out/update_results/Ascochyta_lentis_Kewell.tbl
[Aug 02 09:36 AM]: Adding Functional Annotation to Ascochyta lentis, NCBI accession: None
[Aug 02 09:36 AM]: Annotation consists of: 11,380 gene models
[Aug 02 09:36 AM]: 11,444 protein records loaded
[Aug 02 09:36 AM]: Existing Pfam-A results found: funannotate_out/annotate_misc/annotations.pfam.txt
[Aug 02 09:36 AM]: 12,891 annotations added
[Aug 02 09:36 AM]: Running Diamond blastp search of UniProt DB version 2021_02
[Aug 02 09:36 AM]: 756 valid gene/product annotations from 1,120 total
[Aug 02 09:36 AM]: Existing Eggnog-mapper results found: funannotate_out/annotate_misc/eggnog.emapper.annotations
[Aug 02 09:36 AM]: Parsing EggNog Annotations
[Aug 02 09:36 AM]: EggNog version parsed as 2.1.4-2-6-g05f27b0
[Aug 02 09:36 AM]: 21,649 COG and EggNog annotations added
[Aug 02 09:36 AM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.69
[Aug 02 09:36 AM]: 2,784 gene name and product description annotations added
[Aug 02 09:36 AM]: Existing MEROPS results found: funannotate_out/annotate_misc/annotations.merops.txt
[Aug 02 09:36 AM]: 384 annotations added
[Aug 02 09:36 AM]: Existing CAZYme results found: funannotate_out/annotate_misc/annotations.dbCAN.txt
[Aug 02 09:36 AM]: 555 annotations added
[Aug 02 09:36 AM]: Existing BUSCO2 results found: funannotate_out/annotate_misc/annotations.busco.txt
[Aug 02 09:36 AM]: 1,304 annotations added
[Aug 02 09:36 AM]: Existing Phobius results found: funannotate_out/annotate_misc/phobius.results.txt
[Aug 02 09:36 AM]: Existing SignalP results found: funannotate_out/annotate_misc/signalp.results.txt
[Aug 02 09:36 AM]: 1,225 secretome and 2,564 transmembane annotations added
[Aug 02 09:36 AM]: Parsing InterProScan5 XML file
[Aug 02 09:36 AM]: Now parsing antiSMASH v5 results, finding SM clusters
[Aug 02 09:36 AM]: Found 41 clusters, 116 biosynthetic enyzmes, and 137 smCOGs predicted by antiSMASH
[Aug 02 09:36 AM]: Found 0 duplicated annotations, adding 77,331 valid annotations
[Aug 02 09:36 AM]: Converting to final Genbank format, good luck!
[Aug 02 09:37 AM]: Creating AGP file and corresponding contigs file
[Aug 02 09:37 AM]: Cross referencing SM cluster hits with MIBiG database version 1.4
[Aug 02 09:38 AM]: Creating tab-delimited SM cluster output
[Aug 02 09:38 AM]: Writing genome annotation table.
[Aug 02 09:38 AM]: Funannotate annotate has completed successfully!

        We need YOUR help to improve gene names/product descriptions:
           0 gene/products names MUST be fixed, see funannotate_out/annotate_results/Gene2Products.must-fix.txt
           1 gene/product names need to be curated, see funannotate_out/annotate_results/Gene2Products.need-curating.txt
           108 gene/product names passed but are not in Database, see funannotate_out/annotate_results/Gene2Products.new-names-passed.txt

        Please consider contributing a PR at https://github.com/nextgenusfs/gene2product

-------------------------------------------------------
(funannotate_env) fredrick@cicer:/ppgdata/fredrick/assembly_data/ascochyta/2021-06-01_Ascochyta_lentis_reannotation/A_lentis_Kewell$ head funannotate
funannotate-annotate.2789178.log  funannotate-annotate.2813093.log  funannotate-annotate.2814974.log  funannotate_out/                  
(funannotate_env) fredrick@cicer:/ppgdata/fredrick/assembly_data/ascochyta/2021-06-01_Ascochyta_lentis_reannotation/A_lentis_Kewell$ head funannotate_out/
annotate_misc/                              fungi-984e0060-ebed-4ff9-8d73-97c7948355c2/ predict_misc/                               update_misc/                                
annotate_results/                           logfiles/                                   predict_results/                            update_results/                             
(funannotate_env) fredrick@cicer:/ppgdata/fredrick/assembly_data/ascochyta/2021-06-01_Ascochyta_lentis_reannotation/A_lentis_Kewell$ head funannotate_out/annotate_misc/annotations.eggnog.txt 
AlKewell_000001-T1  note    EggNog:ENOG502SB2X
AlKewell_000002-T1  EC_number   3.4.14.9
AlKewell_000002-T1  note    EggNog:ENOG503Q3UA
AlKewell_000002-T1  note    COG:O
AlKewell_000003-T1  EC_number   3.4.14.9
AlKewell_000003-T1  note    EggNog:ENOG503Q3UA
AlKewell_000003-T1  note    COG:O
AlKewell_000004-T1  note    EggNog:ENOG503PCNJ
AlKewell_000004-T1  note    COG:S
AlKewell_000005-T1  note    EggNog:ENOG503NVNQ

nextgenusfs commented 3 years ago

Fantastic thanks @fmobegi for confirmation -- I'm going to tag a new release now.

fmobegi commented 3 years ago

Another thing you could check: installing the latest code with python -m pip install git+https://github.com/nextgenusfs/funannotate.git fails under Python 3.9.6 (the latest version tagged as 3.9).

ERROR: Package 'funannotate' requires a different Python: 3.9.6 not in '<3.9,>=3.6.0'

I had to altinstall python3.8 to get the pip install to work: python3.8 -m pip install git+https://github.com/nextgenusfs/funannotate.git

nextgenusfs commented 3 years ago

Thanks, yes, I know it doesn't work in 3.9 yet. I haven't had time to figure out exactly why, but it's on my list. It's tagged properly in the setup.py, which is why you see this error. Conda, however, does not seem to respect these tags from PyPI.
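
The pip error follows directly from the version specifier quoted in it. As a rough illustration, the constraint `<3.9,>=3.6.0` can be approximated with tuple comparison. This is a simplified stand-in for the full version-specifier matching that pip performs, and `satisfies_python_requires` is a hypothetical helper.

```python
import sys


def satisfies_python_requires(version, lower=(3, 6, 0), upper=(3, 9, 0)):
    """Return True if lower <= version < upper, mirroring '>=3.6.0,<3.9'."""
    # Pad to three components so that (3, 9) and (3, 9, 0) compare as equal.
    padded = tuple(version)[:3] + (0,) * max(0, 3 - len(version))
    return lower <= padded < upper


print(satisfies_python_requires((3, 9, 6)))   # the interpreter pip rejected
print(satisfies_python_requires((3, 8, 10)))  # the interpreter that worked
print(satisfies_python_requires(sys.version_info[:3]))  # the current interpreter
```

Any 3.9.x release falls outside the range, so installing with a 3.8 interpreter is a workaround that fits the constraint as written.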