merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0
426 stars, 145 forks

KEGG KO HMM for anvio #1229

Closed rbartelme closed 4 years ago

rbartelme commented 5 years ago

Please look into implementing KofamKOALA HMMs for gene annotation in anvi'o (https://www.genome.jp/tools/kofamkoala/). This would make KEGG KO numbers available in anvi'o with minimal workarounds.

meren commented 5 years ago

Just looked into this. There are 22,943 profiles in this collection. I think it would be much more feasible to implement a workflow and write a blog post showing people the steps they need to benefit from this resource from within anvi'o, rather than making every anvi'o user download an additional 1.1Gb :(

rbartelme commented 5 years ago

Maybe something like an anvi-setup-ko-hmm program that extends the HMMs for users who want them, similar to how the Pfam setup works now?

meren commented 5 years ago

Yes, I think we can do something like that.

ekiefl commented 4 years ago

@ivagljiva I feel like what you talked about today addresses this :)

ivagljiva commented 4 years ago

@ekiefl Indeed it does. :)

I have been implementing this functionality in the metabolic_reconstruction branch. So far, there is an anvi-setup-kegg-kofams script that downloads and sets up KOfamKOALA's HMM profiles, and an anvi-run-kegg-kofams script that runs the gene annotation for a contigs DB.
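For anyone who wants to script those two steps, here is a minimal sketch that only builds the command lines and does not run them. The `--kegg-data-dir` flag comes from the workflow config for `anvi_run_kegg_kofams`; passing the contigs database with `-c` is an assumption based on the usual anvi'o convention, and `kofam_commands` is a hypothetical helper name, so check each program's `--help` before relying on it.

```python
import shlex


def kofam_commands(contigs_db, kegg_data_dir=None):
    """Build (but do not execute) the setup and annotation commands.

    Assumptions: `-c` is the contigs database flag; `--kegg-data-dir`
    is the same optional flag listed in the workflow config.
    """
    setup = ["anvi-setup-kegg-kofams"]
    run = ["anvi-run-kegg-kofams", "-c", contigs_db]
    if kegg_data_dir:
        # Point the annotation step at a non-default KEGG data location.
        run += ["--kegg-data-dir", kegg_data_dir]
    return setup, run


if __name__ == "__main__":
    # "CONTIGS.db" is a placeholder path, not a real file.
    for cmd in kofam_commands("CONTIGS.db"):
        print(shlex.join(cmd))
```

The lists can be handed to `subprocess.run(cmd, check=True)` once the flags have been confirmed for the installed anvi'o version.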

rbartelme commented 4 years ago

OOOO This is really exciting!

meren commented 4 years ago

🎊

rbartelme commented 4 years ago

@ivagljiva did your commit from last week add anvi-setup-kegg-kofams and anvi-run-kegg-kofams to the snakemake workflow on the main branch? Does this work with the pangenomics snakemake pipeline?

ivagljiva commented 4 years ago

@rbartelme The commit added only anvi-run-kegg-kofams to the snakemake contigs workflow.

If I am understanding the pangenomics workflow correctly, it inherits from the contigs workflow and therefore theoretically the anvi-run-kegg-kofams rule should be available for the pangenomics workflow as well. In fact, that rule is in the default pangenomics config. But I have not tested it in the pangenomics context just yet :/

rbartelme commented 4 years ago

@ivagljiva I thought that the pangenomics workflow inherited the contigs workflow. Hopefully I'll get a chance to test it within the pangenomics workflow context sometime in the next few weeks. Presumably I could add rules for Pfam, COG, and KO annotations?

ivagljiva commented 4 years ago

@rbartelme I just looked at the default config file for the pangenomics workflow, and the rules for Pfam, COGs, and KOfams are already in there :)

    "anvi_run_kegg_kofams": {
        "run": false,
        "threads": 4,
        "--kegg-data-dir": "",
        "--hmmer-program": "",
        "--keep-all-hits": ""
    },
    "anvi_run_ncbi_cogs": {
        "run": true,
        "threads": 5,
        "--cog-data-dir": "",
        "--sensitive": "",
        "--temporary-dir-path": "",
        "--search-with": ""
    },
    [........]
    "anvi_run_pfams": {
        "run": "",
        "--pfam-data-dir": "",
        "threads": ""
    },

All you will need to do is set "run": true for all of them. :)
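If you would rather flip those switches programmatically than edit the file by hand, something like the sketch below should work. It assumes the config has already been loaded into a dict (for example with `json.load`), and `enable_rules` is a hypothetical helper, not part of anvi'o. The example config is trimmed to the `run` keys from the snippet above.

```python
import copy

# Rule names as they appear in the default pangenomics config.
ANNOTATION_RULES = ("anvi_run_kegg_kofams", "anvi_run_ncbi_cogs", "anvi_run_pfams")


def enable_rules(config, rules=ANNOTATION_RULES):
    """Return a copy of the config with "run" set to True for each rule."""
    updated = copy.deepcopy(config)
    for rule in rules:
        # Create the rule section if it is missing, then switch it on.
        updated.setdefault(rule, {})["run"] = True
    return updated


if __name__ == "__main__":
    example = {
        "anvi_run_kegg_kofams": {"run": False, "threads": 4},
        "anvi_run_ncbi_cogs": {"run": True, "threads": 5},
        "anvi_run_pfams": {"run": "", "threads": ""},
    }
    for name, section in enable_rules(example).items():
        print(name, section["run"])
```

The result can be written back with `json.dump(updated, handle, indent=4)` before launching the workflow.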

Please let me know how it goes!

ivagljiva commented 4 years ago

I am closing this issue, as this functionality is provided by anvi-setup-kegg-kofams and anvi-run-kegg-kofams. These programs will be in the v7 release for anyone not working with master :)