Closed rbartelme closed 4 years ago
Just looked into this. There are 22,943 profiles in this collection. I think it would be much more feasible to implement a workflow and write a blog post showing the steps people need to use this resource from within anvi'o, instead of making every anvi'o user download an additional 1.1 GB :(
Maybe something like anvi-setup-ko-hmm, to extend the HMMs if users would like to? Sort of like how the Pfam setup works now?
Yes, I think we can do something like that.
@ivagljiva I feel like what you talked about today addresses this :)
@ekiefl Indeed it does. :)
I have been implementing this functionality in the metabolic_reconstruction branch. So far, there is an anvi-setup-kegg-kofams script that downloads and sets up KOfamKOALA's HMM profiles, and an anvi-run-kegg-kofams script that runs the gene annotation for a contigs DB.
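For anyone who wants to drive the two scripts from Python, here is a minimal sketch of the setup-then-annotate order described above. The script names come from this thread and `--kegg-data-dir` from the workflow config; `--contigs-db` is the usual anvi'o flag for a contigs DB, but treat both flags as assumptions and check each script's `--help` first. `kofam_commands`, `run_kofam_annotation`, and the file names are hypothetical.

```python
import shutil
import subprocess


def kofam_commands(contigs_db, kegg_data_dir=None):
    """Build the two command lines: setup first, then annotation.

    Flags are assumptions (see the note above); nothing is executed here.
    """
    setup = ["anvi-setup-kegg-kofams"]
    run = ["anvi-run-kegg-kofams", "--contigs-db", contigs_db]
    if kegg_data_dir:
        # Point the annotation step at a non-default KEGG data directory.
        run += ["--kegg-data-dir", kegg_data_dir]
    return [setup, run]


def run_kofam_annotation(contigs_db, kegg_data_dir=None):
    """Run both steps in order, stopping at the first failure."""
    for command in kofam_commands(contigs_db, kegg_data_dir):
        if shutil.which(command[0]) is None:
            raise RuntimeError(f"{command[0]} not found on PATH; is anvi'o installed?")
        subprocess.run(command, check=True)
```

Calling `run_kofam_annotation("CONTIGS.db")` would run the setup script and then annotate the (hypothetical) `CONTIGS.db`. The setup step only needs to succeed once per machine, so in practice you may want to skip it on later runs.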
OOOO This is really exciting!
🎊
@ivagljiva did your commit from last week add anvi-setup-kegg-kofams and anvi-run-kegg-kofams to the snakemake workflow on the main branch? Does this work with the pangenomics snakemake pipeline?
@rbartelme The commit added only anvi-run-kegg-kofams to the snakemake contigs workflow.
If I am understanding the pangenomics workflow correctly, it inherits from the contigs workflow, and therefore the anvi-run-kegg-kofams rule should theoretically be available for the pangenomics workflow as well. In fact, that rule is in the default pangenomics config. But I have not tested it in the pangenomics context just yet :/
@ivagljiva I thought that the pangenomics workflow inherited the contigs workflow. Hopefully I'll get a chance to test it within the pangenomics workflow context sometime in the next few weeks. Presumably...I could add rules for Pfam, COG, and KO annotations?
@rbartelme I just looked at the default config file for the pangenomics workflow, and the rules for Pfam, COGs, and KOfams are already in there :)
    "anvi_run_kegg_kofams": {
        "run": false,
        "threads": 4,
        "--kegg-data-dir": "",
        "--hmmer-program": "",
        "--keep-all-hits": ""
    },
    "anvi_run_ncbi_cogs": {
        "run": true,
        "threads": 5,
        "--cog-data-dir": "",
        "--sensitive": "",
        "--temporary-dir-path": "",
        "--search-with": ""
    },
    [........]
    "anvi_run_pfams": {
        "run": "",
        "--pfam-data-dir": "",
        "threads": ""
    },
All you will need to do is set "run": true for all of them. :)
Please let me know how it goes!
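If you would rather flip those switches with a script than by hand, here is a minimal sketch. It assumes the workflow config is a plain JSON file with the rule names quoted above as top-level keys; `enable_annotation_rules`, `enable_in_file`, and the `config.json` path are hypothetical and not part of anvi'o.

```python
import json

# Rule names copied from the default pangenomics config quoted above.
ANNOTATION_RULES = ("anvi_run_kegg_kofams", "anvi_run_ncbi_cogs", "anvi_run_pfams")


def enable_annotation_rules(config, rules=ANNOTATION_RULES):
    """Return a copy of the config with "run": true set for each rule."""
    updated = json.loads(json.dumps(config))  # deep copy of JSON-like data
    for rule in rules:
        # Create the rule entry if it is missing, then switch it on.
        updated.setdefault(rule, {})["run"] = True
    return updated


def enable_in_file(path, rules=ANNOTATION_RULES):
    """Rewrite the config file at `path` in place with the rules enabled."""
    with open(path) as handle:
        config = json.load(handle)
    with open(path, "w") as handle:
        json.dump(enable_annotation_rules(config, rules), handle, indent=4)
```

For example, `enable_in_file("config.json")` would turn on all three annotation rules and leave every other setting (threads, data directories) untouched.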
I am closing this issue as this functionality is provided by anvi-setup-kegg-kofams and anvi-run-kegg-kofams. These programs will be in the v7 release for anyone not working with master :)
Please look into implementing KOFAMKOALA HMMs for gene annotation in anvi'o (https://www.genome.jp/tools/kofamkoala/). This would enable KEGG KO numbers in anvi'o with minimal workarounds.