tripal / tripal_analysis_kegg

This module extends the Tripal Analysis Module and provides a method for loading KEGG ortholog assignments derived from the KEGG Automated Annotation Server (KAAS). The module reads the compressed .heir file or, alternatively, an expanded heir directory. KEGG assignments appear on each feature page, and a full KEGG report is available for browsing results once uploaded.

Kegg hierarchy files instructions? #3

Closed bradfordcondon closed 6 years ago

bradfordcondon commented 7 years ago

Hi, would it be possible to include an example upload file and/or some basic instructions for generating the hierarchy file? I've run KEGG's BlastKOALA and KAAS and I can't find a way to download anything resembling the hierarchy file anywhere on the KEGG site.

Thanks!

srobb1 commented 7 years ago

Hi Bradford,

Here is how I dealt with this issue. Not perfect, but I just wanted to get the data into my site.

It has been quite a while, but I put together a way to get a hierarchy-like file. The big difference is that there is no HTML in the files I download, so the pattern matching in the module upload script also needs to be modified to make it work. Here is what I have.

Here is the shell script that I use to get the hierarchy-like file: https://github.com/srobb1/tripal_analysis_kegg/blob/parseNewResultFormat/includes/getData.KEGGKASS.sh

Here is my modified upload script with my changes highlighted: https://github.com/srobb1/tripal_analysis_kegg/commit/0989e67e0b0cb8eedce08c40222753d12519d27a#diff-bb2b21ef7774df8687ff02b0284505c6

I hope this helps, Sofia

bradfordcondon commented 7 years ago

Hi Sofia, thanks for the interesting fix. So your script takes a KAAS job ID and an email as input and generates a pseudo-hierarchy?

Presumably some tool does generate the hierarchy files, though. Which one? Maybe it isn't available anymore?

srobb1 commented 7 years ago

The KAAS web tool used to generate the hierarchy files, but after some update they stopped. Are there any standalone tools that give the same data? I know that IPRSCAN will include KEGG and Pathway IDs in their output, but these are not added into chado when the IPRSCAN results are parsed and loaded by the interproscan tripal module.

srobb1 commented 7 years ago

Oh, and yes to your question about the job ID and email.

srobb1 commented 7 years ago

And I think you are being quite liberal in calling this a 'fix' :) It is more of a workaround.

spficklin commented 7 years ago

Yes, @srobb1 is right. The KAAS server stopped providing the KAAS hierarchy file. I think this is because KEGG has transitioned to a primarily paid service. I suspect the heir files gave information that they wanted to keep for their paid service... just a guess on my part. We definitely need to find a new workaround for the KEGG module. Enough time has passed that I don't think anyone has any more heir files to upload, other than perhaps those created in the way @srobb1 mentions.

bradfordcondon commented 7 years ago

If someone has a hierarchy file lying around, it would be great if they could share it so we know what we're comparing to.

I imagine we could reverse engineer the diagram from the KEGG term mappings (which we have), but I don't imagine that it would be easy.

Maybe the best solution would be to modify the loader to take a user email and job ID as input to load in the information via Sofia's script?

spficklin commented 7 years ago

Oh, and the KEGG/KAAS submission ID too... Yeah, that sounds like a reasonable approach to me. Our Galaxy module uses the PHP curl library, so it wouldn't be unprecedented to include curl code in a Tripal module.

spficklin commented 7 years ago

And to answer your question about the availability of a heir file, we have one in our Tripal v2 User's Guide:

srobb1 commented 7 years ago

If you just want a mapping of KEGG terms to gene ids, KEGG terms could be pulled from Interproscan output. It is now incorporated into the report if --pathways is selected. This would be a nice addition to the terms that are currently pulled from the report. That said, I do like the tree that is produced from the kegg module.

bradfordcondon commented 6 years ago

Hi all,

So, working with @mestato, I'm going to propose a more streamlined way of doing this. Rather than download and scan the hierarchy every single time, why not store the hierarchy as a CV? If we do this, and we map our features to the hierarchy, then we can use the existing or upcoming cv_browser to display all this. Unfortunately the BRITE hierarchy isn't available as an OBO, but it is available in full as JSON here. So we'd have to import it as a CV somehow (should be fairly easy, right?).
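
As a rough illustration of the import step, here is a minimal Python sketch that flattens a nested hierarchy into (term, parent) pairs, which is roughly the shape a CV loader would need. It assumes the JSON uses nested objects with `name` and `children` keys; the sample data and the `flatten` helper are made up for illustration and are not part of the module.

```python
import json

# Hypothetical sample in the nested name/children layout assumed above.
SAMPLE = """
{
  "name": "ko00001",
  "children": [
    {
      "name": "09100 Metabolism",
      "children": [
        {
          "name": "09101 Carbohydrate metabolism",
          "children": [
            {"name": "00010 Glycolysis / Gluconeogenesis"}
          ]
        }
      ]
    }
  ]
}
"""


def flatten(node, parent=None):
    """Yield (term, parent) pairs for every node in the tree, depth first."""
    yield node["name"], parent
    # Leaf nodes are assumed to have no "children" key at all.
    for child in node.get("children", []):
        yield from flatten(child, node["name"])


pairs = list(flatten(json.loads(SAMPLE)))
for term, parent in pairs:
    # The root has no parent, so print a placeholder for it.
    print(f"{term}\t{parent or 'ROOT'}")
```

Each pair could then become a term plus an is_a-style relationship in the CV, though how that maps onto chado tables is left open here.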

We would then also need the KO to BRITE mappings. Those are also available, for example:

enzyme brite mappings

module brite mappings

So the module would then look like this:

Now we can browse the KO or BRITE terms, and the features associated, using the CV browser.

What do you all think? Am I missing something important?

mestato commented 6 years ago

I've been using GhostKOALA instead of KAAS. I'm not sure what the relative merits are. But it sounds like this solution will work no matter what you use (KAAS/GhostKOALA/IPS), as long as you get sequence-to-KO-term mappings. Which is nice.
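
To make the "sequence to KO term mappings" input concrete, here is a small Python sketch that groups feature IDs by KO term. It assumes the annotation tool exports a two-column, tab-separated table (feature ID, then KO term, with the second column empty or absent for unannotated features); the gene IDs and the `read_ko_assignments` helper are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical two-column output: feature ID, then KO term (may be missing).
SAMPLE = "gene_001\tK00844\ngene_002\ngene_003\tK01810\ngene_004\tK00844\n"


def read_ko_assignments(handle):
    """Return a dict mapping each KO term to the list of features assigned to it."""
    ko_to_features = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and features that received no KO assignment.
        if len(row) < 2 or not row[1].strip():
            continue
        ko_to_features[row[1].strip()].append(row[0].strip())
    return dict(ko_to_features)


assignments = read_ko_assignments(io.StringIO(SAMPLE))
for ko, features in assignments.items():
    print(ko, ",".join(features))
```

Whatever tool produced the table, the loader would only need this KO-to-feature grouping to annotate features against the stored hierarchy.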

bradfordcondon commented 6 years ago

Basically, the Tripal 3 loader doesn't support the hierarchy files anymore. Instead, we load the KEGG ontology as an OBO, annotate features, and support mapping records across the ontology with tripal_cv_xray, which is available here: https://github.com/statonlab/tripal_cv_xray

I think our new implementation is just stronger across the board; hopefully users will agree. Closing.