oborel / obo-relations

RO is an ontology of relations for use with biological ontologies
http://oborel.github.io/

Remove rule inferring contributes_to, add new relation to use in this rule. #194

Closed dosumis closed 7 years ago

dosumis commented 7 years ago

We currently have a rule that infers contributes_to for all components of a complex when a complex enables an activity. Based on discussion at the GO editor's training workshop, this appears too loose an interpretation of contributes_to. But curators still want some connection to be made between the activity of a complex and its components.
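To make the over-generation concrete, here is a minimal sketch of what such a rule does, assuming it behaves like a simple chain (X part_of C, C enables Y, therefore X contributes_to Y). The entity names and the `infer_contributes_to` helper are hypothetical and are not taken from RO or from any GO tooling.

```python
# Hypothetical toy data: a complex with two subunits and one activity.
part_of = {
    ("subunit_A", "complex_C"),
    ("subunit_B", "complex_C"),
}
enables = {
    ("complex_C", "kinase activity"),
}


def infer_contributes_to(part_of, enables):
    """Chain rule sketch: X part_of C and C enables Y => X contributes_to Y."""
    return {
        (x, y)
        for (x, c) in part_of
        for (c2, y) in enables
        if c == c2
    }


inferred = infer_contributes_to(part_of, enables)
# Every subunit receives the inference, whether or not it is needed for the activity.
print(sorted(inferred))
```

The rule has no way to tell a catalytic subunit from a bystander, which is the looseness being objected to.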

What should this relation be called?

@krchristie @vanaukenk @rlovering @cmungall

krchristie commented 7 years ago

It seems to me that the problem is with the automatic inference, rather than with the name of the relationship.

Emergent functions that only exist in the complex

In the case of a complex where the MF is a function of the complex as a whole, it is usually fine for the subunits to receive this qualifier. Examples we have discussed previously include the multisubunit RNA polymerases, where there is no catalytic subunit; instead, the catalytic unit is formed by the complex. In this type of case, it is mostly fine to use the contributes_to qualifier for all of the subunits (though there can be subtleties, such as the fact that 2 of the 12 subunits of RNA polymerase II are NOT required for catalytic activity).

Functions that exist in single subunits of complexes

However, there are other complexes where one or more MFs present within the complex are the properties of a single subunit of that complex. Examples here have included translation elongation factors that contain a GTPase (@hdrabkin knows more about these than I do; also see http://www.ebi.ac.uk/interpro/entry/IPR004540) and complexes containing kinase activities. In this case, it is NOT appropriate to propagate "contributes_to GTPase activity" or "contributes_to kinase activity" to ALL subunits of the complex. People including @ValWood have objected to having the NON-catalytic subunits of a given complex tagged with "contributes_to MF activity" for an activity that they do not possess.

Mix of single subunit MFs and emergent MFs in the same complex

Then, for complex functions, e.g. 'translation elongation factor activity' or 'DNA-binding transcription factor activity', there are some functions that reside in single subunits, and others that are the property of the complex. So, for the translation elongation factor activity that contains the 'GTPase activity', it is NOT appropriate to propagate 'contributes_to GTPase activity' to all subunits, but it WOULD be fine to propagate 'contributes_to translation elongation factor activity'.

I don't see how it would be possible to apply the 'contributes_to' appropriately in a single rule that applies to all complexes. The whole point of this qualifier was to allow curators to indicate when multiple subunits of a complex are required for an MF versus cases like GTPase activity or kinase activity which are typically present in a single subunit.

So, I am not in favor of any rule that propagates 'contributes_to' (or any renamed form of this qualifier) to all subunits of a complex. I think this needs to remain curated information, and thus we would need some system that lets curators specify which MFs of a complex are appropriate to propagate to subunits. Furthermore, I think it would be best if curators could indicate which subunits receive the propagated "contributes to MF", so that we can best represent what is known. For example, as I mentioned above, the catalytic activity of RNA polymerase II resides in a 10 subunit catalytic core that does NOT possess the RPB4 and RPB7 subunits that ARE considered to be part of the "RNA polymerase II complex".
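As a rough illustration of what curated (rather than automatic) propagation might look like, the sketch below records, per complex and per function, exactly which subunits should receive the qualifier. The `CuratedPropagation` record, the `propagate` helper, the complex identifier and the function label are all hypothetical; only the exclusion of RPB4 and RPB7 from the 10 subunit core comes from the discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CuratedPropagation:
    """One curator decision: which subunits contribute to which function."""
    complex_id: str
    function: str
    subunits: frozenset


def propagate(records):
    """Emit contributes_to statements only for explicitly listed subunits."""
    return {
        (subunit, "contributes_to", rec.function)
        for rec in records
        for subunit in rec.subunits
    }


# Illustrative subunit labels for a 12 subunit complex.
all_subunits = {f"RPB{i}" for i in range(1, 13)}
catalytic_core = frozenset(all_subunits - {"RPB4", "RPB7"})

records = [
    CuratedPropagation(
        complex_id="RNA polymerase II complex (placeholder id)",
        function="RNA polymerase activity (placeholder label)",
        subunits=catalytic_core,
    )
]
statements = propagate(records)
print(len(statements))
```

Because the subunit list is part of the curated record, RPB4 and RPB7 stay members of the complex without being tagged with the activity.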

cmungall commented 7 years ago

Thanks for the detailed analysis @krchristie. I agree with all of the above. But I would emphasize separation of

  1. arbitrarily named shortcuts for various rules or chains
  2. the subset of those that we want to place in the GO qualifiers subset
  3. the single member of that subset that we want to equate to the existing contributes_to qualifier

cmungall commented 7 years ago

There is a lot of metadata and axioms attached to the existing relation, see:

RO:0002326 ! contributes to

It's currently quite a strong relation and I believe consistent with @krchristie's usage. It implies many things but there is nothing that implies it, at the moment.

dosumis commented 7 years ago

To clarify: I am not suggesting renaming contributes_to, and I am proposing removing automatic inference to contributes_to from the current rules (i.e. it would remain a manually applied qualifier). I think we're all on the same page on this.

But it also seems reasonable to have some relation/qualifier that covers the looser meaning (applies to all subunits where only one has the function) and that retains this inference rule. I believe that Ruth's group favors capturing some relationship in these cases (@rlovering: please correct me if I'm wrong. Also, do you have any suggestions for what to call such a relation?)

contributes_to needs a clear, tight definition. Here's an attempt at something semi-formal:

X contributes_to Y iff:

  1. X is a gene product that is part of a complex (C)
  2. Y is a molecular function enabled_by C
  3. X is required for C to enable Y
  4. X does not enable Y
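The semi-formal definition above can be read as a predicate. The following is only a sketch under stated assumptions: relations are modelled as plain sets of tuples, "required for" is treated as curated input rather than something computed, and all names in the toy data are hypothetical.

```python
def contributes_to(x, y, part_of, enables, required_for):
    """True iff x is part of some complex C that enables y, x is required
    for C to enable y, and x does not itself enable y.

    part_of:      set of (gene_product, complex)
    enables:      set of (entity, function)
    required_for: set of (gene_product, complex, function), assumed curated
    """
    if (x, y) in enables:
        return False  # x has the function on its own, so 'enables' applies instead
    for gene_product, complex_ in part_of:
        if gene_product != x:
            continue
        if (complex_, y) in enables and (x, complex_, y) in required_for:
            return True
    return False


# Hypothetical toy data covering the three situations discussed in this thread.
part_of = {("pol_sub", "pol"), ("kin_cat", "kc"), ("kin_reg", "kc")}
enables = {
    ("pol", "polymerase activity"),
    ("kc", "kinase activity"),
    ("kin_cat", "kinase activity"),
}
required_for = {
    ("pol_sub", "pol", "polymerase activity"),
    ("kin_cat", "kc", "kinase activity"),
}

print(contributes_to("pol_sub", "polymerase activity", part_of, enables, required_for))
print(contributes_to("kin_cat", "kinase activity", part_of, enables, required_for))
print(contributes_to("kin_reg", "kinase activity", part_of, enables, required_for))
```

Only the first call satisfies all four conditions; the catalytic subunit fails the last condition and the non-required subunit fails the third.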

Notes:

krchristie commented 7 years ago

Regarding

But it also seems reasonable to have some relation/qualifier that covers the looser meaning (applies to all subunits where only one has the function) and that retains this inference rule.

I'm not sure I see the utility of a relation/qualifier to indicate that a gene product is present in a complex that contains an activity when those gene products don't have or contribute_to that function.

It seems that the only thing that can be automatically inferred for all gene products (X or W) of a complex (C) that has some function (Y) is: X present in a complex (C) with Y activity

However, I don't understand how it would be useful to say this since it does not distinguish between these three possibilities:

  1. X contributes_to Y as part of C (scope of "contributes_to" qualifier)
  2. X enables Y while in C (currently indicated by annotation of X enables Y and X part_of C)
  3. W present in C where X enables Y (not currently indicated)

I don't understand what this would achieve. I would understand wanting to specifically indicate that X enables its function Y while in C, but I don't have any understanding of why we would want to be able to indicate W present in C where X enables Y.

Perhaps I still do not understand what you are trying to do?

ValWood commented 7 years ago

I agree with Karen.

But curators still want some connection to be made between the activity of a complex and its components.

Why? Do we have a use case? (the connection is at the common process)

Many complexes are multifunctional, and we wouldn’t want to associate every activity with every complex member (unless it really does contribute to the activity directly)

For example, the SWr1 complex simultaneously has lysine demethylase, ubiquitin ligase, histone deacetylase and DNA binding, acetylated histone sensor (bromodomain), ATP helicase/ATPase and more… This is quite typical. Would we really want to transfer all of these activities to all subunits? The connection between the gene products is the common process/pathway that their concerted actions are performing.

Emergent functions that only exist in the complex

I really like Karen’s distinction. At PomBase we only use contributes_to in this context (the catalytic unit is formed by the complex). It doesn’t make much sense to use it any other way.

We have other ways to annotate complex subunits that are genuine molecular function regulators (this would require more than an IMP annotation which could be indirect, for example, if the complex does not form correctly).

I have previously proposed restrictions to this qualifier to make the annotation more robust and useful to:

  i) Emergent functions that only exist in the complex

Now that I look at the annotations, an evidence code restriction would possibly also be useful:

  ii) Evidence IDA or ISO/IBA (I suggest this because I see annotations via RCA and IPI; it should not be possible to make this annotation via these evidence codes, because you just don’t have the necessary data)

This would have a list of benefits:

  i) The annotation of the qualifier would always be transferable by PAINT, because it is an integral catalytic component
  ii) Stripping the contributes_to qualifier (as many resources/pipelines do) would mean the annotation was still correct
  iii) It would prevent complex subunits which do NOT contribute to the activity directly from becoming annotated

This would not be such a big overhaul. There are ONLY 6261 contributes_to annotations. Not so many are experimental.

It would be easy to compile a list of complexes with emergent functions and “bless them” for contributes_to label (things like the polymerases, F1-F0 ATPase, proteasome etc).
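A check along these lines could be automated. The sketch below is an assumption about how the two proposed restrictions (a "blessed" list of complexes with emergent functions, and evidence limited to IDA, ISO or IBA) might be applied to annotation records; the dictionary fields, the complex name and the `check_contributes_to` helper are hypothetical and do not correspond to any existing GO pipeline.

```python
# Evidence codes proposed above as acceptable for contributes_to.
ALLOWED_EVIDENCE = {"IDA", "ISO", "IBA"}

# Hypothetical curated list of complexes whose function is emergent.
BLESSED_COMPLEXES = {"example polymerase complex"}


def check_contributes_to(annotation):
    """Return a list of problems for one annotation record (empty if none)."""
    problems = []
    if annotation["qualifier"] != "contributes_to":
        return problems  # the restrictions only concern this qualifier
    if annotation["evidence"] not in ALLOWED_EVIDENCE:
        problems.append("evidence code " + annotation["evidence"] + " not allowed")
    if annotation["complex"] not in BLESSED_COMPLEXES:
        problems.append("complex not on the blessed list")
    return problems


example = {
    "gene_product": "gene_X",
    "qualifier": "contributes_to",
    "evidence": "IPI",
    "complex": "example kinase complex",
}
print(check_contributes_to(example))
```

Running such a check over the existing contributes_to annotations would give a concrete list of records to review.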

dosumis commented 7 years ago

On Aug 27, 2017, at 4:20 PM, Val Wood notifications@github.com wrote:

I agree with Karen.

But curators still want some connection to be made between the activity of a complex and its components.

Why? Do we have a use case? (the connection is at the common process)

Hi Val,

All good points. Mainly I'm trying to support the usage that I believe Ruth and her group (and perhaps others) have been following and wanted to support. But I’m not particularly attached to this otherwise. @rlovering can you comment? Hope I’m not mischaracterizing your position.

Cheers, David

vanaukenk commented 7 years ago

Since this property chain initially came from a discussion of how to represent complexes and their activities in GO-CAM models, I think it'd be useful to work through some more examples in Noctua to make sure we're satisfied with, and consistent about, how we capture annotations there for the different scenarios that @krchristie and @ValWood have articulated.

vanaukenk commented 7 years ago

Started putting together the pieces of GO-CAM models to examine how we want to treat different types of complexes and their individual and collective activities:

DNA-directed RNA polymerase II, core complex (GO:0005665) http://noctua.berkeleybop.org/editor/graph/gomodel:59a105b300000003

eukaryotic translation elongation factor 1 complex (GO:0005853) http://noctua.berkeleybop.org/editor/graph/gomodel:59a105b300000026

Need to also look again at the beta-catenin destruction complex and some simpler enzyme/regulatory subunit complexes.

hdrabkin commented 7 years ago

So I'm wondering about the endoplasmic reticulum and protein membrane anchor annotations.

vanaukenk commented 7 years ago

@hdrabkin I was starting to collect manual annotations that exist for these gene products; some are annotated to the ER. I added the protein membrane anchor function to help think about the potential individual roles of each of the components of the complex. It's still a work in progress, mostly intended to give us something to start working from to improve our modeling and documentation for complexes and contributes_to. Please feel free to contribute to the model if you have more insights (or make new ones for other complexes).

vanaukenk commented 7 years ago

I've created test GO-CAM models for several different types of protein complex scenarios:

http://noctua.berkeleybop.org/editor/graph/gomodel:59c8885900000281

These are somewhat made up, so don't worry about the biological details. This is mostly a way to test how we might model complexes in Noctua and what rules we would need to get the annotations we want.

@cmungall wrt testing, do we want to start formalizing this more? I could write up a doc about what each model is meant to show and what annotations we want (although that part may come after the Cambridge meeting).

cmungall commented 7 years ago

Yes, what we'll do is save the OWL from the model and write a junit test around it. In the interim just writing this up in a google doc would be a good start

dosumis commented 7 years ago

On the general subject of this ticket: Is there general agreement that we will not have a relation corresponding to the chain part_of o enables (i.e. one that applies to all members of a complex where only one enables the function)?

I thought some might speak up for it (@rlovering?), but in the absence of any argument for it, can this ticket be considered done and related issues moved to another ticket?

The edge cases for contributes_to certainly need documenting for curator guidance, but I don't think this will lead to any new formalization in OWL.

ValWood commented 7 years ago

We currently have a rule that infers contributes_to for all components of a complex when a complex enables an activity.

We should not do this because it is not always (or even often) true.

http://wiki.geneontology.org/index.php/2017_Cambridge_GOC_Meeting_Agenda#Contributes_to_guidelines_.28Kimberly.29

krchristie commented 7 years ago

I agree with Val.

dosumis commented 7 years ago

We currently have a rule that infers contributes_to for all components of a complex when a complex enables an activity.

We all agree (see title of ticket and discussion here), and the axiom doing this was removed some time ago:

https://github.com/oborel/obo-relations/pull/202#issuecomment-329418722

It should have filtered through to GO-CAM by now. There doesn't seem to be anything left to do on this ticket so I'm closing it. If there are additional examples and clarification that need adding to contributes_to, please could these be specified on another ticket?

Cheers, David