oborel / obo-relations

RO is an ontology of relations for use with biological ontologies
http://oborel.github.io/

NTR: 'has gene template' to replace either 'gene product of' or 'has gene product' (difficult to tell) #675

Open nataled opened 1 year ago

nataled commented 1 year ago

BACKGROUND: The Protein Ontology has been using a relation 'has_gene_template' for close to 10 years. In an attempt to come into alignment with updated principles, we wanted to at least place the relation under an appropriate RO relation. In so doing, we came across two similar relations, one of which presumably means the same thing (I can't tell which). I can't tell because the definitions for these, along with the additional information, make them difficult to distinguish or determine how they should be used:

RO:0002204 'gene product of' = Definition: "x has gene product of y if and only if y is a gene (SO:0000704) that participates in some gene expression process (GO:0010467) where the output of that process is either y or something that is ribosomally translated from x." Editor's note: We would like to be able to express the rule: if t transcribed from g, and t is a noncoding RNA and has an evolved function, then t has gene product g.

RO:0002205 'has gene product' = Definition: X has gene product y if and only if x is a gene (SO:0000704) that participates in some gene expression process (GO:0010467) where the output of that process is either y or something that is ribosomally translated from y. Example of usage: every sonic hedgehog protein (PR:000014841) is the gene product of some sonic hedgehog gene; every HOTAIR lncRNA is the gene product of some HOXC gene.

Disregarding the confusing definitions, and focusing on the term name, I believe the most directly analogous RO term for PRO 'has gene template' would be 'gene product of'. In support of this, the sole ontology using that term (based on the report from OntoBee, and looking for actual usage as opposed to mere import), MRO (cc: @rvita), uses 'gene product of' precisely how PRO uses 'has gene template'.

There are no ontologies using 'has gene product'. About two dozen ontologies use PRO's 'has gene template', though it is likely that these uses are all due to importing PR terms that themselves use the relation.

PROPOSAL: add a new term 'has gene template', placed under RO:0002330 'genomically related to', and deprecate 'gene product of'.

The definition for 'has gene template' is long and involved so that it can cover all possible cases. In constructing the definition, another term 'has mRNA template' was created. The definition is reproduced below, but a separate ticket can be created if the 'has gene template' relation is accepted for RO inclusion.

The definitions below show relations using underscores for ease of reading. The following terms are also referenced:

‘amino acid chain’ (PR:000018263)
‘has_output’ (RO:0002234)
‘mRNA processing process’ (GO:0006397)
‘nucleic acid-templated transcription process’ (GO:0097659)
‘reverse transcription process’ (GO:0001171)
‘RNA processing process’ (GO:0006396)
‘translation process’ (GO:0006412)

a has_mRNA_template b = [definition] a is an amino acid chain, b is an mRNA, and b is the template for some translation process p where p has_output a, and there is a time t where b exists but a does not.

In reading the following, it might be helpful to know what scenarios are being addressed:

1 is for protein from DNA gene
2a is for protein from RNA gene (likely viral) where the gene can be used directly
2b is for protein from RNA gene (likely viral) where the gene first goes through reverse transcription
3a is for a = any primary transcript
3b is for a = any processed RNA (including miRNA, mRNA) derived from a primary transcript

a has_gene_template b = [definition] b is a gene, and:

(1) if a is an amino acid chain and b has_bearer some DNA, then a has_gene_template b iff b is the template for some nucleic acid-templated transcription process p1 where p1 has_output some primary transcript c, and c is the input for some mRNA processing process p2, and p2 has_output some mRNA d, and a has_mRNA_template d;

OR

(2) if a is an amino acid chain and b has_bearer some RNA, then a has_gene_template b iff either (a) b is the template for some translation process p where p has_output a, and there is a time t where b exists but a does not, or (b) b is the template for some reverse transcription process p1 where p1 has_output some DNA c, and c is the template for some nucleic acid-templated transcription process p2 where p2 has_output some primary transcript d, and d is the input for some mRNA processing process p3, and p3 has_output some mRNA e, and a has_mRNA_template e;

OR

(3) if a is an RNA and b has_bearer some DNA, then a has_gene_template b iff either (a) b is the template for some nucleic acid-templated transcription process p where p has_output a, and there is a time t where b exists but a does not, or (b) b is the template for some nucleic acid-templated transcription process p1 where p1 has_output some primary transcript c, and c is the input for some ‘RNA processing process’ p2, and p2 has_output a, and there is a time t1 where b exists but c does not and a time t2 where c exists but a does not.
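The case analysis above can be read as "some chain of processes leads from gene b to product a, and which chains are allowed depends on what a is and what bears b". The following is a minimal Python sketch of that reading, not an implementation used by PRO or RO. The names `Step`, `CHAINS`, `IS_A`, `matches`, `follows`, and `has_gene_template` are hypothetical, as are the identifiers `gene_b`, `transcript_c`, `mrna_d`, and `protein_a`. The sketch ignores the temporal conditions ("there is a time t where b exists but a does not") and assumes a single hand-written subclass link from mRNA processing to RNA processing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One process occurrence: a kind, its template or input, and its output."""
    kind: str
    source: str
    output: str


# Hand-written subclass link (assumption for this sketch, not loaded from GO).
IS_A = {"mRNA processing": "RNA processing"}

# Allowed process chains, keyed by (kind of product a, bearer of gene b).
CHAINS = {
    # Scenario (1)
    ("amino acid chain", "DNA"): [
        ["transcription", "mRNA processing", "translation"],
    ],
    # Scenarios (2a) and (2b)
    ("amino acid chain", "RNA"): [
        ["translation"],
        ["reverse transcription", "transcription",
         "mRNA processing", "translation"],
    ],
    # Scenarios (3a) and (3b)
    ("RNA", "DNA"): [
        ["transcription"],
        ["transcription", "RNA processing"],
    ],
}


def matches(kind, wanted):
    """True if kind is the wanted process kind or a direct subclass of it."""
    return kind == wanted or IS_A.get(kind) == wanted


def follows(steps, start, end, kinds):
    """True if steps of the given kinds, in order, lead from start to end."""
    if not kinds:
        return start == end
    return any(
        follows(steps, s.output, end, kinds[1:])
        for s in steps
        if matches(s.kind, kinds[0]) and s.source == start
    )


def has_gene_template(product, gene, product_kind, gene_bearer, steps):
    """Check the scenario chains that apply to this product kind and bearer."""
    chains = CHAINS.get((product_kind, gene_bearer), [])
    return any(follows(steps, gene, product, chain) for chain in chains)


# Scenario (1): DNA gene -> primary transcript -> mRNA -> protein.
steps = [
    Step("transcription", "gene_b", "transcript_c"),
    Step("mRNA processing", "transcript_c", "mrna_d"),
    Step("translation", "mrna_d", "protein_a"),
]
print(has_gene_template("protein_a", "gene_b", "amino acid chain", "DNA", steps))
```

With the same `steps`, `transcript_c` and `mrna_d` also satisfy the sketch under scenarios (3a) and (3b) respectively when queried with product kind "RNA" and bearer "DNA".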

cthoyt commented 1 year ago

Hi @nataled, this is a bit dense to read. Do you think you could start each of the examples with the plain English, then add the more "ontology"-like text underneath?

Another thing that comes to mind is #495 - about connections between pre-processed and post-processed miRNA.

nataled commented 1 year ago

This all comes from work I did over a dozen years ago but never published, wherein I modeled what amounts to the 'central dogma' of biology, allowing for the necessary tweaks and accounting for things like mutations. This is actually a very small part of that work. That being said, I don't believe I had anything specific to miRNA maturation.

The plain English:

'has_mRNA_template' relates a protein to its mRNA template. Easy enough. I created this relation thinking that it could be used by those who want to relate mRNA to gene.

'has_gene_template' is probably best explained historically. When I first developed the relation, I was mostly concerned with relating a protein to the DNA gene that encodes it. I could have done that directly, but I don't believe in creating relations that have narrow application but broader implication (even by name). Thus, I didn't want to use has_gene_template for proteins only when various types of RNA also rely on genes as templates. I thus added various scenarios under which one could use the has_gene_template relation. Scenario (1) covers my original use case, that of protein to gene. It basically says that relating a protein to its gene can be done by saying a particular mRNA is transcribed from the gene, and that mRNA is translated into the protein. Scenario (2) covers cases where the gene is not encoded on DNA, but instead on RNA. Sometimes the RNA is used directly (2a) while other times it is first reverse transcribed into DNA (covered by 2b). Scenario (3) covers RNA cases, 3a for primary transcripts, and 3b for processed transcripts.

matentzn commented 1 year ago

Related: https://github.com/OBOFoundry/COB/pull/179

nataled commented 1 year ago

From RO meeting:
1) Should create a plain English definition, with the more involved text as editor's note.
2) Domain & range? Could be problematic due to issues surrounding 'gene'.
3) Diagram? Will work with Damion.

cmungall commented 1 year ago

In so doing, we came across two similar relations, one of which presumably means the same thing (I can't tell which).

These are inverses

You can see this in Protege in the "Description" tab:

image

Unfortunately inverses are not shown in OLS:

https://github.com/EBISPOT/ols4/issues/237

Note the two textual definitions are inverses: the content is identical except for switching x and y. This was before we stopped adding redundant inverse definitions and including notes; see #51
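As a small illustration of what "inverse" means here, the sketch below treats a relation as a set of (subject, object) pairs and derives one relation from the other by swapping each pair. It is plain Python, not how RO or Protege store the axiom; the helper `inverse` and the string labels are hypothetical, and the two example pairs are taken from the usage example quoted earlier in this thread.

```python
def inverse(pairs):
    """Return the inverse relation: every (x, y) becomes (y, x)."""
    return {(y, x) for (x, y) in pairs}


# RO:0002205 'has gene product': gene -> product.
has_gene_product = {
    ("sonic hedgehog gene", "sonic hedgehog protein"),
    ("HOXC gene", "HOTAIR lncRNA"),
}

# RO:0002204 'gene product of': product -> gene, derived rather than restated.
gene_product_of = inverse(has_gene_product)

for product, gene in sorted(gene_product_of):
    print(f"{product} gene_product_of {gene}")
```

Because one relation is fully determined by the other, only one of them needs a textual definition.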

jamesamcl commented 1 year ago

Inverses will be shown in OLS4 tomorrow when the next data release goes live.

dosumis commented 1 year ago

'gene product of' / 'has gene product' and expresses / expressed_in are all defined in terms of GO processes - punting complex details of the process of gene expression to GO - with the broadest GO terms covering every step of the process. I think this is much cleaner and less confusing than trying to compose some super-complex relation definition that attempts to capture the process - although probably not in any way we might programmatically unpack. It is also much more in the spirit of OBO orthogonality.

There may well be room for improvement in how the current definitions are phrased and it may be useful to review the GO terms themselves. We should also probably do a better job of making clear in the comment how the GO process(es) referenced cover the various cases.

cmungall commented 1 year ago

I am happy to change the label, obsolete and make a new ID, etc. But I agree completely with David. In general, shortcut relations in RO should reuse existing definitions in the relevant in-scope ontology.

nataled commented 1 year ago

@dosumis @cmungall I'm not really clear what the nature of the objection is. I mean, in addition to the already-noted one that it is too complex (I'm working on simplifying to plain English). Should I have left in the identifiers to show that GO is being reused? I took them out because it was hard(er) to read with them there.

nataled commented 1 year ago

On further reflection, I suspect the objection is because of statements like this:

(2) if a is an amino acid chain and b has_bearer some RNA, then a has_gene_template b iff either
(a) b is the template for some translation process p where p has_output a, and there is a time t where b exists but a does not

Perhaps this is being construed as an attempt to (re)define GO translation? Such redefinition wasn't the intent.

Let me know if this is on the right track so I can make the appropriate adjustments.

dosumis commented 1 year ago

That's exactly the point. Do we need such a fine-grained attempt to logically define translation if we can just refer to the relevant processes in GO? The competing relations attempt to do this by referencing GO terms that already encompass many subprocesses, and so we don't need to reference each individually to make a complete def (with many ORs) in RO.
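To make the contrast concrete, here is a minimal Python sketch of the "refer to one broad process term" style. The hierarchy in `COVERED_BY` is written by hand for illustration and is not loaded from GO; `covered_by`, `has_gene_product`, and the event triples are hypothetical names, and the second clause of the RO definition ("or something that is ribosomally translated from y") is left out for brevity.

```python
# Toy "is covered by" links, hand-written for illustration (not read from GO).
COVERED_BY = {
    "transcription": "gene expression",
    "RNA processing": "gene expression",
    "mRNA processing": "RNA processing",
    "translation": "gene expression",
}


def covered_by(process, broad):
    """Walk up the toy hierarchy and report whether broad covers process."""
    while process is not None:
        if process == broad:
            return True
        process = COVERED_BY.get(process)
    return False


def has_gene_product(gene, product, events):
    """events: (process kind, participating gene, output) triples.

    One check against the broad term replaces a list of ORs over
    individual subprocesses.
    """
    return any(
        g == gene and out == product and covered_by(kind, "gene expression")
        for kind, g, out in events
    )


events = [
    ("translation", "gene_b", "protein_a"),
    ("mRNA processing", "gene_b", "mrna_d"),
    ("unrelated process", "gene_b", "thing_z"),
]
print(has_gene_product("gene_b", "protein_a", events))
```

Adding a new subprocess in this style means adding one link to the hierarchy, while the relation's own definition stays unchanged.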