information-artifact-ontology / ontology-metadata

OBO Metadata Ontology
Creative Commons Zero v1.0 Universal

Annotation of Definition sources - ECO or IAO #44

Open mgiglio99 opened 5 years ago

mgiglio99 commented 5 years ago

DO requested some new ECO terms to capture the type of information that curators were using when writing class definitions. The way the new ECO terms were used in the DO OWL/OBO files created problems for users (see https://github.com/DiseaseOntology/HumanDiseaseOntology/issues/673). This discussion also raised the general question of whether terms like these belong in ECO at all (see https://github.com/information-artifact-ontology/ontology-metadata/issues/43).

We thought it would make sense to separate out the issue of whether this kind of term is in scope for ECO into its own GitHub tracker item here.

Here is some more background on these terms: When ECO first got the request for these terms, I felt they were actually reference types and therefore not in the scope of ECO. However, as we thought about it some more, we began to think of attaching a definition to a class as an assertion in its own right: an assertion that the class has a particular definition. The terms that Lynn requested all concern a curator reading source material and building a definition from it. Therefore, they all sit under the 'curator inference' > 'curator inference from authoritative source' node (e.g. 'curator inference from book'). Using these ECO terms in this context makes it explicit that a curator interpreted the information in a source to provide a definition for the class. One could imagine a computational process that creates definitions for cross-product terms automatically from the definitions of the input terms; that could conceivably also be tracked with an ECO term (although that hasn't happened yet). In this context, we felt that such terms were appropriate for ECO. However, in light of recent discussions, we are reassessing this.
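To make the "definition as an assertion" idea concrete, here is a minimal sketch (not DO's actual serialization) of how one definition could carry both its source and an evidence type, using the standard OWL 2 axiom-annotation (reification) pattern written out as N-Triples. IAO_0000115 ('definition'), oboInOwl:hasDbXref and ECO_0000205 ('curator inference') are real identifiers; `EVIDENCE_PROP`, the class IRI and the PMID are hypothetical placeholders.

```python
# Minimal sketch: serialize a definition plus an OWL 2 axiom annotation
# (owl:Axiom reification) as N-Triples lines.
# EVIDENCE_PROP is a HYPOTHETICAL property; the class IRI and xref in the
# usage example are placeholders.

OBO = "http://purl.obolibrary.org/obo/"
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DEFINITION = OBO + "IAO_0000115"  # IAO 'definition'
HAS_DBXREF = "http://www.geneontology.org/formats/oboInOwl#hasDbXref"
EVIDENCE_PROP = "http://example.org/hypothetical#evidence"  # hypothetical


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets for N-Triples."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def annotated_definition(cls: str, text: str, xref: str, evidence: str) -> list:
    """Return N-Triples for 'cls has definition text', with source and evidence."""
    axiom = "_:defAxiom"  # blank node standing for the reified assertion
    return [
        # The plain assertion that the class has this definition.
        f"{iri(cls)} {iri(DEFINITION)} {literal(text)} .",
        # A reified copy of the same assertion, so it can carry annotations.
        f"{axiom} {iri(RDF_TYPE)} {iri(OWL + 'Axiom')} .",
        f"{axiom} {iri(OWL + 'annotatedSource')} {iri(cls)} .",
        f"{axiom} {iri(OWL + 'annotatedProperty')} {iri(DEFINITION)} .",
        f"{axiom} {iri(OWL + 'annotatedTarget')} {literal(text)} .",
        # Where the definition came from, and how it was derived.
        f"{axiom} {iri(HAS_DBXREF)} {literal(xref)} .",
        f"{axiom} {iri(EVIDENCE_PROP)} {iri(evidence)} .",
    ]


if __name__ == "__main__":
    for line in annotated_definition(
        "http://example.org/DOID_placeholder",
        "A disease that ...",
        "PMID:00000000",
        OBO + "ECO_0000205",  # 'curator inference'
    ):
        print(line)
```

The reified `owl:Axiom` node is what lets the source and the evidence type attach to this particular definition rather than to the class as a whole.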

My apologies that I didn't bring this issue to a wider audience before we made the ECO terms. We recognize there is more than one way to capture this information, and we are not wedded to these terms being in ECO. We want to store the information in the way that makes the most sense and works for DO's use case. One thing to note: if we use IAO, we would need more terms under 'information content entity'. Currently there is 'document', with some relevant children, but others would need to be added. We'd be happy to work with IAO to get those created.

Thanks, Michelle

@cmungall, @rctauber, @jbmunro, @nsuvarnaiari

cmungall commented 5 years ago

Thanks for your thoughtful summary. No worries about not bringing this to attention sooner; the OBO landscape is complicated, and we need more diagrams showing how the different ontologies interrelate.

I had not previously considered constructing a definition to be an inference, but I can see the argument. It would be good to gather opinion from other ontology editors.

It would help me understand the use case to have some examples of how this might work in a few different scenarios:

  1. An editor copy-pasting a dictionary definition in more or less verbatim
  2. An editor constructing a definition by synthesizing from multiple sources and using their expertise to word this as a genus-differentia definition using terms consistent with terms in the ontology
    • variant: the curator wants to contradict one aspect of one of the sources, and leave an audit trail
    • variant: the sources were provided by a user on an issue tracker which is commented on by various people; we may want to independently track the sources, the tracker, the submitter, the commenters, and the curator, ascribing roles to each
  3. A content meeting involving contributions from domain experts and a head curator who synthesizes the work of the group
  4. A definition derived entirely by template; or, one that has been modified from a template-generated definition for readability
    • variant: where a portion of a definition of the differentia is incorporated into the definiens

@alanruttenberg and Selja Seppälä have an excellent paper on definitions; they may be interested in commenting.

@marijane is the contact for the contributor role ontology http://obofoundry.org/ontology/cro and this seems like it is in CRO scope

mgiglio99 commented 5 years ago

Hi Chris, Sorry for the delayed reply. Interesting scenarios. The thinking I outlined above for use of ECO terms for defs will clearly not meet all of these needs. Perhaps a combination of ECO and IAO could work.

  1. An editor copy-pasting a dictionary definition in more or less verbatim

To me this one would be a straight up reference for where the definition came from since there is no inference involved. I know DO will want to track the kind of source this is (book, paper, website) - so perhaps this one is better handled with an IAO term that tags the reference with a "type".

2. An editor constructing a definition by synthesizing from multiple sources and using their expertise to word this as a genus-differentia definition using terms consistent with terms in the ontology

We could make an ECO term 'curator inference from multiple sources' (or something to that effect), assuming one could link multiple references to one definition. Then each reference could be tagged with an IAO term describing its type as in the previous scenario.
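One way to read this proposal as a data model, sketched here under stated assumptions: a definition holds a list of references, each tagged with a document type, and the evidence label depends on how many sources contributed. The type labels and xrefs are illustrative placeholders, and 'curator inference from multiple sources' is only a term proposed in this comment, not an existing ECO class.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    xref: str         # e.g. a PMID, ISBN, or URL (placeholder values below)
    source_type: str  # illustrative IAO-style type label, e.g. "book"


@dataclass
class Definition:
    text: str
    sources: list = field(default_factory=list)

    def add_source(self, xref: str, source_type: str) -> None:
        """Attach a typed reference, ignoring exact duplicates."""
        src = Source(xref, source_type)
        if src not in self.sources:
            self.sources.append(src)

    def source_types(self) -> list:
        """Distinct document types behind this definition, sorted."""
        return sorted({s.source_type for s in self.sources})

    def suggested_evidence(self) -> str:
        """Pick an evidence label; the multi-source label is hypothetical."""
        if len(self.sources) > 1:
            return "curator inference from multiple sources"
        if len(self.sources) == 1:
            return f"curator inference from {self.sources[0].source_type}"
        return "curator inference"


d = Definition("A disease that ...")
d.add_source("PMID:00000001", "journal publication")
d.add_source("ISBN:0000000000", "book")
d.add_source("PMID:00000001", "journal publication")  # duplicate, ignored
print(d.suggested_evidence())  # curator inference from multiple sources
print(d.source_types())        # ['book', 'journal publication']
```

Keeping the type on each `Source` rather than on the definition is what allows several differently typed references to back a single definition.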

  • variant: the curator wants to contradict one aspect of one of the sources, and leave an audit trail
  • variant: the sources were provided by a user on an issue tracker which is commented on by various people; we may want to independently track the sources, the tracker, the submitter, the commenters, and the curator, ascribing roles to each

For the first variant, perhaps a NOT annotation with an ECO curator inference term? For the second, I don't know.

3. A content meeting involving contributions from domain experts and a head curator who synthesizes the work of the group

Maybe a new ECO term 'curator inference from group consensus' - or something along those lines.

  4. A definition derived entirely by template; or, one that has been modified from a template-generated definition for readability
    • variant: where a portion of a definition of the differentia is incorporated into the definiens

For the template-only part, perhaps an ECO term 'template-based text generation', but that really sounds like a method description rather than evidence. We could go as simple as ECO 'automatic assertion'. Not sure.

Looking forward to hearing people's thoughts.

cmungall commented 5 years ago
  1. An editor copy-pasting a dictionary definition in more or less verbatim

To me this one would be a straight up reference for where the definition came from since there is no inference involved. I know DO will want to track the kind of source this is (book, paper, website) - so perhaps this one is better handled with an IAO term that tags the reference with a "type".

One way to do this is to make the publication an object in its own right; arbitrary properties (e.g. title, type) could then be attached using standard OBO, W3C, or other semantic web properties. This could be quite useful to see in Protege, and also in downstream consumers. We can imagine a Protege helper plugin for this.
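A minimal in-memory sketch of that idea, using only the Python standard library rather than a real RDF toolkit: each publication becomes a node with its own type and title, and a definition's xref points at that node, so the source type is looked up instead of being repeated on every definition. The predicate names are written as CURIE strings; the type value `hypothetical:JournalArticle`, the title and the IDs are placeholders (in a real ontology the type would presumably be an IAO document subclass).

```python
from collections import defaultdict

# CURIE-style predicate names; the type values used below are placeholders.
RDF_TYPE = "rdf:type"
TITLE = "dcterms:title"
HAS_DBXREF = "oboInOwl:hasDbXref"


class TinyGraph:
    """A toy triple store indexed by subject (not a real RDF library)."""

    def __init__(self):
        self._by_subject = defaultdict(set)

    def add(self, s: str, p: str, o: str) -> None:
        self._by_subject[s].add((p, o))

    def objects(self, s: str, p: str) -> list:
        """Sorted objects of (s, p, ?); empty if the subject is unknown."""
        return sorted(o for (pred, o) in self._by_subject.get(s, ()) if pred == p)


def types_of_sources(g: TinyGraph, definition_node: str) -> dict:
    """Map each xref of a definition to the type(s) of its publication node."""
    return {x: g.objects(x, RDF_TYPE) for x in g.objects(definition_node, HAS_DBXREF)}


g = TinyGraph()
# The publication as an object in its own right.
g.add("PMID:00000001", RDF_TYPE, "hypothetical:JournalArticle")
g.add("PMID:00000001", TITLE, "Placeholder title")
# A (reified) definition axiom that cites it, plus a citation with no node yet.
g.add("_:defAxiom", HAS_DBXREF, "PMID:00000001")
g.add("_:defAxiom", HAS_DBXREF, "url:https://example.org/page")
print(types_of_sources(g, "_:defAxiom"))
```

An xref with no publication node simply maps to an empty list, which makes untyped sources easy to spot.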

marijane commented 5 years ago

I am not terribly familiar with ECO, and I can see how this might be in scope of CRO, if people want to describe a curation process that these terms are outputs of and in which someone plays a curator role, but that seems orthogonal to the question of whether these terms should live in ECO or IAO. I'm happy to talk about the role annotation aspect, though, if that's of interest.

I am not an IAO contributor, but these ECO terms seem a bit more granular than what's in IAO (at least as document subtypes; the whole evidence tree in ECO seems like something that could go under Information Content Entity as a sibling to document, data item and the like), and they don't seem terribly out of scope for ECO from where I sit. But again, I'm a casual observer here.

beckyjackson commented 5 years ago

Thanks for your responses @cmungall @marijane

One way to do this is to make the publication an object in its own right

I agree that this could be useful, but it would be quite a bit of overhead to make objects for each publication used as an xref in DO... Some of it could be automated, but I think annotating the objects would be a manual task.
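The automatable part might look roughly like this sketch: parse the existing xref strings, deduplicate them, and create a stub publication object whose type is guessed from the prefix. It assumes xrefs of the form `PREFIX:local` (including `url:` xrefs); the prefix-to-type table is hypothetical, and anything it cannot classify, along with titles and other details, is left for manual curation.

```python
import re

# Hypothetical prefix -> source-type table; extend or correct as needed.
PREFIX_TYPES = {
    "PMID": "journal publication",
    "ISBN": "book",
    "url": "website",
}

# Split an xref into prefix and local part at the first colon.
XREF_RE = re.compile(r"^(?P<prefix>[A-Za-z]+):(?P<local>\S+)$")


def publication_stubs(xrefs):
    """Return (deduplicated stub objects, xrefs needing manual review)."""
    stubs = {}
    needs_review = []
    for raw in xrefs:
        xref = raw.strip()
        m = XREF_RE.match(xref)
        if m is None or m["prefix"] not in PREFIX_TYPES:
            needs_review.append(xref)
            continue
        # setdefault keeps the first stub for repeated xrefs.
        stubs.setdefault(
            xref,
            {"id": xref, "type": PREFIX_TYPES[m["prefix"]], "title": None},
        )
    return list(stubs.values()), needs_review


stubs, review = publication_stubs([
    "PMID:00000001",
    "PMID:00000001 ",
    "ISBN:0000000000",
    "url:https://example.org/page",
    "MESH:D000000",
])
print(len(stubs), review)  # 3 ['MESH:D000000']
```

The `title` field is deliberately left as `None`: filling it in (by hand or from an external service) is the part that would still cost curator time.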

I can see how this might be in scope of CRO

Does it make sense to discuss putting these terms in CRO instead, then? We aren't concerned with where they end up, as long as it's the most appropriate spot (and we can use them!).

the whole evidence tree in ECO seems like something that could go under Information Content Entity

This is correct. As a result of including OBI in ECO logical definitions, all evidence types end up under Information Content Entity.

I apologize that it's taken some time to respond to this, but we'd like to move forward with annotating the publications with these types of terms in DO. @cmungall are you aware of anybody else who may be interested in commenting on this issue?

marijane commented 5 years ago

@rctauber

Does it make sense to discuss putting these terms in CRO instead, then? We aren't concerned with where they end up, as long as it's the most appropriate spot (and we can use them!).

I've looked at both of the issues linked above, and I'm not clear on what the specific terms in question actually are. Is there an enumerated list somewhere, or can you otherwise clarify? Right now the CRO is mostly a hierarchy of role types; if the terms in question here are not conceptualized as roles, I'm not sure where they would go.

beckyjackson commented 5 years ago

They are not roles, so perhaps it doesn't make sense to put them in CRO. We just wanted to make sure we investigated all the options. The terms fall under our 'curator inference' branch, with terms like 'curator inference from journal publication', which would be used to annotate a definition source such as a PMID.

The full branch is here: https://www.ebi.ac.uk/ols/ontologies/eco/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FECO_0000205
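Because these terms live in one branch, a consumer could check whether a definition's evidence term falls under 'curator inference' by walking is_a links upward. The sketch below uses term labels in place of IDs; apart from the root 'curator inference' (ECO_0000205) and the labels mentioned in this thread, the hierarchy is a simplified, hypothetical stand-in for the real ECO structure.

```python
# Simplified, hypothetical is_a edges keyed by label (child -> parents).
# Only the label chain from this thread is modeled; real ECO is larger.
IS_A = {
    "curator inference from book": {"curator inference from authoritative source"},
    "curator inference from journal publication": {
        "curator inference from authoritative source"
    },
    "curator inference from authoritative source": {"curator inference"},
    "curator inference": {"evidence"},    # intermediate levels omitted
    "automatic assertion": {"evidence"},  # intermediate levels omitted
}


def ancestors(term: str, is_a: dict) -> set:
    """All transitive is_a parents of term (iterative DFS, cycle-safe)."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def in_branch(term: str, root: str, is_a: dict) -> bool:
    """True if term is root itself or one of its descendants."""
    return term == root or root in ancestors(term, is_a)


print(in_branch("curator inference from journal publication", "curator inference", IS_A))  # True
print(in_branch("automatic assertion", "curator inference", IS_A))  # False
```

In practice the edges would come from the released ECO file rather than a hand-written table.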

marijane commented 5 years ago

Aha! Yeah, I don't think these fit in CRO. But you could use the CRO's curator role and contributorship process/relationship to model/annotate the creation of these inferences.