obophenotype / developmental-stage-ontologies

Source files for various species-specific stage ontologies

Add half-day interval developmental stages to MmusDv #27

Open IlincaTudose opened 8 years ago

IlincaTudose commented 8 years ago

We use MmusDv for stages and we appreciate the ontology structure and how well it maps to UBERON and, through it, to other species. We use it when we can (for stages), but we also need embryonic age in days, at 0.5-day intervals, so e0, e0.5, e1, e1.5 … e22. Would it be possible to add classes for them in MmusDv?
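The requested series can be enumerated mechanically. A minimal Python sketch, assuming labels of the form `e<days>` with whole days written without a decimal point; the function name `half_day_labels` is illustrative and not part of MmusDv:

```python
def half_day_labels(last_day=22):
    """Return labels e0, e0.5, e1, ... up to e<last_day> in 0.5-day steps."""
    labels = []
    for half_days in range(0, 2 * last_day + 1):
        days = half_days / 2
        # Whole days are written without a trailing ".0" (e1, not e1.0).
        text = str(int(days)) if days.is_integer() else str(days)
        labels.append("e" + text)
    return labels


labels = half_day_labels()
print(labels[:4], labels[-1], len(labels))
```

Counting in half-day units keeps the loop in integers, so no floating-point drift accumulates across the 45 labels from e0 to e22.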

fbastian commented 8 years ago

Seems reasonable, do you agree @ANiknejad?

tfhayamizu commented 8 years ago

In cases where sufficient morphological description is not available, the mouse Gene Expression Database (GXD) uses a standardized set of mappings from embryonic d.p.c. to Theiler Stages, following the criteria described by Theiler. A summary of this is available at the eMouseAtlas site: http://www.emouseatlas.org/emap/ema/theiler_stages/StageDefinition/stagedefinition.html

fbastian commented 8 years ago

Thanks @tfhayamizu, we follow your classification already, so it will be easy to include these new "temporal" terms as subclasses of the classical Theiler Stages.

ANiknejad commented 8 years ago

Actually, there is already 'dpc' information for each Theiler stage in MmusDv, as @tfhayamizu pointed out, but please notice that these stages overlap

MmusDv:0000003
name: Theiler stage 01
synonym: "E0-2.5" EXACT []

AND

MmusDv:0000005
name: Theiler stage 02
synonym: "E1-2.5" EXACT []

so creating subclasses with 0.5 day intervals would greatly impact ontology clarity
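The overlap can be made concrete with a small lookup. This is a sketch only: the two ranges are copied from the synonyms quoted above, while the inclusive treatment of both bounds and the names `STAGE_RANGES` and `stages_containing` are assumptions made for illustration.

```python
# Ranges (days post coitum) copied from the two synonyms quoted above.
STAGE_RANGES = {
    "Theiler stage 01": (0.0, 2.5),
    "Theiler stage 02": (1.0, 2.5),
}


def stages_containing(dpc, ranges=STAGE_RANGES):
    """Return every stage whose [start, end] range includes the dpc value.

    Assumption: both bounds are treated as inclusive.
    """
    return [name for name, (start, end) in ranges.items() if start <= dpc <= end]


print(stages_containing(0.5))
print(stages_containing(1.5))
```

A value of 0.5 falls only in Theiler stage 01, whereas 1.5 falls in both stages, so a half-day class at 1.5 dpc would need two parents if it were modeled as a subclass.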

mellybelly commented 8 years ago

@ANiknejad would this hurt things if we added these as part relations to the existing stages, and then gave them some subset tag so that they could be excluded where you don't want them in there? The IMPC group needs them; alternatively I think we should create a generic dpc ontology that any organism can use that is purely temporal and then can be mapped statistically to the individual species staging ontologies that are based upon morphological criteria. I think I'll do that...

cmungall commented 8 years ago

This doesn't really pertain to @IlincaTudose's request, but regarding @ANiknejad's

but please notice that these stages overlap

Note we use the immediately_preceded_by relation, so formally the stages abut (do not overlap)

But this obviously conflicts with the textual definitions. If we were to convert to OWL:

TS2: has-part exactly 2 cell OR has-part exactly 4 cell
TS3: has-part exactly 4 cell OR has-part exactly 8 cell OR has-part exactly 16 cell

So long as 'has-part exactly 4 cell' is satisfiable, TS2 and TS3 overlap.

This complicates abilities to use stages in reasoning in some scenarios.

One possibility is to fudge this by pretending that the 4-cell stage is arbitrarily temporally bisected in the middle, with the first half apportioned to TS2 and the 2nd half apportioned to TS3.

Or we could use a weaker relation
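The overlap argument can be checked with a toy model. In this sketch each stage is reduced to the set of cell counts its definition permits, which is a simplification of the OWL expressions above and not an OWL reasoner; the "bisected" variant and its half-stage labels are hypothetical names used only to illustrate the fudge described earlier.

```python
# Each stage as the set of cell counts its textual definition allows.
TS2 = {2, 4}
TS3 = {4, 8, 16}

# A non-empty intersection means some embryo satisfies both definitions.
shared = TS2 & TS3
print(shared)

# The "bisect the 4-cell stage" fudge: tag each 4-cell member with the half
# of the 4-cell period it belongs to, so the two sets become disjoint.
TS2_bisected = {(2, "whole"), (4, "first half")}
TS3_bisected = {(4, "second half"), (8, "whole"), (16, "whole")}
print(TS2_bisected & TS3_bisected)
```

The first intersection contains the 4-cell case, which is the overlap; after bisection the intersection is empty, which matches the abutting semantics of immediately_preceded_by at the cost of an arbitrary boundary.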

fbastian commented 8 years ago

we should create a generic dpc ontology that any organism can use that is purely temporal and then can be mapped statistically to the individual species staging ontologies

What would be the benefit as compared to simply capturing a dpc value? What would be the information that would be structured in such an ontology (e.g., 1dpc preceded_by 0.5dpc?)

We could already define an automatic mapping using the start_dpc/end_dpc relations. But how to "statistically map"? We would at least need temperature information. Do we have a model providing the probability of being at a given stage, depending on dpc and temperature? Does the IMPC group capture temperature information? Then, shouldn't it be a dpc-temperature ontology?

(sorry to sound so counter-productive, it's just that I don't think an ontology is needed to capture a dpc value; and mapping dpc to dev. stages is not trivial)

fbastian commented 8 years ago

Hmm, or should we simply use the mapping mentioned by @tfhayamizu? (sorry, I didn't realize the table was also providing exact values, not only ranges: http://www.emouseatlas.org/emap/ema/theiler_stages/StageDefinition/stagedefinition.html)

cmungall commented 8 years ago

What would be the benefit as compared to simply capturing a dpc value?

We could say the same about classes like MmusDv:0000091 ! 20 month-old stage and over

But still I think it may be better to record the dpcs using a numeric value and map to Theilers (with the caveats about variability)

However, if @IlincaTudose requires ontology class binning for convenience (e.g., being able to reuse existing database slots), then as a compromise, what about creating a new ontology for this purpose (so as to preserve the simplicity and uniformity of MmusDv)? The ontology could be managed in this github repo alongside MmusDv for convenience. It would have mappings using whatever relationship type we feel most appropriate to capture the variability. It could therefore be used in combination with MmusDv, but it would still be the default to use MmusDv in isolation.

The ontology and mappings could be generated automatically from the annotations in MmusDv quite easily (and repeated for any other ontology).
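As a rough idea of what such generation might look like, here is a Python sketch that emits OBO-style stanzas for a purely temporal series. The `EXAMPLE` ID prefix, the `<n> dpc` naming, and the function `dpc_stanzas` are all placeholders invented for illustration; chaining the classes with immediately_preceded_by is one possible choice, and the mappings to Theiler stages are deliberately left out.

```python
def dpc_stanzas(last_day=2, prefix="EXAMPLE"):
    """Build OBO-style [Term] stanzas for half-day dpc classes, in order.

    The ID prefix and label pattern are hypothetical placeholders.
    """
    stanzas = []
    previous_id = None
    for n in range(0, 2 * last_day + 1):
        days = n / 2
        term_id = f"{prefix}:{n:07d}"
        # ":g" drops a trailing ".0", giving "1 dpc" rather than "1.0 dpc".
        lines = ["[Term]", f"id: {term_id}", f"name: {days:g} dpc"]
        if previous_id is not None:
            lines.append(f"relationship: immediately_preceded_by {previous_id}")
        stanzas.append("\n".join(lines))
        previous_id = term_id
    return stanzas


stanzas = dpc_stanzas(last_day=1)
print("\n\n".join(stanzas))
```

Because the series is derived from a single parameter, the same function could be rerun for another species by changing only the upper bound and the prefix.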

@IlincaTudose, opinions?

mellybelly commented 8 years ago

We simply need to have the two structures and their relationships implemented, as both are in wide use. The problem with the table describing the Theiler stages is as discussed above, they are overlapping. We really just want a simple set of ranges - why don't I take a whack at this and then we can decide on the interoperability strategy later? The alternative is a data property, which could go in RO.

mellybelly commented 8 years ago

I added the time-based file here: https://github.com/obophenotype/developmental-stage-ontologies/blob/master/src/mmusdv/mmusdv-by-day.owl It currently imports mmusdv, but doesn't have any temporal part relations to any of the morphological stages (Theiler or otherwise). There is an annotation property 'end, days post coitum'. 'prenatal stage' seems to use this to indicate end time, but it is a float. These could be converted to axioms that use the new classes. @IlincaTudose let me know if you want anything further at this time, I'll register the purl prefix if you are good with this file.

fbastian commented 8 years ago

The problem with the table describing the Theiler stages is as discussed above, they are overlapping

But they also provide exact values along with the range, for cases, as @tfhayamizu mentioned, "where sufficient morphological description is not available". This seems to be the use case here.

So, suggestion: what about making this very simple, as part of the core MmusDv, using classical part_of relations, using the exact dpc values provided? I mean, even when using stages, some classifications are already arbitrary (e.g., for fully formed stages, the classification between mature/immature individuals)

I'm also fine with your other suggestions.

mellybelly commented 8 years ago

@fbastian I'm happy to move it into the core MmusDv and add those part relations where they can be assigned. I don't think we'd want them to be part of multiple Theiler stages where the Theiler stages overlap. Also, while the stages are proper stages, these dpc values are not necessarily stages but actual time points. So they probably don't mark the beginning of a stage, for example 'Theiler stage 8' has an annotation property 'start, days post coitum' 6.0; but rather refer to the actual time of 6dpc.

I didn't add text definitions yet because a) they are seemingly trivial but then b) I am actually not sure what all the use cases are here and there may be multiple meanings attributed (e.g. maybe not a stage boundary/time point?).

All that said, dpc is used as an estimate in the lab; it's not like we really know the exact time of coitus most of the time ;-).

fbastian commented 8 years ago

I don't think we'd want them to be part of multiple Theiler stages where the Theiler stages overlap.

Actually, do you need any relation at all to Theiler stages? Maybe we can simply create a separate branch in "embryo stage", unrelated to Theiler stages?

So they probably don't mark the beginning of a stage, for example 'Theiler stage 8' has an annotation property 'start, days post coitum' 6.0; but rather refer to the actual time of 6dpc.

At the end of the page I linked to, they explain how these dpc should be understood.

IlincaTudose commented 8 years ago

Thanks for all the work @mellybelly, @fbastian, @cmungall! It looks promising already. I like the structure but I also have some comments:

  1. We were hoping to have the dpc values integrated in MmusDv as it's a somewhat "established" ontology. Our data is open and I'm worried outside users will not know where to look up DPCO ids and how to use them further. Also, we use 2 ontologies at the moment (EFO for dpc and MmusDv for stages) and we were hoping to simplify this. Without Theiler Stage mapping, is the MmusDv integration still problematic?
  2. We don't need the relations to Theiler Stages as we know they're problematic. However, we would need them to be part of "embryo stage".
  3. The labels are quite long. Would it be possible to use the "dpc" notation in the class names and "e" and "days post coitum" as synonyms?

Thanks.

mellybelly commented 8 years ago

Assuming everyone is ok with it, I'd be happy to take care of all three.

  1. Merge the new file into MmusDv - currently the file imports MmusDv.
  2. This is no problem.
  3. I'm fine with this though it is non-standard to use an abbreviation as primary. I think here utility is more important and we should swap them.

dosumis commented 8 years ago

If you're going to use time to define developmental stages, then I think this should be recorded using data properties. They are not supported by ELK, but are in OWL2 EL https://www.w3.org/TR/owl2-profiles/#Feature_Overview, so potentially scalable. They are very useful for ordering-based queries (e.g. query for stages between X and Y), and a lot less work than Allen algebra type property chains (although you should have these too). See FBdv + FBcv for examples of reasoning with temporal property chains + data properties. (Would love to go back and do more with this in FBdv if I had time/funding, which I don't...)
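To illustrate the kind of ordering-based query that numeric values make easy, here is a plain-Python sketch. It is not OWL reasoning; it only shows the query shape. The `Stage` class, the function `stages_between`, and every stage name and boundary in `STAGES` are made up for illustration and are not taken from MmusDv or FBdv.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    start_dpc: float
    end_dpc: float


# Placeholder stages with made-up boundaries, for illustration only.
STAGES = [
    Stage("stage A", 0.0, 1.0),
    Stage("stage B", 1.0, 2.5),
    Stage("stage C", 2.5, 4.0),
    Stage("stage D", 4.0, 6.0),
]


def stages_between(stages, low, high):
    """Return names of stages lying entirely within [low, high], ordered by start."""
    inside = [s for s in stages if low <= s.start_dpc and s.end_dpc <= high]
    return [s.name for s in sorted(inside, key=lambda s: s.start_dpc)]


result = stages_between(STAGES, 1.0, 4.0)
print(result)
```

With start and end stored as numbers, "between X and Y" is a pair of comparisons, whereas the same question asked through precedes-style relations requires walking a chain of stages.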

IlincaTudose commented 7 years ago

@mellybelly is there any progress on the 3 points mentioned above?