phenopackets / phenopacket-schema

Repository for the GA4GH phenopacket schema
https://phenopacket-schema.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
75 stars 28 forks source link

Age ontology class confusion #30

Closed pnrobinson closed 5 years ago

pnrobinson commented 5 years ago

In the Age element we suggest that HPO terms can be used to represent age.

 //  age_class : { term_id : "HP:0003596", term : "Middle age onset" }
    OntologyClass age_class = 2;

But of course they do not represent age, they represent age of onset. Is there some other ontology we can recommend here? If not, we should remove this option for now because it will lead to errors in data entry.

drseb commented 5 years ago

agree. should be age_of_onset_class

julesjacobsen commented 5 years ago

Putting this in context:

// See http://build.fhir.org/datatypes and http://build.fhir.org/condition-definitions.html#Condition.onset_x_
// In FHIR this is represented as a UCUM measurement - http://unitsofmeasure.org/trac/
message Age {

    // The :ref:`ISO 8601<metadata_date_time>` age of this object as ISO8601
    // duration or time intervals. The use of time intervals makes an additional
    // anchor unnecessary (i.e. DOB and age can be represented as start-anchored
    // time interval, e.g. 1967-11-21/P40Y10M05D)
    string age = 1;

    // An age class, e.g. corresponding to the use of "age of onset" in HPO.
    // HPO is recommended, for example, subclasses of "Onset":
    // http://purl.obolibrary.org/obo/HP_0003674
    // Example:
    //  age_class : { term_id : "HP:0003596", term : "Middle age onset" }
    OntologyClass age_class = 2;
}

// this is used in these contexts:
message Phenotype {
...
// the values of this will come from the HPO onset hierarchy
    // i.e. subclasses of HP:0003674
    // FHIR mapping: Condition.onset
    oneof onset {
        Age age_of_onset = 6;
        google.protobuf.Timestamp time_of_onset = 7;
        Period period_of_onset = 8;
        OntologyClass class_of_onset = 9;
    }
}

message Individual {
...
    // An age object describing the age of the individual at the time of collection. The Age object allows the encoding
    // of the age either as ISO8601 duration or time interval (preferred), or as ontology term object.
    // Example:
    //   "individual_age_at_collection": {
    //     "age": "P12Y0M",
    //     "age_class": {
    //         "term": "Juvenile onset",
    //         "term_id": "HP:0003621"
    //     }
    // },
    // See http://build.fhir.org/datatypes
    Age age_at_collection = 4;
}

message Disease {
   ...
    // The onset of the disease. The values of this will come from the HPO onset hierarchy
    // i.e. subclasses of HP:0003674
    // FHIR mapping: Condition.onset
    oneof onset {
        Age age_of_onset = 3;
        google.protobuf.Timestamp time_of_onset = 4;
        Period period_of_onset = 5;
        OntologyClass class_of_onset = 6;
    }
}

I think the age_class might have been a carry-over from the original implementation. It seems like it is actually redundant to the OntologyClass class_of_onset. Perhaps it is best removed from Age and class_of_onset renamed to age_of_onset?

cmungall commented 5 years ago

that makes sense, but then Age would be a single-field class, so maybe the Age class isn't needed at all?

mbaudis commented 5 years ago

@pnrobinson The Age object is a carry-over from the GA4GH schema. It allows to use a single prototype object for both ways to encode an age value, either as ISO8601 period or ontology class. The current version does not document either attribute (age or age_class) as mandatory; obviously this can be handled differently in specific schemas (e.g. PXF may require a value for onset.age_of_onset.age_class).

As it looks right now, the PXF onset definition above would be served by just using an Age object. Both time point as well as numerical age could be just served by a single ISO8601 value (anchored period). However, one could argue for an optional, additional time point for the event.

Pure onset as Age class, with the time point being implicit from the ISO age, which includes the DOB:

{
   "onset" : {
      "age_class" : {
         "label" : "Juvenile onset",
         "id" : "HP:0003621"
      },
      "age" : "2002-09-21/P12Y03M08D"
   }
}

Alternative allowing more options, e.g. specific separate time point:

{
  "onset" : {
    "age": {
        "age_class" : {
           "label" : "Juvenile onset",
           "id" : "HP:0003621"
        },
        "age" : "P12Y03M08D"
    },
    "time_of_onset": "2014-12-29",
}

The Age object is one of the examples already documented (and up for comments) at SchemaBlocks.

pnrobinson commented 5 years ago

@mbaudis thanks for the detailed comment. I think that both SchemaBlocks and Phenopackets are making the same mistake in the way they use/document HPO classes, which are not intended to denote ages but rather ages of onset. Since Age can use an HPO class, and the Disease class can have age-at-collection that is an Age, this can lead to a logical inconsistency.

pnrobinson commented 5 years ago

@julesjacobsen @mbaudis @cmungall @drseb @mellybelly it makes more sense to me to have

message Age {
    // The :ref:`ISO 8601<metadata_date_time>` age of this object as ISO8601
    // duration or time intervals. The use of time intervals makes an additional
    // anchor unnecessary (i.e. DOB and age can be represented as start-anchored
    // time interval, e.g. 1967-11-21/P40Y10M05D)
    string age = 1;
    google.protobuf.Timestamp time_of_onset = 7;
}

and then

message Phenotype {
...
// the values of this will come from the HPO onset hierarchy
    // i.e. subclasses of HP:0003674
    // FHIR mapping: Condition.onset
    oneof onset {
        Age age_of_onset = 6;
        Period period_of_onset = 8;
        OntologyClass class_of_onset = 9;
    }
}

if we choose the Age object, it is an actual age that can be represented as an YYYY-MM-DD type ISO string or a google timestamp. If we want to use an HPO term for the age of onset, it does not need to be inside the Age object, it can go in the other slot.

mbaudis commented 5 years ago

@pnrobinson The use of HPO classes for the age_class should be considered example for using an ontology class that provides a time range with an age scope; I think HPO was put in here as something (then?) "easiest choice around" (not my selection or area of expertise...).

But as you point out, HPO specifically does this for "onset", not just for "any age range with some meaning" - this hurts the intended "neutral" use: For a general age object, I would very much prefer to have age classes w/o any additional attribution of scope.

While one can get a good "onset" definition in the way you propose, I'm interested in a general age object without predetermined scope which can be used in a larger range of applications ("age at sample collection", "age at recurrence", "age of death" ...).

This probably would then also work as part of the PXF Age message; but it should then be documented as NOT using HPO onset - what would be the better, "neutral", alternative? Suggestions appreciated, independent of PXF use ...

pnrobinson commented 5 years ago

I am not aware of a general Age ontology that is in use by the medical research community, although there are a few out there such as http://www.ontobee.org/ontology/HSAPDV?iri=http://purl.obolibrary.org/obo/HsapDv_0000105 The classes of this ontology would be better. However, I think we need to think about use cases. We will almost always have enough information to provide YYYY-MM-DD in any clinical setting, and for most published data we will have YYYY. There is no advantage in this case of using an ontology term. The HPO terms are not originally intended to be used for individual patients, but instead they describe the typical age of onset of a feature or a disease. You can use them for patients if you do not have exact information. I can imagine there are cases where we want to make the data fuzzy but still largely accurate to protect privacy, and here we could use ontology terms, e.g., from the stage ontology above. Therefore, one solution would be to replace the comment's use of HPO by HSAPDV, or just leaving it open what ontology to use.

mellybelly commented 5 years ago

A cross-species age representation is in the stages portion of Uberon. There is still one discrepancy with HPO, that we never finished aligning, if I recollect. This would be the recommended source for age in the foundry. Allen algebra and the use of these classes should address the needs described above. Age of onset represented in HPO should be axiomatized using uberon stages i expect?

drseb commented 5 years ago

I agree with @pnrobinson here. Why not simply use the date?

pnrobinson commented 5 years ago

FYI Here is a link to that part of UBERON: https://www.ebi.ac.uk/ols/ontologies/uberon/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FUBERON_0007222 If we use this in a clinical context it would have to be very well hidden, but I agree it is good for everybody to converge upon something like this. Still, for 99% of use cases I do not see any advantage of using an ontology to capture age, it adds complexity without providing much additional utility.

mellybelly commented 5 years ago

There is a need to record date of the phenopacket (or version), and age of onset of the phenotype or disease. I would expect ISO 8601 to be used former; it could be used as the latter, but definitely no patients' DOB though! We will eventually want a way to align age of onset and age of organisms and the two approaches should be interoperable and Allen algebra apply if the modeling is correct, no?

Re Uberon, we could create a clinical tag for the stages for human-relevant stages and make a slim, if that is helpful.

pnrobinson commented 5 years ago

But in a purely clinical context of the future when people are using phenopackets to report phenotypes to the lab for genome sequencing, of course we will report DOB and the phenopacket will never leave the hospital's firewall (this is at least a possible scenario, even though hopefully the phenopacket will be transmitted as a FHIR equivalent). I would expect that for published data using YYYY is sufficient and usually possible.

As a way of moving forward on this issue, I think that we are in agreement that using an ontology term such as UBERON should be possible? My main problem was that HPO is not the right ontology for this use case. We can solve all problems merely by revising the comment of the Age message. Do we have consensus for this?

mellybelly commented 5 years ago

see also https://github.com/obophenotype/uberon/wiki/Modeling-developmental-stages made ticket for uberon just in case: https://github.com/obophenotype/uberon/issues/1485

drseb commented 5 years ago

But for this users will have to map the age of the patient to a stage. Not sure this is always trivial.

mbaudis commented 5 years ago

So let's try to recap:

@pnrobinson suggested

Therefore, one solution would be to replace the comment's use of HPO by HSAPDV, or just leaving it open what ontology to use.

... which is IMO a good option for providing an age_class, additionally or alternatively to an ISO period. For an Age object, that may be my preferred solution - i.e. the status quo for the GA4GH schema & SchemaBlocks (where the HPO use was only an example of any suitable ontology, but not a good one - I'll replace this)

There is also the option to use an age range; however, this cannot be coded as a single ISO value since this requires then 2 ISO period values, one for each age. The most elegant solution here would be to have age as an array of ISO objects:

Alternatively separate attributes ...

This also covers @drseb 's comment about mapping, basically leaving this to the specific use case.

(All this from my side focusses on the optimal way to provide age information, not necessarily the optimal solution in PXF, for onset etc. But either categorical or range based "fuzzy" ages are needed.)

julesjacobsen commented 5 years ago

I'm with @pnrobinson suggestion - it is much like FHIR. The onset ought to be oneOf {Age, datetime, Period, OntologyClass}. I agree the use of an HPO code for age is not a good example.

For reference FHIR uses Unified Code for Units of Measure (UCUM) for defining Age

mbaudis commented 5 years ago

@julesjacobsen So what about the definition of the „age“ part, then? Even in the phenopackets schema, there will be uses beyond onset; and those will need a range option (class and/or quantitative age range). So, right now fixing the ontology example for age_class (making clear that it does not itself defines onset). If there is more discussion about the Age object, move this to a new issue.

mbaudis commented 5 years ago

@julesjacobsen The FHIR definition does not include an age range option?

julesjacobsen commented 5 years ago

Surely an age range is a compound of a start and an end age? Age is not tied exclusively to Phenotype.onset, it's simply meant to represent a time period attached to an object.

FHIR does have an age range, but its basically a holder for an upper and lower bounds of a type: http://build.fhir.org/datatypes.html#Range

mbaudis commented 5 years ago

@julesjacobsen Yes, thanks - that fits as model; with upper & lower being values in the form of "ISO8601 periods" (i.e., ages). But this concept isn't yet mapped to PXF, right? There is only a period.

Interestingly, FHIR allows a missing lower or upper and requires sorting. So this would map to the array example in my post above and be an argument for having the numeric (well, ISO) age component always as range/array...

I would love this, but opinions abound probably? WDYT @julesjacobsen?

julesjacobsen commented 5 years ago

Closing this as the thread has meandered into issue #10 and the associated SchemaBlocks issue https://github.com/ga4gh-schemablocks/blocks/issues/8