Review data model for consistency

jmandel commented 12 years ago

Next Steps

Initial review, targeting a one-page document describing deficiencies / desiderata

jmandel commented 12 years ago

Allergy model Issues:

An allergy has no dates: add start/end dates to allergies.
Free-text representation?
Re-evaluate coding decisions.
Allergy sp:severity should be optional.

jmandel commented 12 years ago

sp:abnormalInterpretation should be sp:labResultInterpretation

jmandel commented 12 years ago

LabResultPanel needs to appear among clinical statements. And we need examples + documentation.

jmandel commented 12 years ago

Demographics

None of our sample demographics supply race, ethnicity, or preferredLanguage.

I think the right thing to do is require that preferredLanguage be an ISO 639-1 string (two-letter language abbreviation) whenever possible. http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
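A minimal sketch of what enforcing this could look like on the container side. It assumes the value arrives as a plain string and checks only the two-letter shape of an ISO 639-1 code; it does not check membership in the official list linked above. Both function names are hypothetical.

```javascript
// Hypothetical helper: true if the value has the shape of an ISO 639-1 code
// (exactly two lowercase ASCII letters). Membership in the real code list
// is NOT verified here.
function isIso6391Shape(value) {
  return typeof value === "string" && /^[a-z]{2}$/.test(value);
}

// Hypothetical normalizer: trims and lowercases free-text input such as
// " EN " before checking; returns null if it still doesn't fit the shape.
function normalizePreferredLanguage(value) {
  if (typeof value !== "string") return null;
  const candidate = value.trim().toLowerCase();
  return isIso6391Shape(candidate) ? candidate : null;
}
```

Values like "English" normalize to null rather than being guessed at, which keeps the container honest about what it could not code.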

Race + ethnicity should be changed from free-text to coded per CDC code sets.

jmandel commented 12 years ago

spcode:RxNorm_Semantic should be spcode:RxNorm_SemanticDrug (to allow us to cover larger subsets of RxNorm in the future...)

jmandel commented 12 years ago

VitalSigns needs to define a predicate for Head Circumference.

We may just want to make these generic like lab results (rather than explicit predicates for each vital type). But this evokes the old escape hatch... and doesn't leave a consistent approach to more nuanced results (e.g. blood pressures with systolic, diastolic, and body position).

jmandel commented 12 years ago

Dan wrote:

It seems to me that this discussion is attempting to optimize tradeoffs between the perspectives:

App developers (who know nothing about LOINC) should have an easy time writing apps. So the SMART standard should define a set of mappings from LOINC => English for them.
App developers need to be able to represent a wide set of vitals in their apps. The SMART standard can’t cover all of these, so there should be a way for developers to access ‘unmapped’ vitals.

What I haven’t seen in this thread is an indication of the relative weight of these perspectives. Which do we care about more? And by how much?

If our weights are 100%, 0%, respectively, the current implementation is clearly the correct one, as it provides a simple representation of a very limited set of vitals, with no mechanism for representing other vitals.

If the weights are 0%, 100%, respectively, then you could imagine a solution like: [{"encounter": { ... }, "results": { "12345-2": { ...vital here... }, "1234-3": { ...vital here... } } }, ...], where results come back indexed by LOINC code. This lets app developers access vitals[0].results['MY_LOINC_CODE'] with O(1) lookup, but is worthless unless developers know which LOINC codes to look for.
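To make the 0%/100% extreme concrete, here is a hedged sketch of a response indexed by LOINC code. The codes 3141-9 (Weight Measured) and 8302-2 (Height) come from the CCDA value set quoted later in this thread; the encounter stub, the value/unit fields, and the helper name are assumptions for illustration.

```javascript
// Hypothetical VitalSigns response in the "indexed by LOINC code" style.
const vitals = [
  {
    encounter: { startDate: "2012-06-27" }, // illustrative stub only
    results: {
      "3141-9": { value: 65, unit: "kg" }, // Weight Measured
      "8302-2": { value: 1.5, unit: "m" }, // Height
    },
  },
];

// O(1) lookup by key, but only useful if the caller already knows the code.
function resultByLoinc(vitalSigns, loincCode) {
  const r = vitalSigns.results;
  return Object.prototype.hasOwnProperty.call(r, loincCode) ? r[loincCode] : null;
}

const weight = resultByLoinc(vitals[0], "3141-9"); // { value: 65, unit: "kg" }
const bmi = resultByLoinc(vitals[0], "39156-5");   // null: not recorded
```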

Any other solution will lie somewhere between these extremes. My intuition is that, based on its mission, SMART should care more about perspective 1 than perspective 2, but that 2 should be a non-zero quantity (70%:30%, say). So a good solution will optimize the experience for app developers using the SMART standard-mapped set of vitals, but provide a (probably inferior) mechanism for representing uncategorized vitals.

The solution proposed thus far in the thread doesn’t work that way, as it requires developers to iterate over a list of generic ‘results’ and check a ‘@type’ property on them. I think we really DO want developers to access ‘vitals[0].weight’, or at least ‘vitals[0].results.weight’. It seems to me that we might achieve this by adding predicates to the RDF graph for mapped vitals where available: see my weak attempt to diagram it here:

[Diagram: possible RDF graph for vitalsigns]

Basically, there should always be an sp:vital predicate from VitalSigns objects to individual vital objects. Additionally, if a vital is mapped (say, weight), add an sp:weight predicate from the VitalSigns object to the weight object. When serializing this to JSON, I see a few options:

Duplicate data, so that a Vitalsigns object contains a ‘raw’ dictionary of vitals, and additional ‘height’ and ‘weight’ attributes that replicate the data (shown in the diagram)
Don’t duplicate data: the ‘raw’ dictionary should only contain unmapped vitals (I don’t like this, for reasons that Josh has pointed out previously)
Use pointers: I think JSON-LD has a mechanism for linking multiple subjects to the same object. The vitalsigns object could then contain a ‘raw’ list of the vitals, and additional ‘height’ and ‘weight’ attributes that are links to the appropriate object in the raw list. This seems clearly optimal, if we’re using JSON-LD.
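The pointer option maps directly onto in-memory JS objects, which is roughly what a resolved JSON-LD reference becomes. A minimal sketch, assuming the attribute names raw, height, and weight from the options above and hypothetical loinc/value/unit fields on each vital (the codes are from the CCDA value set quoted later in the thread), shows one object reachable through two paths:

```javascript
// Each vital object is created once...
const height = { loinc: "8302-2", value: 1.5, unit: "m" };
const weight = { loinc: "3141-9", value: 65, unit: "kg" };
const headCircumference = { loinc: "8287-5", value: 40, unit: "cm" };

// ...and referenced twice: from the raw list and from a friendly attribute.
const vitalSignsEntry = {
  raw: [height, weight, headCircumference],
  height, // same object as raw[0], not a copy
  weight, // same object as raw[1]
};

// Mapped vital: direct property access, O(1).
const w = vitalSignsEntry.weight;

// Unmapped vital: scan the raw list, O(n).
const hc = vitalSignsEntry.raw.find((v) => v.loinc === "8287-5");

console.log(w === vitalSignsEntry.raw[1]); // true: no duplicated data
```

Because both paths hold the same reference, an app that edits or annotates `vitalSignsEntry.weight` sees the change through `raw` as well, which is the practical difference from the duplication option.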

Any of these options would allow:

Access to mapped vitals with: ‘vitals[0].weight’, O(1) lookup
Access to other vitals by iterating over ‘vitals[0].raw’, O(n) lookup

Which describes the 70:30 balance of perspectives I laid out above.

Pascal wrote:

Good, got the diagram now!

I agree on your argumentation, and although I don't like the duplication -- I would prefer to just have an array of vital objects -- I see how it can make life easier for probably 95% of the devs that use the "standard" vital signs. That also makes the hierarchy one layer less deep from the idea I proposed, which is a good thing.

So having all the vitals in a "raw" (or whatever) container, while also making a few select vitals available as top-level elements, seems a good compromise. That, however, would require that each top-level element map to exactly one LOINC code. That is currently the case, and it would still hold if we added head circumference as another top-level element. If somebody needs fetal head circumference values, they'll just have to use the "raw" container.

Does this sound useful? Josh, should I have put this discussion on GitHub?

Dan wrote:

I think this could easily be extended to have multiple LOINC codes per mapped vital sign. The top-level 'weight' attribute would be a list that points to (or duplicates) all available vitals that fall into the weight bucket.

I.e: { 'raw': [{height, standing}, {height, lying}, {weight}, {head circumference}], 'weight': [{weight}], 'height': [{height, standing}, {height, lying}] }
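This multi-code extension could be derived mechanically from the raw list with a code-to-bucket table. The sketch below assumes such a table lives on the server side; the bucket table, the helper name, and the loinc/value/unit fields are hypothetical, while the codes are the Height, Height (Lying), Weight Measured, and Head Circumference entries from the CCDA value set quoted later in this thread.

```javascript
// Hypothetical mapping from LOINC code to a friendly bucket name.
// Codes without an entry stay reachable only through `raw`.
const BUCKET_FOR_LOINC = {
  "8302-2": "height", // Height
  "8306-3": "height", // Height (Lying)
  "3141-9": "weight", // Weight Measured
};

// Build { raw, height: [...], weight: [...] } using shared references.
function withBuckets(rawVitals) {
  const result = { raw: rawVitals };
  for (const vital of rawVitals) {
    const bucket = BUCKET_FOR_LOINC[vital.loinc];
    if (!bucket) continue; // unmapped: e.g. head circumference
    if (!result[bucket]) result[bucket] = [];
    result[bucket].push(vital);
  }
  return result;
}

const entry = withBuckets([
  { loinc: "8302-2", value: 1.52, unit: "m" },
  { loinc: "8306-3", value: 1.5, unit: "m" },
  { loinc: "3141-9", value: 65, unit: "kg" },
  { loinc: "8287-5", value: 40, unit: "cm" },
]);
// entry.height has two elements, which is exactly the ambiguity raised next.
```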

But that's an interesting case, as I think it's already confusing to app developers. If the purpose of mapping LOINC codes to generics like 'height' is to keep app developers from thinking about the codes, what should a developer trying to get a list of patient heights do if the /vitalsigns/ call returns both a lying and a standing height?

Josh writes:

(I'm still not seeing the diagram.)

There's good stuff here, and Dan, I appreciate the explicit fashion in which you've laid out the balance between simple/predictable vs. full-coverage. The current balance isn't quite (100,0) in the following respect: each individual sp:VitalSign can have a LOINC code that further describes it beyond the predicate used to attach it to its sp:VitalSigns. (E.g. the sp:height predicate can link to an sp:VitalSign with any LOINC code that is a type of height.)

The "duplication" of sp:VitalSign elements attached by two predicates (e.g. sp:height and sp:vital) is easily expressed in RDF schema by saying that

sp:height rdfs:subPropertyOf sp:vital.

In other words, anytime you see (x sp:height y.) it's also true that (x sp:vital y.) Of course, you don't need a semantic web reasoner simply to list both predicates! In JSON-LD this can indeed be expressed by reference -- which plays well with our current approach of mapping to in-memory JS object graphs, so that we might find (in terms of JS object equivalence) for instance in a given API response that:

vitalSigns[0].vital[2] === vitalSigns[0].height
vitalSigns[0].vital[5] === vitalSigns[0].bloodPressure.systolic (!!)
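The rdfs:subPropertyOf rule described here amounts to a one-line inference. A hedged sketch over a toy triple list (plain [subject, predicate, object] string arrays, not any particular RDF library's API; the schema table and function name are hypothetical) shows how sp:vital triples follow from sp:height and sp:weight ones:

```javascript
// Toy schema: each sub-property maps to its super-property.
const SUB_PROPERTY_OF = {
  "sp:height": "sp:vital",
  "sp:weight": "sp:vital",
};

// For every (x p y) where p rdfs:subPropertyOf q, add (x q y).
// Newly added triples are visited too, so chains p -> q -> r are followed.
function inferSuperProperties(triples) {
  const seen = new Set(triples.map((t) => t.join(" ")));
  const out = [...triples];
  for (let i = 0; i < out.length; i++) {
    const [s, p, o] = out[i];
    const superProp = SUB_PROPERTY_OF[p];
    if (!superProp) continue;
    const key = [s, superProp, o].join(" ");
    if (!seen.has(key)) {
      seen.add(key);
      out.push([s, superProp, o]);
    }
  }
  return out;
}

const inferred = inferSuperProperties([
  ["_:vs", "sp:height", "_:h1"],
  ["_:vs", "sp:weight", "_:w1"],
]);
// inferred also contains ["_:vs", "sp:vital", "_:h1"] and ["_:vs", "sp:vital", "_:w1"].
```

As noted above, a container can equally just emit both predicates directly; the sketch only shows that the two views are consistent.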

...

This last example raises an important point about why we model vitalSigns explicitly: they're not all simply a LOINC code and a valueAndUnit. Blood Pressures are a good example of larger structure (systolic, diastolic, patient position... and the patient's position at least isn't represented in LOINC.)

So if we do tilt the balance further towards (70,30), the 30% only covers the flat-structured stuff (which is fine!)

thisisdhaas commented 12 years ago

I added the diagram to my comment above--no more trusting suspiciously easy-to-use diagramming startups!

Good point on subclassing predicates. Perhaps we could keep around the idea of the @type parameter to help resolve your other point about explicit modeling:

{
  'bloodPressure': ref:{blood pressure obj id 1},
  'height': ref:{height obj id 2},
  'raw': [
      { '@type': 'sp:BloodPressure',
        'data': {blood pressure obj id 1} },
      { '@type': 'sp:Height',
        'data': {height obj id 2} },
      { '@type': 'sp:Vital',
        'data': {head circumference obj id 3} },
  ]
}

So you can get at height with vitalSigns[0].height OR vitalSigns[0].raw[1].data. If iterating over the contents of vitalSigns[0].raw, you could look at the @type attribute to see the type of the data, then pull structured info out of the data attribute. This would allow us to put explicitly modeled types in the raw data section, and as new types are defined, the model would extend naturally.
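A sketch of that iteration pattern once the refs are resolved to shared objects. It assumes the wrapper shape from the snippet above ('@type' plus 'data'); the inner fields of each data object (systolic, diastolic, value, unit, loinc) are hypothetical.

```javascript
// Hypothetical resolved version of the snippet above.
const bp = { systolic: 120, diastolic: 80, unit: "mm[Hg]" };
const height = { value: 1.5, unit: "m" };
const headCirc = { loinc: "8287-5", value: 40, unit: "cm" };

const vitalSigns = [{
  bloodPressure: bp,
  height: height,
  raw: [
    { "@type": "sp:BloodPressure", data: bp },
    { "@type": "sp:Height", data: height },
    { "@type": "sp:Vital", data: headCirc },
  ],
}];

// Dispatch on '@type' to pull structured info out of 'data'.
function describe(entry) {
  const d = entry.data;
  switch (entry["@type"]) {
    case "sp:BloodPressure":
      return `BP ${d.systolic}/${d.diastolic} ${d.unit}`;
    case "sp:Height":
      return `height ${d.value} ${d.unit}`;
    default: // generic sp:Vital: only flat value/unit fields are assumed
      return `vital ${d.value} ${d.unit}`;
  }
}

const lines = vitalSigns[0].raw.map(describe);
// ["BP 120/80 mm[Hg]", "height 1.5 m", "vital 40 cm"]
```

The default branch is where the extensibility lives: a client that predates a new explicitly modeled type still gets something sensible, as long as unknown types keep the flat shape.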

Note that there is an assumption here (as in the current model) that vitalSigns[0].bloodPressure points to a BloodPressure object and vitalSigns[0].height points to a Height object.

jmandel commented 12 years ago

(I've had better luck with diagram.ly in the past.)

Hey Dan,

Starting from your most recent JSON snippet: you don't really need to wrap each object in a raw pointer; you could instead assign it multiple types directly. In other words, each vital gets a generic type (sp:Vital) + generic predicate (sp:raw), as well as an optional friendly type (sp:Height) and friendly predicate (sp:height). The relations in RDFS are:

sp:height rdfs:subPropertyOf sp:raw.
sp:Height rdfs:subClassOf sp:Vital.

And we'd be talking about a graph like:

{
  'bloodPressure': ref:{blood pressure obj id 1},
  'height': ref:{height obj id 2},
  'raw': [
    ref:{height obj id 2},
    ref:{weight obj id 3}

    // NOT sure this works well: ref:{blood pressure obj id 1}
    //might rather:
    //  ref:{systolic obj id 1a},
    //  ref:{diastolic obj id 1b}
  ],

  {height obj id 2}: {
    type: [Height, Vital],
    valueAndUnit: {
      value: 1.5,
      unit: "m"
    }
  },
  {weight obj id 3}: {
    type: [Weight, Vital],
    valueAndUnit: {
      value: 65,
      unit: "kg"
    }
  }
}
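Resolved into memory, this graph gives each vital a type array instead of a wrapper. A minimal sketch, assuming the type strings and field names from the snippet above (the helper name is hypothetical), shows a client filtering raw vitals by the friendly or the generic type. Because the data already lists both types, no subClassOf reasoning is needed on the client.

```javascript
const heightObj = { type: ["Height", "Vital"], valueAndUnit: { value: 1.5, unit: "m" } };
const weightObj = { type: ["Weight", "Vital"], valueAndUnit: { value: 65, unit: "kg" } };

const vitalSigns = {
  height: heightObj,
  raw: [heightObj, weightObj], // same objects, reachable via both predicates
};

// Select raw vitals carrying a given type name, friendly or generic.
function ofType(entry, typeName) {
  return entry.raw.filter((v) => v.type.includes(typeName));
}

const allVitals = ofType(vitalSigns, "Vital"); // both objects
const weights = ofType(vitalSigns, "Weight");  // [weightObj]
```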

I think this is definitely useful for "flat" vitals. But a concern about how this extends: a 'raw' bucket is only really useful to the extent that developers can locate something consistent when they look there: e.g. the contract could be "all raw vitals will consist of a loinc code, a value, and a unit (and a nice type too, sure)".

If instead we say that even the "raw" elements can have their own (ad-hoc) substructure, then there's no real point in relegating them to some "raw corner" of the graph; we might as well just say "go ahead and directly attach any structure you want to the graph" (which of course is exactly what RDF lets you do). In other words, if there's value to an explicit escape hatch, I think the value is that

we might as well just include them directly in the graph (in an ad-hoc, non-prespecified way) with nice-looking predicates.
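The contract proposed for the "raw" bucket ("a loinc code, a value, and a unit") is easy to check mechanically, which is part of its appeal. A hedged sketch: the field names loinc, value, and unit and the validator name are assumptions, and the pattern checks only the digits-dash-check-digit shape of a LOINC code, not that the code exists or that its check digit is correct.

```javascript
// Hypothetical validator for the flat "raw vital" contract.
const LOINC_SHAPE = /^\d{1,5}-\d$/; // e.g. "8302-2"; check digit not verified

function isFlatRawVital(v) {
  return (
    v !== null &&
    typeof v === "object" &&
    typeof v.loinc === "string" &&
    LOINC_SHAPE.test(v.loinc) &&
    typeof v.value === "number" &&
    Number.isFinite(v.value) &&
    typeof v.unit === "string" &&
    v.unit.length > 0
  );
}

// A nested blood pressure fails the contract, which is the point:
// structured vitals need their own explicit model.
const ok = isFlatRawVital({ loinc: "8287-5", value: 40, unit: "cm" });
const nested = isFlatRawVital({ systolic: { value: 120 }, diastolic: { value: 80 } });
```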

thisisdhaas commented 12 years ago

Maybe my lack of familiarity with RDF / JSON-LD is interfering with my ability to express myself :-).

Is the JSON structure you just described actually creating a 'raw corner' of the graph? If sp:raw is a superclass for sp:height, etc., then by adding generic vitals (predicate sp:raw, type sp:Vital), aren't we just 'attaching structures to the VitalSigns graph?'

we might as well just include them directly in the graph (in an ad-hoc, non-prespecified way) with nice-looking predicates.

I think this is what I'm asking for. Extensibility comes for free when the SMART standard defines new subclasses of sp:raw or sp:Vital.

Perhaps the following graph will address your concerns about flat-data only in the escape hatch:

{
  'bloodPressure': {
    'systolic': ref:{systolic obj id 1},
    'diastolic': ref:{diastolic obj id 2},
    'position': { position coded object here... },
    ... other bp fields ...
  },
  'height': ref:{height obj id 3},
  'weight': ref:{weight obj id 4},
  'raw': [
    ref:{systolic obj id 1},
    ref:{diastolic obj id 2},
    ref:{height obj id 3},
    ref:{weight obj id 4},
    ref:{head circumference obj id 5}
  ],

  {systolic obj id 1}: {
     type: [Vital],
     valueAndUnit: {
       value: 120,
       unit: "mm[Hg]"
    }    
  },
  {weight obj id 4}: {
     type: [Weight, Vital],
     valueAndUnit: {
       value: 65,
       unit: "kg"
    }    
  },
  {head circumference obj id 5}: {
     type: [Vital],
     valueAndUnit: {
       value: 12,
       unit: "cm"
    }    
  },
    ... other objects ...
}

Now the graph can be accessed the same as previously, but raw (and consistently formatted) VitalSign objects can be accessed in VitalSigns[0].raw, which allows you to add LOINC codes/values/units for vitals like head circumference which the spec doesn't yet model.
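A sketch of this graph once references are resolved: the structured bloodPressure node and the flat raw list share the same leaf objects. Field names follow the snippet above; the position value and the loinc fields are assumptions (the systolic and diastolic codes 8480-6 and 8462-4 are taken from the CCDA value set quoted later in the thread).

```javascript
const systolic = { loinc: "8480-6", value: 120, unit: "mm[Hg]" };
const diastolic = { loinc: "8462-4", value: 80, unit: "mm[Hg]" };
const headCirc = { loinc: "8287-5", value: 12, unit: "cm" };

const vitalSigns = [{
  bloodPressure: {
    systolic,
    diastolic,
    position: { title: "Sitting" }, // hypothetical coded value, shape only
  },
  raw: [systolic, diastolic, headCirc], // flat leaves only, never the BP node
}];

// The friendly path and the raw path reach the identical object.
const sameSystolic = vitalSigns[0].bloodPressure.systolic === vitalSigns[0].raw[0];

// Every raw entry has the same flat shape, so generic code can list them all.
const summary = vitalSigns[0].raw.map((v) => `${v.loinc}: ${v.value} ${v.unit}`);
```

Keeping only leaves in raw is what lets the generic loop stay this simple; position, which LOINC does not cover, lives only on the structured node.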

jmandel commented 12 years ago

Maybe not a 'corner'...

Sure, calling it the raw "corner" is probably a bit too strong. But the basic point I'm trying to make is that while sp:systolic and sp:diastolic and sp:weight are plausibly subProperties of sp:vital, I'm not sure that sp:bloodPressure is. At least, not if this designation is meant to imply a consistent data shape within...

Your example is spot-on

Re: your json example: Yes! This is exactly the structure I was trying to describe above when I suggested we "might rather" just point to flat (leafy) bits in the "raw" graph.

(Of course container and app developers might want to explore better-structured diversions from our existing model, and that's fine to do in an ad-hoc way. Indeed, that's a great way for the API to progress over time. But I think the focus of this discussion is: short of saying "just write whatever you like in RDF", how can we describe a middle ground that provides some structure for exposing data that the SMART ontology doesn't describe every last detail of.)

Jotting down RDF

It's probably easiest to "talk" precisely in these discussions using the turtle syntax. For example, your diagramly figure boils down to:

[] a :VitalSigns;
  :vital _:1, _:2, _:3;
  :weight _:1;
  :height _:2.

_:1 a :Weight, :Vital;
  :vitalName [:code <http://purl.bioontology.org/ontology/LNC/123-2>].

_:2 a :Height, :Vital;
  :vitalName [:code <http://purl.bioontology.org/ontology/LNC/123-3>].

_:3 a :Vital;
  :vitalName [:code <http://purl.bioontology.org/ontology/LNC/123-5>].

(Note that [] introduces a blank node, not a list :-))

arjunsanyal commented 12 years ago

It feels like we are coming to consensus on this. What are the outstanding issues from your perspectives?

jmandel commented 12 years ago

I think we should give this a shot in our sample patient generator, in a branch, and try it out :-)

(Reply sent by email to Arjun Sanyal's comment of Wed, Jun 27, 2012 at 8:33 AM.)

thisisdhaas commented 12 years ago

The only issue I haven't seen discussed is the point raised by @p2 about handling VitalSigns objects with multiple vitals that map to the same type (e.g. standing height and lying height). Should we attempt to account for such cases? What should VitalSigns[0].height point to if both a standing and a lying height were taken?

jmandel commented 12 years ago

If we make the cardinality of sp:height (and the others) 0..*, then we could just attach both the standing and the lying with a height predicate. Thoughts?

(Reply sent by email to Daniel Haas's comment of Wed, Jun 27, 2012 at 10:03 AM.)

thisisdhaas commented 12 years ago

Does this complicate the JSON representation? With cardinality > 1, it will need to be represented as a list. This means that app developers can no longer look at VitalSigns[0].height, but will need to (at least) look at VitalSigns[0].heights[0]. This somewhat breaks the abstraction barrier: app developers won't know the difference between entries in the heights array without understanding the different LOINC codes.

I can't really come up with a better approach, however. Maybe VitalSigns[0].height should just point to the first of the heights? That would be consistent with the current philosophy of: 'use the simplest structure if you don't understand LOINC, poke through the raw vitals otherwise'.
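The "point to the first of the heights" fallback can be sketched as a thin accessor over a 0..* list. The heights field name follows the comment above; the helper names and the loinc field are hypothetical. One caveat the sketch makes visible: "first" silently depends on whatever order the server emits.

```javascript
// Hypothetical entry where sp:height has cardinality 0..*.
const entry = {
  heights: [
    { loinc: "8302-2", value: 1.52, unit: "m" }, // Height
    { loinc: "8306-3", value: 1.5, unit: "m" },  // Height (Lying)
  ],
};

// Simple path: the first height, or null if none were recorded.
function firstHeight(e) {
  return Array.isArray(e.heights) && e.heights.length > 0 ? e.heights[0] : null;
}

// LOINC-aware path: pick a specific kind of height explicitly.
function heightByLoinc(e, code) {
  return (e.heights || []).find((h) => h.loinc === code) || null;
}
```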

p2 commented 12 years ago

I agree with Dan's first point, having multiple heights makes the graph less developer friendly again, which was the key idea behind it (at least as I understood).

I think it should be limited to one item, maybe by having a 1-to-1 mapping of LOINC code to the predicate? Meaning "height" would always mean LOINC code for standing height; if you want lying height you'll have to query the raw array. On the downside, if a hospital only has lying height (maybe a children's hospital), you could never use the "easy" path and always have to query the raw array. But it guarantees the kind of data you get even when accessing it through the "easy" way.
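This strict alternative fixes each friendly predicate to exactly one code, so the easy path is predictable but can come up empty. A sketch under the assumption that "height" means LOINC 8302-2 and "weight" means 3141-9, with everything else left in raw; the table, function, and field names are hypothetical.

```javascript
// Strict 1-to-1: each friendly name admits exactly one LOINC code.
const STRICT_CODE = { height: "8302-2", weight: "3141-9" };

function strictView(rawVitals) {
  const view = { raw: rawVitals };
  for (const [name, code] of Object.entries(STRICT_CODE)) {
    const match = rawVitals.find((v) => v.loinc === code);
    if (match) view[name] = match; // at most one object per name
  }
  return view;
}

// A site that only records lying height (8306-3) gets no `height` shortcut.
const pediatric = strictView([{ loinc: "8306-3", value: 0.6, unit: "m" }]);
// pediatric.height is undefined; the lying height is only in pediatric.raw
```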

jmandel commented 12 years ago

Then again, there's always the approach of one predicate per loinc :-) We're most of the way there already...

CCDA (part of the MU Stage 2 proposal) uses the following codes for vitals (consistent with C32):

Table 268: Vital Sign Result Type Value Set
Value Set: HITSP Vital Sign Result Type 2.16.840.1.113883.3.88.12.80.62 DYNAMIC
Code System(s): LOINC 2.16.840.1.113883.6.1
Description: This identifies the vital sign result type

Code      Code System   Print Name
9279-1    LOINC         Respiratory Rate
8867-4    LOINC         Heart Rate
2710-2    LOINC         O2 % BldC Oximetry
8480-6    LOINC         BP Systolic
8462-4    LOINC         BP Diastolic
8310-5    LOINC         Body Temperature
8302-2    LOINC         Height
8306-3    LOINC         Height (Lying)
8287-5    LOINC         Head Circumference
3141-9    LOINC         Weight Measured
39156-5   LOINC         BMI (Body Mass Index)
3140-1    LOINC         BSA (Body Surface Area)
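"One predicate per LOINC" amounts to a lookup table from the value set above to predicate names. A hedged sketch: the codes are from the table, but the camelCase predicate names, the `other` bucket, and the helper name are hypothetical, not part of the SMART ontology.

```javascript
// Hypothetical predicate names for each code in the CCDA vital sign value set.
const PREDICATE_FOR_LOINC = {
  "9279-1": "respiratoryRate",
  "8867-4": "heartRate",
  "2710-2": "oxygenSaturation",
  "8480-6": "systolic",
  "8462-4": "diastolic",
  "8310-5": "temperature",
  "8302-2": "height",
  "8306-3": "heightLying",
  "8287-5": "headCircumference",
  "3141-9": "weight",
  "39156-5": "bodyMassIndex",
  "3140-1": "bodySurfaceArea",
};

// Attach each vital under its own predicate; codes outside the set go to `other`.
// One value per predicate: a later duplicate replaces an earlier one.
function byPredicate(rawVitals) {
  const out = { other: [] };
  for (const v of rawVitals) {
    const p = PREDICATE_FOR_LOINC[v.loinc];
    if (p) out[p] = v;
    else out.other.push(v);
  }
  return out;
}
```

With a distinct predicate for lying height, the standing/lying ambiguity disappears from the easy path, at the cost of app developers needing to know that `heightLying` exists.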

(Reply sent by email to Pascal Pfiffner's comment of Wed, Jun 27, 2012 at 10:22 AM.)

jmandel commented 12 years ago

Broken out into individual issues; closing.
