w3c / hcls-fhir-rdf

Sketching out an RDF representation for FHIR

Significant digits in decimal numbers #92

Closed dbooth-boston closed 2 years ago

dbooth-boston commented 2 years ago

FHIR requires that significant digits be retained in decimal numbers, to indicate the precision of the number. For example, "1.000" versus "1.0". But standard JSON does not distinguish 1.000 from 1.0 (as numbers). Consequently, FHIR JSON requires a custom JSON parser.
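
A minimal Python sketch of the problem, using only the standard library. The default `json` parser maps both lexical forms to the same float, while passing `parse_float=Decimal` keeps the digits as written. The `value` key is purely illustrative and is not taken from any particular FHIR resource.

```python
import json
from decimal import Decimal

doc_a = '{"value": 1.000}'
doc_b = '{"value": 1.0}'

# Default parsing: both numbers become the same IEEE 754 double,
# so the trailing zeros are gone.
float_a = json.loads(doc_a)["value"]
float_b = json.loads(doc_b)["value"]
print(float_a, float_b)  # 1.0 1.0

# Decimal parsing: the parser hands the original digit string to Decimal,
# which remembers how many digits were written.
dec_a = json.loads(doc_a, parse_float=Decimal)["value"]
dec_b = json.loads(doc_b, parse_float=Decimal)["value"]
print(dec_a, dec_b)  # 1.000 1.0
```

The two `Decimal` values still compare equal numerically; only their written form differs, which is the information FHIR wants retained.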

How should the significant digits (precision) be retained and indicated in FHIR RDF?

ericprud commented 2 years ago

RDF preserves differences in lexical representations of the same value (cf. literal equality). SPARQL and Turtle also preserve them, so I think our only problem is finding a JSON parser that preserves lexical forms rather than parsing them to IEEE754. I believe json.net does this, but we need something for js and probably java.
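
The distinction between comparing literals as terms and comparing them as values can be sketched in a few lines of Python. The `Literal` class below is a hypothetical stand-in, not the API of any RDF library, and `same_value` assumes both literals are typed as xsd:decimal.

```python
from dataclasses import dataclass
from decimal import Decimal

XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal: lexical form plus datatype IRI."""
    lexical: str
    datatype: str = XSD_DECIMAL

def same_term(a, b):
    # Term equality: the lexical forms and datatype IRIs must match exactly.
    return a.lexical == b.lexical and a.datatype == b.datatype

def same_value(a, b):
    # Value equality: compare the numbers the lexical forms denote.
    # Assumes both literals are xsd:decimal.
    return Decimal(a.lexical) == Decimal(b.lexical)

a = Literal("1.000")
b = Literal("1.0")
print(same_term(a, b))   # False
print(same_value(a, b))  # True
```

Because the two literals are distinct terms, a store that keeps terms as given keeps the trailing zeros too.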

related rant: http://community.fhir.org/t/why-did-you-break-the-international-json-standard/1367/11

gaurav commented 2 years ago

I think xsd:decimal is intended for this use case -- although it is matched equivalently without conversion (i.e. "15"^^xsd:byte and "15.0"^^xsd:decimal as per XML Schema Datatypes in RDF and OWL), I think RDF systems are supposed to maintain the correct number of significant digits since it is transferred as a verbatim string.

An alternative would be something like xsd:precisionDecimal, which is based on IEEE754. I am not a computer scientist, but this seems to be similar to how BigDecimal works.

In either case, we could store the value in JSON-LD as value objects, e.g. { "@value": "15.0", "@type": "xsd:decimal" }.
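
A rough Python sketch of producing such a value object from FHIR JSON, assuming a parser that keeps decimals as `Decimal`. The helper name `to_value_object` and the `value` key are hypothetical. `format(number, "f")` is used instead of `str` so that exponent notation, which xsd:decimal does not allow, never appears in the output.

```python
import json
from decimal import Decimal

def to_value_object(number):
    """Wrap a parsed decimal in a JSON-LD value object (hypothetical helper)."""
    # Fixed-point formatting keeps trailing zeros and avoids exponents.
    return {"@value": format(number, "f"), "@type": "xsd:decimal"}

parsed = json.loads('{"value": 15.00}', parse_float=Decimal)
value_object = to_value_object(parsed["value"])
print(json.dumps(value_object))
# {"@value": "15.00", "@type": "xsd:decimal"}
```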

ericprud commented 2 years ago

(switching to 15.00 because 15.0 is the canonical form)

I don't think this changes the RDF representation; we just need to warn folks that they might need to use a non-standard JSON parser when converting FHIR JSON to FHIR RDF (just as they would for converting to FHIR XML, or for that matter, parsing FHIR JSON CDS or anything else that cares about those ghostly significant digits that disappear in IEEE754).

gaurav commented 2 years ago

After discussing this at the FHIR RDF weekly phone meeting on Thursday, Oct 7, I think we concluded that:

  1. As long as FHIR JSON insists that decimals are represented as JSON numbers, our ability to translate them into strings via JSON-LD contexts is limited. Changing this is noted to be "impossible under our procedural rules".
  2. The JSON standard "allows implementations to set limits on the range and precision of numbers accepted", so the main thing to do is to ensure that users of our JSON-LD contexts use a JSON parser that treats numbers as decimals rather than floats.
  3. For any software we write that converts JSON-LD into N-Quads or other RDF representations, we can ensure that the software uses a JSON parser that uses decimals rather than floats, and use xsd:decimal or xsd:precisionDecimal to ensure that the decimal values are not misinterpreted downstream.
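
Point 3 might look roughly like the following in Python. This is a sketch under stated assumptions, not the project's converter: the `example.org` subject and predicate IRIs and the helper `decimal_triple` are made up for illustration, and only the xsd:decimal option is shown.

```python
import json
from decimal import Decimal

XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

def decimal_triple(subject, predicate, number):
    """Serialize one decimal as an N-Triples line (hypothetical helper)."""
    # Fixed-point formatting keeps trailing zeros and avoids exponents.
    lexical = format(number, "f")
    return f'<{subject}> <{predicate}> "{lexical}"^^<{XSD_DECIMAL}> .'

source = '{"value": 15.00}'
# parse_float and parse_int both route JSON numbers to Decimal,
# so no number passes through a binary float on the way in.
number = json.loads(source, parse_float=Decimal, parse_int=Decimal)["value"]

line = decimal_triple("http://example.org/obs1", "http://example.org/value", number)
print(line)
# <http://example.org/obs1> <http://example.org/value> "15.00"^^<http://www.w3.org/2001/XMLSchema#decimal> .
```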

Does that sound right to everybody? If so, I think we can close this issue once we add a warning about point 2 to our documentation somewhere.

ericprud commented 2 years ago

Re 1, totally agree.

Re 2, I'd guess that requirements for preserving trailing 0s are rarely, if ever, clinical. We can concoct a scenario where a CDS system changes its recommendation depending on the precision of some calculated result (below), but it's not a routine use of FHIR. Any reason we would have to suggest particular behavior pertains to all users of FHIR JSON and should go into the JSON precision warning. We can link to it to raise awareness.

Re 3, I expect Graham et al would suggest we use the same datatypes as FHIR/XML.

CDS scenario - renal monitoring:

  1. An EGFR < 45.0 triggers 2 visits/year
  2. Patient 41yo female with serum cystatin C of 1.3mg/L.
  3. A reported serum creatinine of 1.50 -> EGFR of 45, so follow-up 1/year
  4. A reported serum creatinine of 1.51 -> EGFR of 44, so follow-up 2/year
  5. A reported serum creatinine of 1.5 in some highly conservative CDS system makes a worst-case assumption of 1.54, triggering 2 follow-ups/year (different behavior than for 1.50).
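
The worst-case reasoning in step 5 can be sketched in Python. The function `worst_case_high` is a hypothetical illustration of one way a conservative system might read precision from the written digits; it does not compute EGFR and is not taken from any real CDS system.

```python
from decimal import Decimal

def worst_case_high(reported):
    """Largest value, written with one extra digit, that still rounds to `reported`."""
    exponent = reported.as_tuple().exponent  # -1 for 1.5, -2 for 1.50
    unit = Decimal(1).scaleb(exponent)       # size of the last written digit
    return reported + unit / 2 - unit / 10

print(worst_case_high(Decimal("1.5")))   # 1.54
print(worst_case_high(Decimal("1.50")))  # 1.504

# After a round trip through a binary float, "1.50" is indistinguishable
# from "1.5", so the wider worst case is assumed.
print(worst_case_high(Decimal(str(float("1.50")))))  # 1.54
```

The last line shows why the parser matters here: once the trailing zero is lost, the system can no longer tell the more precise report from the less precise one.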

dbooth-boston commented 2 years ago

> Re 3, I expect Graham et al would suggest we use the same datatypes as FHIR/XML.

Can you clarify? Would that be the same or different from what @gaurav suggested above?

ericprud commented 2 years ago

XML is like RDF in that the lexical values are preserved. (XSD has canonical forms, which would alter the number of precision digits, but standard processing doesn't invoke canonicalization, in either XML or RDF.) Given that, and the fact that we'd have to discover or invent new analogs to decimal and doubles to specifically advertise that same behavior, I don't think we can justify deviating from what already works in FHIR/XML.
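
To illustrate what canonicalization would do to the precision digits if it were invoked, here is a small Python sketch. It assumes the XSD 1.0 canonical form for decimals (no superfluous zeros, but at least one digit on each side of the decimal point). The function name `xsd10_canonical` is hypothetical, and edge cases such as negative zero are not handled.

```python
from decimal import Decimal

def xsd10_canonical(lexical):
    """Approximate the XSD 1.0 canonical lexical form of a decimal (sketch only)."""
    value = Decimal(lexical).normalize()  # drops trailing zeros
    text = format(value, "f")             # fixed-point, never exponent notation
    # The canonical form requires a decimal point with a digit after it.
    return text if "." in text else text + ".0"

print(xsd10_canonical("15.00"))  # 15.0
print(xsd10_canonical("1.000"))  # 1.0
print(xsd10_canonical("0.50"))   # 0.5
```

Standard XML and RDF processing skips this step, which is why the written form "15.00" survives.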

dbooth-boston commented 2 years ago

From minutes of 10/14/21:

AGREED: All agreed. Close issue 92 with no change to R4.