w3c / hcls-fhir-rdf

Sketching out an RDF representation for FHIR

How best to represent DICOM lists with missing elements, in FHIR RDF? #141

Open dbooth-boston opened 2 months ago

dbooth-boston commented 2 months ago

Copying this issue description from Erich Bremer's email.

From: Erich Bremer Date: Mon, 25 Mar 2024 11:22:24 -0400

For those not part of the original conversation, let me restate the problems that, as far as I can see, need to be resolved in order to have a DICOM RDF.

For reference, here is a snippet of DICOM-compliant JSON:

    {
      "00020002": { "vr": "UI", "Value": [ "1.2.840.10008.5.1.4.1.1.12.1" ] },
      "00020003": { "vr": "UI", "Value": [ "1.3.12.2.1107.5.4.3.321890.19960124.162922.29" ] },
      "00020010": { "vr": "UI", "Value": [ "1.2.840.10008.1.2.4.50" ] },
      "00020012": { "vr": "DS", "Value": [ "999.999" ] },
      ...

1) DICOM requires the "vr" property: https://dicom.nema.org/medical/dicom/current/output/chtml/part18/sect_F.2.3.html
2) DICOM handles value multiplicity by putting all values in ordered arrays: https://dicom.nema.org/medical/dicom/current/output/chtml/part18/sect_F.2.4.html
3) If an attribute is present but empty (value length is 0), DICOM says to leave off the "Value" property, but the rest must be there: https://dicom.nema.org/medical/dicom/current/output/chtml/part18/sect_F.2.5.html
4) For null entries in the arrays, DICOM says to put null for that list element, as the position itself may or may not be important.
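As a rough illustration of rules 1 to 4, here is a minimal Python sketch that reads DICOM-style JSON and keeps nulls in place. The sample document, the `SAMPLE` name, and the `read_values` helper are made up for this example and are not taken from any DICOM toolkit.

```python
import json

# Made-up sample that follows the four rules above; tags and values are illustrative only.
SAMPLE = """
{
  "00020002": {"vr": "UI", "Value": ["1.2.840.10008.5.1.4.1.1.12.1"]},
  "00280030": {"vr": "DS", "Value": ["0.5", null]},
  "00080030": {"vr": "TM"}
}
"""

def read_values(dataset):
    """Map each tag to (vr, values); a missing "Value" key becomes an empty list."""
    return {tag: (attr["vr"], attr.get("Value", [])) for tag, attr in dataset.items()}

parsed = read_values(json.loads(SAMPLE))
print(parsed["00280030"])  # the JSON null keeps its slot as None
print(parsed["00080030"])  # no "Value" key, so no values
```

The point of the sketch is only that JSON arrays carry the null in position, which is the property the RDF mapping has to preserve.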

Now to RDF

a) It's easy to handle 1+2+3 with RDF using blank nodes and RDF Lists:

    [ dcm:00020002 [ dcm:vr "UI"; dcm:Value ( "1.2.840.10008.5.1.4.1.1.12.1" ) ];
      dcm:00020003 [ dcm:vr "UI"; dcm:Value ( "1.3.12.2.1107.5.4.3.321890.19960124.162922.29" ) ];
      dcm:00020010 [ dcm:vr "UI"; dcm:Value ( "1.2.840.10008.1.2.4.50" ) ];
      dcm:00020012 [ dcm:vr "DS"; dcm:Value ( "999.999" ) ];
      ...

The problem arises with 4 - nulls

For 3, we leave off the dcm:Value triple as "null maps to no triple". The problem is in the RDF Lists.

( "1" "2" "3" ) is shorthand for:

    _:myList rdf:first "1" ;
             rdf:rest [ rdf:first "2" ;
                        rdf:rest [ rdf:first "3" ;
                                   rdf:rest rdf:nil ] ] .
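The expansion of the list shorthand can be sketched in Python as plain tuples. This is only an illustration: `list_to_triples` and the `_:b1`, `_:b2` blank node labels are hypothetical, and no RDF library is involved.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def list_to_triples(items):
    """Expand a Python list into rdf:first/rdf:rest triples, one cell per position."""
    if not items:
        return RDF_NIL, []
    # Hypothetical blank node labels, one per list cell.
    nodes = [f"_:b{i}" for i in range(1, len(items) + 1)]
    triples = []
    for i, (node, item) in enumerate(zip(nodes, items)):
        triples.append((node, RDF_FIRST, item))
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF_NIL
        triples.append((node, RDF_REST, rest))
    return nodes[0], triples

head, triples = list_to_triples(["1", "2", "3"])
for triple in triples:
    print(triple)
```

Each list position costs two triples, one carrying the value and one carrying the link to the next cell, which is why "leaving out a value" and "leaving out a position" are different operations.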

Following "no triple asserted is null", if I wanted to leave the second element out of the list, I would just remove rdf:first "2". No triplestore that I know would complain. I can SPARQL using the long list version and I can write the SPARQL with an optional {_:SecondPosition rdf:first ?second } or even a minus {}. In this fashion, the positional information is preserved as needed by elements like https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.8.11.7.html#table_C.8-74f or pixel spacing https://dicom.innolitics.com/ciods/rt-dose/image-plane/00280030 and would return an unbound value depending on the data.

This path falls apart because the shorthand ( "1" "2" "3" ) has no way to express a missing rdf:first triple in the second position (an implicit rdf:first must always be present). I can put a variable there, ( "1" ?second "3" ), but I cannot say the equivalent of ( "1" optional { ?second } "3" ) or even ( "1" minus { ?second } "3" ).

JSON-LD will simply remove the second element if I say ( "1" null "3" ), on the thinking that null means no triple asserted. But this is not the same as removing the rdf:first triple: it removes multiple triples, [ rdf:first "2" ; rdf:rest [] ], and then points the first cell at the third, which discards the positional information and changes the meaning of the data as DICOM views it.

An RDF List is a container construct whose triples express the various positions, their relations, and the associated positional values. What would be wrong (who would be put out) if we allowed something that is already allowable in the RDF model? Honestly, I feel like RDF is not following its own rules here and needs to be fixed even with DICOM out of the picture. The same, I feel, applies to RDF Sequence containers:

ex:mySequence rdf:type rdf:Seq ; rdf:_1 "1" ; rdf:_2 "2" ; rdf:_3 "3" .

If I omitted rdf:_2, it should just now be: ex:mySequence rdf:type rdf:Seq ; rdf:_1 "1" ; rdf:_3 "3" .

If the current logic of RDF lists is applied it would become the below which is not the same: ex:mySequence rdf:type rdf:Seq ; rdf:_1 "1" ; rdf:_2 "3" .
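The difference between the two outcomes can be shown with a small Python sketch. Both helper names are hypothetical; the first keeps the original numbering when a value is null, the second drops the null first and renumbers, which mirrors the shifting behaviour described above.

```python
def seq_keep_positions(values):
    """Emit (rdf:_n, value) pairs, skipping nulls but keeping the original numbering."""
    return [(f"rdf:_{i}", v) for i, v in enumerate(values, start=1) if v is not None]

def seq_drop_nulls(values):
    """Drop nulls first, then number what is left, so later positions shift down."""
    kept = [v for v in values if v is not None]
    return [(f"rdf:_{i}", v) for i, v in enumerate(kept, start=1)]

data = ["1", None, "3"]
print(seq_keep_positions(data))
print(seq_drop_nulls(data))
```

With positions kept, a consumer can still tell that "3" was the third value; after renumbering, that information is gone.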

Keeping the positional triples in RDF Lists and sequences seems to be an easier fix than introducing some type of null literal or typed "null"^^xsd:integer, but it seems to be an implicit taboo due to the lack of support in the syntactic sugar for lists and how JSON-LD compaction behaves with nulls. I appreciate, and I think I understand, the thinking of the elements as part of the schema, but it seems to be heading into more complex territory. RDF would allow us to deviate further from the DICOM JSON by using things like custom datatypes to reduce the number of triples:

    [ dcm:00020002 ( "1.2.840.10008.5.1.4.1.1.12.1"^^dcm:UI );
      dcm:00020003 ( "1.3.12.2.1107.5.4.3.321890.19960124.162922.29"^^dcm:UI );
      dcm:00020010 ( "1.2.840.10008.1.2.4.50"^^dcm:UI );
      dcm:00020012 ( "999.999"^^dcm:DS );
      ...

and even remove lists where DICOM value multiplicity is always 1, but it makes things a bit more complicated. I think a DICOM RDF needs to be very much bi-directional, and keeping close to the DICOM JSON modeling makes it more familiar for people in the DICOM domain. It reduces the tooling on both sides. Nothing stops an RDF person from making SPARQL Update transforms to mutate the data back and forth to a different design (perhaps more performant), but I fear great becomes the enemy of the good.

"a little semantics goes a long way"

- Erich

dbooth-boston commented 2 months ago

To help drive discussion, I'm listing here some options, with pros/cons. Are there others I should add?

Option 1: Omit rdf:first elements from an RDF list ladder

This DICOM JSON:

"Value": [ "bar", null, "foo" ]

would be this in Turtle:

       [ rdf:first "bar" ;
         rdf:rest [  # Note no rdf:first here
                    rdf:rest [ rdf:first "foo" ;
                               rdf:rest rdf:nil
                             ]
                  ]
       ] .
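To make the consumer side of Option 1 concrete, here is a minimal Python sketch that walks such a ladder and reports a missing rdf:first as None. The `read_ladder` helper and the `_:c1` style labels are hypothetical, and the sketch assumes every cell has exactly one rdf:rest.

```python
def read_ladder(triples, head):
    """Follow rdf:rest from head; a cell with no rdf:first yields None."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    values, node = [], head
    while node != "rdf:nil":
        values.append(first.get(node))  # None when rdf:first was omitted
        node = rest[node]
    return values

# The ladder for [ "bar", null, "foo" ], with no rdf:first on the second cell.
ladder = [
    ("_:c1", "rdf:first", "bar"), ("_:c1", "rdf:rest", "_:c2"),
    ("_:c2", "rdf:rest", "_:c3"),
    ("_:c3", "rdf:first", "foo"), ("_:c3", "rdf:rest", "rdf:nil"),
]
print(read_ladder(ladder, "_:c1"))
```

Hand-written code like this copes with the gap, but generic list handling that expects an rdf:first on every cell may not.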

PROS:

CONS:

Option 2: Use an RDF Sequence with a missing element

This DICOM JSON:

"Value": [ "bar", null, "foo" ]

would be this in Turtle:

    [] rdf:type rdf:Seq ;
       rdf:_1 "bar" ;
       ###  Note no rdf:_2 triple here
       rdf:_3 "foo" .

PROS:

CONS:

Option 3: Use a distinguished fhir:null value to represent null.

This DICOM JSON:

"Value": [ "bar", null, "foo" ]

would be this in Turtle:

( "bar" fhir:null "foo" )

PROS:

CONS:

Option 4: Use explicit fhir:index values, as in FHIR RDF R4

This DICOM JSON:

"Value": [ "bar", null, "foo" ]

would be this in Turtle:

... [  [ fhir:v "bar" ; fhir:index 0 ] ;
       ### Note no element with fhir:index 1
       [ fhir:v "foo" ; fhir:index 2 ] ;

PROS:

CONS:

JervenBolleman commented 2 months ago

Option 5: Use a blank node of type fhir:null to represent null.

This DICOM JSON:

"Value": [ "bar", null, "foo" ]

would be this in Turtle:

( "bar" [ a fhir:null ] "foo" )

PROS:

Compact
Concise Turtle list syntax can be used.
Allows noting why the thing is null.

CONS:

If _:myList is expected to be all integers or floats, having a blank node fhir:null value in the list might cause processing difficulties.
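A serializer for this option can be sketched in a few lines of Python. This is an illustration only: `turtle_list` is a hypothetical helper, and it leans on `json.dumps` to quote strings, which is a simplification that happens to produce valid Turtle for simple strings and numbers.

```python
import json

def turtle_list(values, null_term="[ a fhir:null ]"):
    """Render a JSON array as a Turtle collection, substituting null_term for None."""
    parts = [null_term if v is None else json.dumps(v) for v in values]
    return "( " + " ".join(parts) + " )"

print(turtle_list(["bar", None, "foo"]))
print(turtle_list([1, None, 2], null_term="[]"))
```

Passing a different `null_term` shows how the same list shape accommodates other null markers without touching the surrounding structure.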

Option 5 example [edit by ericP]

<>
    dcm:00080008 [ dcm:vr "CS"; dcm:Value ( "DERIVED" "PRIMARY" [ a fhir:null ] "SINGLE A" )] ;
    dcm:00080030 [ dcm:vr "TM"; dcm:Value ( "131047" )] .

Option 5 expanded with type hierarchy for null flavors [edit by ericP]

PREFIX dinull: <https://dicom.nema.org/MEDICAL/Dicom/current/output/chtml/part20/sect_5.3.2.html#>
dinull:UNK rdfs:label "Unknown. A proper value is applicable, but is not known." .
dinull:ASKU rdfs:label "Asked, but not known. Information was sought, but not found (e.g., the patient was asked but did not know)." ;
  rdfs:subClassOf dinull:UNK .

<>
    dcm:00080008 [ dcm:vr "CS"; dcm:Value ( "DERIVED" "PRIMARY" [ a dcm:UNK ] "SINGLE A" )] ;
    dcm:00012345 [ dcm:vr "CS"; dcm:Value ( 1 2 [ a dcm:UNK ] 4 [ a dcm:ASKU ] 6 )] ;
    dcm:00080030 [ dcm:vr "TM"; dcm:Value ( "131047" )] .

Option 5a example [edit by DBooth]

<>
    dcm:00080008 [ dcm:vr "CS"; dcm:Value ( "DERIVED" "PRIMARY" [ a dicom:null ] "SINGLE A" )] ;
    dcm:00080030 [ dcm:vr "TM"; dcm:Value ( "131047" )] .

ebremer commented 2 months ago

Perhaps something more generic than fhir:null, like "rdf:null", as the issue seems to be more a general RDF List issue than a FHIR issue?

JervenBolleman commented 2 months ago

Perhaps something more generic than fhir:null, like "rdf:null", as the issue seems to be more a general RDF List issue than a FHIR issue?

I don't think so. The question is what is the meaning of null in dicom and is that meaning consistent. I think here in the lists a null value is an existential variable. A variable that might be known in a different graph and/or inferable. So a blank node is the way to go.

The null issue outside of lists of values is larger.

I think looking at the DICOM null flavours we have a mix here. dicom:NA is an owl:Nothing-like value (which IMHO means that the triple should not be there at all).

If we are talking in part about null flavors, we can use Option 5 to capture this and allow reasoning and graph merging to resolve it.

ebremer commented 2 months ago

Option 6: Just use a plain blank node

This DICOM JSON:

"Value": [ 1, null, 2 ]

would be this in Turtle:

( 1 [] 2 )

Pros

Simple.
Concise Turtle list syntax can be used.
Usable outside the world of DICOM/FHIR

Cons

JSON-LD compaction will change all literals in the list to blank nodes (with their values and such) when the blank node(s) are detected

ebremer commented 2 months ago

@JervenBolleman we discussed the DICOM null flavors during the FHIR RDF call today. @ericprud had found and shared that link the other month at an earlier meeting. We discussed whether a generalized approach could be used, since missing values in an rdf:List have use cases outside of DICOM. My original approach was to just omit an rdf:first when no value is known, but @dbooth-boston explained to me that this is considered not "well-formed" and could cause issues. I lean towards a generalized (non-FHIR, non-DICOM) solution, as JSON-LD compaction would not be changed for anything less than a general solution, and to handle use cases outside of FHIR/DICOM. I do like having something that could indicate the "null flavor" for a FHIR/DICOM-specific solution; it adds detail and clarity.

My first thought was to have the DICOM RDF / JSON-LD match the current DICOM JSON, save for the context, to make acceptance and tooling easier, but nulls in literal lists are problematic.

ebremer commented 1 month ago

If we followed DICOM XML rather than the DICOM JSON (giving up on a DICOM JSON / JSON-LD match)

DICOM XML (fragment)

    <DicomAttribute keyword="ImageType" tag="00080008" vr="CS">
        <Value number="1">DERIVED</Value>
        <Value number="2">PRIMARY</Value>
        <Value number="3">SINGLE PLANE</Value>
        <Value number="4">SINGLE A</Value>
    </DicomAttribute>
    <DicomAttribute keyword="SOPClassUID" tag="00080016" vr="UI">
        <Value number="1">1.2.840.10008.5.1.4.1.1.12.1</Value>
    </DicomAttribute>
    <DicomAttribute keyword="SOPInstanceUID" tag="00080018" vr="UI">
        <Value number="1">1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6</Value>
    </DicomAttribute>
    <DicomAttribute keyword="StudyDate" tag="00080020" vr="DA">
        <Value number="1">19970422</Value>
    </DicomAttribute>
    <DicomAttribute keyword="StudyTime" tag="00080030" vr="TM">
        <Value number="1">131047</Value>
    </DicomAttribute>

Option 7

RDF Turtle

<>
    dcm:DicomAttribute [ dcm:keyword "ImageType"; dcm:tag "00080008"; dcm:vr "CS";
        dcm:Value [ dcm:number 1; rdf:value "DERIVED" ];
        dcm:Value [ dcm:number 2; rdf:value "PRIMARY" ];
        dcm:Value [ dcm:number 3; rdf:value "SINGLE PLANE" ];
        dcm:Value [ dcm:number 4; rdf:value "SINGLE A" ]] ;

    dcm:DicomAttribute [ dcm:keyword "SOPClassUID"; dcm:tag "00080016"; dcm:vr "UI";
        dcm:Value [ dcm:number 1; rdf:value "1.2.840.10008.5.1.4.1.1.12.1" ]] ;

    dcm:DicomAttribute [ dcm:keyword "SOPInstanceUID"; dcm:tag "00080018"; dcm:vr "UI";
        dcm:Value [ dcm:number 1; rdf:value "1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6" ]] ;

    dcm:DicomAttribute [ dcm:keyword "StudyDate"; dcm:tag "00080020"; dcm:vr "DA";
        dcm:Value [ dcm:number 1; rdf:value "19970422" ]] ;

    dcm:DicomAttribute [ dcm:keyword "StudyTime"; dcm:tag "00080030"; dcm:vr "TM";
        dcm:Value [ dcm:number 1; rdf:value "131047" ]] ;
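The XML-to-Turtle step for Option 7 could be sketched with Python's standard `xml.etree.ElementTree`. This is a rough illustration, not a proposed tool: the `<Fragment>` wrapper element and the `attribute_to_turtle` helper are hypothetical, and the sketch does no string escaping.

```python
import xml.etree.ElementTree as ET

# <Fragment> is a hypothetical wrapper so that the sample parses as one document.
SAMPLE = """<Fragment>
  <DicomAttribute keyword="ImageType" tag="00080008" vr="CS">
    <Value number="1">DERIVED</Value>
    <Value number="2">PRIMARY</Value>
  </DicomAttribute>
</Fragment>"""

def attribute_to_turtle(attr):
    """Render one DicomAttribute element in the Option 7 shape (no string escaping)."""
    keyword, tag, vr = attr.get("keyword"), attr.get("tag"), attr.get("vr")
    parts = [f'dcm:DicomAttribute [ dcm:keyword "{keyword}"; dcm:tag "{tag}"; dcm:vr "{vr}"']
    for value in attr.findall("Value"):
        number = int(value.get("number"))
        parts.append(f'dcm:Value [ dcm:number {number}; rdf:value "{value.text}" ]')
    return ";\n        ".join(parts) + " ]"

root = ET.fromstring(SAMPLE)
turtle = [attribute_to_turtle(a) for a in root.findall("DicomAttribute")]
print(turtle[0])
```

Because each value carries its own `number`, a missing position would simply have no `dcm:Value` node, at the cost of more triples per value than the list-based options.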

Option 8

reduced scaffolding ( use rdf:List, eliminate dcm:number, and move keyword and tag string values to ontology )

<>
    dcm:00080008 [ dcm:vr "CS"; dcm:Value ( "DERIVED" "PRIMARY" dcm:UNK "SINGLE A" )] ;
    dcm:00080016 [ dcm:vr "UI"; dcm:Value ( "1.2.840.10008.5.1.4.1.1.12.1" )] ;
    dcm:00080018 [ dcm:vr "UI"; dcm:Value ( "1.3.12.2.1107.5.4.3.11…030.6" )] ;
    dcm:00080020 [ dcm:vr "DA"; dcm:Value ( "19970422" )] ;
    dcm:00080030 [ dcm:vr "TM"; dcm:Value ( "131047" )] ;

ebremer commented 1 month ago

And in my "maybe" example:

Option 9a

<>
    dcm:00080008 ( "DERIVED"^^dcm:CS "PRIMARY"^^dcm:CS "UNK"^^dcm:nullFlavor "SINGLE A"^^dcm:CS ) ;
    dcm:00080016 ( "1.2.840.10008.5.1.4.1.1.12.1"^^dcm:UI ) ;
    dcm:00080018 ( "1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6"^^dcm:UI ) ;
    dcm:00080020 ( "19970422"^^dcm:DA ) ;
    dcm:00080030 ( "131047"^^dcm:TM ) ;

or possibly

Option 9b

<>
    dcm:00080008 ( "DERIVED"^^dcm:CS "PRIMARY"^^dcm:CS dcm:UNK "SINGLE A"^^dcm:CS ) ;
    dcm:00080016 ( "1.2.840.10008.5.1.4.1.1.12.1"^^dcm:UI ) ;
    dcm:00080018 ( "1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6"^^dcm:UI ) ;
    dcm:00080020 ( "19970422"^^dcm:DA ) ;
    dcm:00080030 ( "131047"^^dcm:TM ) ;

Option 9c (with EricP nulls)

<>
    dcm:00080008 ( "DERIVED"^^dcm:CS "PRIMARY"^^dcm:CS  [ a dcm:UNK ] "SINGLE A"^^dcm:CS ) ;
    dcm:00080016 ( "1.2.840.10008.5.1.4.1.1.12.1"^^dcm:UI ) ;
    dcm:00080018 ( "1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6"^^dcm:UI ) ;
    dcm:00080020 ( "19970422"^^dcm:DA ) ;
    dcm:00080030 ( "131047"^^dcm:TM ) ;

dbooth-boston commented 3 weeks ago

I'd like to get this decided on our next teleconference (this week). Last week's discussion seemed to favor Option 5, but are there particular variants of option 5 that we should consider? For example, should we be recommending one specific null type? If so, specifically what URI? Or should we recommend a null-flavor type hierarchy? If so, specifically what, and what should the top-level null class be? If we can make options as concrete as possible, it will help facilitate our decision-making.

ebremer commented 3 weeks ago

@dbooth-boston Of all the DICOM null flavors, "NI", described in the spec as "No information. This is the most general and default null flavor.", seems to be a good candidate for the root null flavor value, with NA, UNK, ASKU, NAV, NASK, MSK, OTH being subclasses of "NI". I like Option 5 (@ericprud edits) as well, as it's closest to DICOM JSON. I also like Option 9 (with @ericprud nulls), being fairly compact. Further, the DICOM string usage could be mapped safely to more performant datatypes, with ShEx/SHACL used to enforce the range/structure of values. Perhaps mappings/scripts from 5 to 9 (and 9 to 5) for those who want a few fewer triples?
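A null-flavor hierarchy rooted at NI could be checked with a simple parent map, sketched here in Python. The `PARENT` table and `is_subflavor` helper are hypothetical. Placing ASKU under UNK follows the rdfs:subClassOf example earlier in this thread; placing every other flavor directly under NI is an assumption made for the sketch.

```python
# Parent links for a null-flavor hierarchy rooted at NI.
# ASKU under UNK follows the earlier example in this thread;
# putting the remaining flavors directly under NI is an assumption.
PARENT = {
    "NA": "NI", "UNK": "NI", "NAV": "NI", "NASK": "NI", "MSK": "NI", "OTH": "NI",
    "ASKU": "UNK",
}

def is_subflavor(code, ancestor="NI"):
    """True if code equals ancestor or reaches it by following parent links."""
    if code != "NI" and code not in PARENT:
        return False  # not a known null flavor
    while code is not None:
        if code == ancestor:
            return True
        code = PARENT.get(code)
    return False

print(is_subflavor("ASKU", "UNK"))
print(is_subflavor("NAV", "UNK"))
```

In RDF the same check would fall out of rdfs:subClassOf reasoning, so a query for the root class would match every flavor.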

gaurav commented 3 weeks ago

Just wanted to add a link to the HL7 NullFlavor code system, which is where the DICOM null flavor descriptions come from I think: https://terminology.hl7.org/CodeSystem-v3-NullFlavor.html

gaurav commented 3 weeks ago

Proposed 5B:

<>
    dcm:00080008 [ dcm:vr "CS"; dcm:Value ( "DERIVED" "PRIMARY" [ a fhir:null; fhir:nullFlavor "UNK" ] "SINGLE A" )] ;
    dcm:00080030 [ dcm:vr "TM"; dcm:Value ( "131047" )] .

ericprud commented 3 weeks ago

Per today's provocative, last-second assertion that subtyping xsd:string won't reduce utility of SPARQL operator semantics, here are the relevant operators:

| Operator | Type(A) | Type(B) | Function | Result type |
|---|---|---|---|---|
| A = B | xsd:string | xsd:string | op:numeric-equal(fn:compare(STR(A), STR(B)), 0) | xsd:boolean |
| A != B | xsd:string | xsd:string | fn:not(op:numeric-equal(fn:compare(STR(A), STR(B)), 0)) | xsd:boolean |
| A < B | xsd:string | xsd:string | op:numeric-equal(fn:compare(STR(A), STR(B)), -1) | xsd:boolean |
| A > B | xsd:string | xsd:string | op:numeric-equal(fn:compare(STR(A), STR(B)), 1) | xsd:boolean |
| A <= B | xsd:string | xsd:string | fn:not(op:numeric-equal(fn:compare(STR(A), STR(B)), 1)) | xsd:boolean |
| A >= B | xsd:string | xsd:string | fn:not(op:numeric-equal(fn:compare(STR(A), STR(B)), -1)) | xsd:boolean |

-- SPARQL Operator Mapping

So refining an xsd:string to be e.g. "018M"^^dicom:AgeString would mean you couldn't just rely on the default interpretation of a plain "..." literal as an xsd:string, e.g.:

opt 5

?s dcm:01234567 [ dcm:Value ?age ]
FILTER (?age = "018M")

opt 9 with specialized types:

?s dcm:01234567 ?age
FILTER (?age = "018M"^^dicom:AgeString)
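The effect on comparisons can be modelled in Python by treating a literal as a (lexical form, datatype) pair. This is a simplified sketch with hypothetical names. A real SPARQL engine may raise a type error rather than return false when comparing literals of a datatype it does not support, but in either case the solution is filtered out.

```python
from typing import NamedTuple

XSD_STRING = "xsd:string"

class Literal(NamedTuple):
    lexical: str
    datatype: str = XSD_STRING  # plain "..." literals default to xsd:string

def term_equal(a, b):
    """Equal only when lexical form and datatype both match."""
    return a == b

def str_equal(a, b):
    """Compare lexical forms only, like wrapping the variable in STR()."""
    return a.lexical == b.lexical

plain = Literal("018M")
typed = Literal("018M", "dicom:AgeString")
print(term_equal(typed, plain))
print(str_equal(typed, plain))
```

So with specialized datatypes a query has to either spell out the datatype in the constant or compare on STR(?age).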
ebremer commented 2 weeks ago

Option 10 (dcm:null bnode, XSD datatypes, VR types moved to ontology/ShEx/SHACL, moving Lists up to their properties)

<>
    dcm:00080008 ( "DERIVED" "PRIMARY"  [ a dcm:null ] "SINGLE A" ) ;
    dcm:00080016 ( "1.2.840.10008.5.1.4.1.1.12.1") ;
    dcm:00080018 ( "1.3.12.2.1107.5.4.3.11540117440512.19970422.140030.6" ) ;
    dcm:00080020 ( "1997-04-22"^^xsd:date ) ;
    dcm:00080030 ( "13:10:47"^^xsd:time ) ;
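xsd:date and xsd:time use the lexical forms YYYY-MM-DD and hh:mm:ss, so DICOM DA and TM strings would need reformatting before being typed this way. A minimal Python sketch of that conversion follows. The helper names are hypothetical, and only the full YYYYMMDD and HHMMSS forms are handled; other TM forms, such as fractional seconds, are not.

```python
from datetime import datetime

def da_to_xsd_date(da):
    """DICOM DA (YYYYMMDD) to an xsd:date lexical form (YYYY-MM-DD)."""
    return datetime.strptime(da, "%Y%m%d").date().isoformat()

def tm_to_xsd_time(tm):
    """DICOM TM given as HHMMSS to an xsd:time lexical form (hh:mm:ss)."""
    return datetime.strptime(tm, "%H%M%S").time().isoformat()

print(da_to_xsd_date("19970422"))
print(tm_to_xsd_time("131047"))
```

Using `strptime` also rejects malformed input with a ValueError, which gives a cheap validity check during conversion.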