w3c / sparql-dev

SPARQL dev Community Group
https://w3c.github.io/sparql-dev/

Support for nulls in RDF Lists to preserve positioning in the list when needed #196

Open ebremer opened 8 months ago

ebremer commented 8 months ago

Why?

I have data to model that generally fits an ordered list structure, but some elements may be missing while their positions in the list must still be held. In other words, position 2 in a list has a specific meaning. For example, in the DICOM Pixel Spacing attribute, the first position is row spacing and the second position is column spacing, and either of the two values could be missing. See https://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_10.7.html#sect_10.7.1.3

Proposed solution

I can define a list in Turtle as:

:myList rdf:first 1; rdf:rest :element2 .
:element2 rdf:first 2; rdf:rest :element3 .
:element3 rdf:first 3; rdf:rest rdf:nil .

In SPARQL, I can express this list more concisely as (1 2 3).

I can omit the triple :element2 rdf:first 2, thus creating a "null" for the value of the second item, but I cannot say (1 null 3) in SPARQL. The only thing I can do is write out the expanded SPARQL pattern and wrap an OPTIONAL (or a MINUS) around the second element's value: OPTIONAL { :element2 rdf:first ?value2 }
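To make the "omit the rdf:first triple" idea concrete, here is a minimal Python sketch (not SPARQL, and not part of any spec) that walks the first/rest chain over a plain set of triples and reports None for a cell with no rdf:first. The `read_list` helper and the tuple representation of triples are assumptions made for illustration; the sketch also assumes a well-formed, acyclic chain that ends in rdf:nil.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def read_list(triples, head):
    """Walk the rdf:rest chain from head; None marks a cell with no rdf:first."""
    firsts = {s: o for s, p, o in triples if p == FIRST}
    rests = {s: o for s, p, o in triples if p == REST}
    values = []
    node = head
    while node != NIL:
        values.append(firsts.get(node))  # None when rdf:first is absent
        node = rests[node]               # assumes every cell has an rdf:rest
    return values


# The list from the example above, with ":element2 rdf:first 2" omitted.
triples = {
    (":myList", FIRST, 1), (":myList", REST, ":element2"),
    (":element2", REST, ":element3"),
    (":element3", FIRST, 3), (":element3", REST, NIL),
}

print(read_list(triples, ":myList"))  # [1, None, 3]
```

The third element stays at position 3 even though position 2 carries no value, which is the behavior the OPTIONAL workaround recovers by hand.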

It would be helpful to add a way to indicate that a list position is unbound, such as (1 optional { ?position2 } 3). In some cases, it would be desirable to have (1 minus { ?position2 } 3) to match any list that is missing position 2. Any mutation functions (such as those mentioned in #151) should also include the ability to set a positional element to null. For example, setValue(?myList, null, 2) would remove an element from a list but preserve the positioning.
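As a rough sketch of what such a setValue could mean at the triple level: drop the rdf:first triple of the targeted cell but leave its rdf:rest in place. The Python below is purely illustrative. `set_value`, its 1-based `position` argument, and the tuple representation of triples are all assumptions, not an existing or proposed API.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def set_value(triples, head, value, position):
    """Return a new triple set with the 1-based position set to value.

    value=None drops the cell's rdf:first but keeps its rdf:rest,
    so later elements keep their positions.
    """
    rests = {s: o for s, p, o in triples if p == REST}
    cell = head
    for _ in range(position - 1):
        if cell == NIL:
            break
        cell = rests[cell]
    if cell == NIL:
        raise IndexError("position is past the end of the list")
    # Remove any existing rdf:first on the target cell.
    updated = {t for t in triples if not (t[0] == cell and t[1] == FIRST)}
    if value is not None:
        updated.add((cell, FIRST, value))
    return updated


triples = {
    (":myList", FIRST, 1), (":myList", REST, ":element2"),
    (":element2", FIRST, 2), (":element2", REST, ":element3"),
    (":element3", FIRST, 3), (":element3", REST, NIL),
}

result = set_value(triples, ":myList", None, 2)
```

After the call, `:element2` still links to `:element3` but no longer has an rdf:first, so the list keeps three positions.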

Considerations for backward compatibility

JSON-LD currently filters out RDF List elements that do not have an rdf:first value.

VladimirAlexiev commented 4 months ago

nil in SPARQL is called UNDEF. But that's allowed only in VALUES: https://github.com/w3c/sparql-dev/issues/62 . So this is a legitimate case.

However, handling attributes by their position in a list is a bad practice since the information which position means which attribute is implicit:

dicom:pixelSpacing (100 200)

It's better to make the information explicit:

dicom:pixelSpacing [a dicom:PixelSpacing; dicom:rowSpacing 100; dicom:columnSpacing 200]

IfcOwl (used in AECO) uses the same bad pattern: coordinates x, y, z are captured in an rdf:List, and not even a list of numbers but of parasitic nodes like ifc:ExpressReal (which themselves carry the number).

ericprud commented 3 months ago

> nil in SPARQL is called UNDEF. But that's allowed only in VALUES: #62 . So this is a legitimate case.

Underscoring (without specifically endorsing) the implicit proposal to use UNDEF in e.g. Turtle:

<SensorArray1> <values> (12.34 10.82 UNDEF 11.9) .

> However, handling attributes by their position in a list is a bad practice since the information which position means which attribute is implicit:
>
> dicom:pixelSpacing (100 200)
>
> It's better to make the information explicit:
>
> dicom:pixelSpacing [a dicom:PixelSpacing; dicom:rowSpacing 100; dicom:columnSpacing 200]
>
> IfcOwl (used in AECO) uses the same bad pattern: coordinates x, y, z are captured in an rdf:List, and not even a list of numbers but of parasitic nodes like ifc:ExpressReal (which themselves carry the number).

Sure, there are cases where folks use positional semantics that could have a semantically-explicit property, but there are also just plain arrays of data. Plus, I think that depriving folks of a way to express a subset of lists isn't a good way to encourage good modeling hygiene.

ebremer commented 3 months ago

@VladimirAlexiev, DICOM makes heavy use of ordered lists/arrays and positional semantics and is unlikely to change anytime soon. Allowing UNDEF in an RDF List would facilitate a crosswalk between DICOM and RDF, as it would match DICOM's design as is. It could perhaps even be mapped to a null in JSON-LD when specifying a List structure.

ericprud commented 3 months ago

The DICOM RDF group wrestled with the tricky problem of what RDF graph this matches.

BNode with sentinel type

Because we were not being terribly bold, we came up with a DICOM-specific dialect involving BNodes with a sentinel type: [ a dicom:null ]. Following this to its logical conclusion, we get pathological diffusion (a proliferation of RDF terms with identical semantics). The SPARQL WG could own the null term type. (Having a sentinel term directly in the rdf:first enables unfortunate unification and makes queries and rules harder, because e.g. the "how many people in the room have the same birthday?" query would frequently have to be amended with "and that birthday isn't rdf:null".)
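The unification problem can be shown with a small Python sketch. Everything here is made up for illustration: the data, the `shared_birthdays` helper, and the use of the string "rdf:null" as a stand-in for a sentinel term.

```python
from collections import Counter

NULL = "rdf:null"  # hypothetical sentinel term standing in for "no value"

# Hypothetical data: person -> birthday, where NULL marks an unknown birthday.
birthdays = {
    "alice": "03-14",
    "bob": NULL,
    "carol": "03-14",
    "dave": NULL,
}


def shared_birthdays(data, ignore=()):
    """Return the values held by more than one person, skipping ignored values."""
    counts = Counter(v for v in data.values() if v not in ignore)
    return {value for value, n in counts.items() if n > 1}


naive = shared_birthdays(birthdays)                   # sentinel unifies like a real value
amended = shared_birthdays(birthdays, ignore={NULL})  # "and that birthday isn't rdf:null"

print(sorted(naive))    # ['03-14', 'rdf:null']
print(sorted(amended))  # ['03-14']
```

The naive version reports bob and dave as sharing a birthday. Every query that groups or joins on the value needs the extra exclusion to avoid that.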

No rdf:first

RDF currently requires a first and rest on every list element. The SPARQL 1.1 WG led the way to RDF 1.1 eliminating untyped literals (i.e. they became xsd:strings). Do we again want to be so bold as to propose a variant of the first/rest ladder which has no rdf:first?

<SensorArray1> <values> (12.34 10.82 UNDEF 11.9) .

=>

<SensorArray1> <values> [
  rdf:first 12.34; rdf:rest [
    rdf:first 10.82; rdf:rest [
      rdf:rest [ #  UNDEF has no rdf:first
        rdf:first 11.9; rdf:rest rdf:nil
  ] ] ] ] .

This is bold, but also useful, and less bold than adding lists as first-class objects in RDF. It also can't be misinterpreted by naive agents (unlike sentinel types or e.g. adding a type arc: _:b3 a rdf:null-so-ignore-my-first ; rdf:first [] ; rdf:rest _:b4 .)
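For illustration only, the expansion above could be sketched in Python as follows, with None playing the role of UNDEF. The `expand_list` helper, the `_:b0`, `_:b1`, ... blank node labels, and the tuple representation of triples are assumptions of this sketch, not anything a parser does today.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def expand_list(values):
    """Return (head, triples) for a list; None items get no rdf:first triple."""
    if not values:
        return NIL, []
    cells = [f"_:b{i}" for i in range(len(values))]  # hypothetical bnode labels
    triples = []
    for i, value in enumerate(values):
        if value is not None:
            triples.append((cells[i], FIRST, value))
        next_cell = cells[i + 1] if i + 1 < len(cells) else NIL
        triples.append((cells[i], REST, next_cell))  # every cell keeps its rdf:rest
    return cells[0], triples


head, triples = expand_list([12.34, 10.82, None, 11.9])
for triple in triples:
    print(triple)
```

The third cell (`_:b2`) ends up with only an rdf:rest arc, so a consumer that follows rdf:rest still finds 11.9 in the fourth position.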

Process note: this should probably pass by the semantic-web@w3.org list if it isn't quickly and conclusively shot down here.