shexSpec / shex

ShEx language issues, including new features for e.g. ShEx2.1

Support for RDF Collections (lists) #17

Open gkellogg opened 7 years ago

gkellogg commented 7 years ago

In doing the RDFS for ShEx, it was necessary to separate the properties expression and expressions, since expressions is expected to take a list of TripleExpressions, while expression takes a single value. Furthermore, expressions must have at least two elements. How would we create a shape to validate this?

Certainly, one way is to use the rdf:first and rdf:rest properties:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX shex: <https://shexspec.github.io/ns/>
PREFIX ex: <http://schema.example/>

shex:EachOf CLOSED {
  rdf:type [shex:EachOf];
  shex:expressions BNODE @ex:ListOfTwoExpressions;
}

ex:ListOfTwoExpressions CLOSED {
  rdf:type [rdf:List]?;
  rdf:first @shex:TripleExpression;
  rdf:rest {
    rdf:first @shex:TripleExpression;
    rdf:rest [rdf:nil] OR @ex:ListOfExpressions;
  }
}

ex:ListOfExpressions CLOSED {
  rdf:type [rdf:List]?;
  rdf:first @shex:TripleExpression;
  rdf:rest [rdf:nil] OR @ex:ListOfExpressions;
}

shex:TripleExpression {
  rdf:type [shex:OneOf shex:EachOf shex:Inclusion shex:TripleConstraint];
}
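To make concrete what the recursive list shapes above are meant to check, here is a minimal Python sketch (not a ShEx implementation): it walks rdf:rest from the list head, requires exactly one rdf:first and one rdf:rest per cell, stops at rdf:nil, and requires at least two members that each pass a member test. The graph representation (a set of (subject, predicate, object) strings using prefixed names), the function names, and the is_triple_expression stand-in for @shex:TripleExpression are hypothetical; CLOSED and the optional rdf:type arcs are ignored.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def list_members(graph, head):
    """Members of the RDF collection starting at head, or None if it is malformed."""
    members, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:  # a cycle in rdf:rest never reaches rdf:nil
            return None
        seen.add(node)
        firsts = objects(graph, node, RDF_FIRST)
        rests = objects(graph, node, RDF_REST)
        if len(firsts) != 1 or len(rests) != 1:  # each cell needs exactly one of each
            return None
        members.append(firsts[0])
        node = rests[0]
    return members


def is_list_of_at_least_two(graph, head, member_ok):
    """Roughly what ex:ListOfTwoExpressions is meant to check (ignoring CLOSED and rdf:type)."""
    members = list_members(graph, head)
    return members is not None and len(members) >= 2 and all(map(member_ok, members))


def is_triple_expression(node):
    # Hypothetical stand-in for @shex:TripleExpression: accept nodes named _:tc*.
    return node.startswith("_:tc")


graph = {
    ("_:l1", RDF_FIRST, "_:tc1"), ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "_:tc2"), ("_:l2", RDF_REST, RDF_NIL),
}
print(is_list_of_at_least_two(graph, "_:l1", is_triple_expression))  # True
print(is_list_of_at_least_two(graph, "_:l2", is_triple_expression))  # False: one member
```

Two shapes are needed in ShEx precisely because the "at least two" condition applies only at the head, while every later cell just needs one member and a valid rest.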

But for what seems like such a common pattern, this is pretty heavyweight. One thought I had was to add a new node kind, LIST, which would serve a dual purpose: verifying that the property value is a valid RDF Collection, and altering value and cardinality checking so that they apply to the list members rather than to the triple itself. The shape might look something like the following:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX shex: <https://shexspec.github.io/ns/>
PREFIX ex: <http://schema.example/>

shex:EachOf CLOSED {
  rdf:type [shex:EachOf];
  shex:expressions LIST @shex:TripleExpression{2,}
}

The downside of this is that we are overloading nodeKind to modify other behavior of the TripleConstraint. Also, we can't express both the length of a list and the number of lists that may be values of the property in the TripleConstraint (which I think is really only of theoretical utility, but ...).
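Under the LIST node-kind proposal, the trailing {2,} would count list members instead of triples. The following Python sketch shows one possible reading, assuming that the shex:expressions triple itself must then occur exactly once, which is the limitation just described. The triple-set graph, the helper names and the is_triple_expression stand-in for @shex:TripleExpression are hypothetical and not part of any ShEx implementation.

```python
import math

RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def objects(graph, subject, predicate):
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def list_members(graph, head):
    """Members of the RDF collection at head, or None if it is malformed."""
    members, seen, node = [], set(), head
    while node != RDF_NIL:
        firsts, rests = objects(graph, node, RDF_FIRST), objects(graph, node, RDF_REST)
        if node in seen or len(firsts) != 1 or len(rests) != 1:
            return None  # cycle, or a cell without exactly one rdf:first/rdf:rest
        seen.add(node)
        members.append(firsts[0])
        node = rests[0]
    return members


def check_list_node_kind(graph, focus, predicate, member_ok, min_len, max_len=math.inf):
    """Hypothetical reading of `predicate LIST @Shape{min_len,max_len}`.

    Assumes the triple itself must occur exactly once; the cardinality is
    re-targeted from the number of triples to the length of the list.
    """
    values = objects(graph, focus, predicate)
    if len(values) != 1:
        return False
    members = list_members(graph, values[0])
    return (members is not None
            and min_len <= len(members) <= max_len
            and all(map(member_ok, members)))


def is_triple_expression(node):
    return node.startswith("_:tc")  # hypothetical stand-in for @shex:TripleExpression


graph = {
    ("_:e", "shex:expressions", "_:l1"),
    ("_:l1", RDF_FIRST, "_:tc1"), ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "_:tc2"), ("_:l2", RDF_REST, RDF_NIL),
}
print(check_list_node_kind(graph, "_:e", "shex:expressions", is_triple_expression, 2))  # True
print(check_list_node_kind(graph, "_:e", "shex:expressions", is_triple_expression, 3))  # False
```

Because the single cardinality slot is consumed by the list length, this reading has no place to say how many lists are allowed.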

Eric came up with an alternative that extends the grammar along these lines:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX shex: <https://shexspec.github.io/ns/>
PREFIX ex: <http://schema.example/>

shex:EachOf CLOSED {
  rdf:type [shex:EachOf];
  shex:expressions LIST(@shex:TripleExpression{2,}){1,3}
}

This would not use a node kind but a new functional LIST syntax, so you could say that shex:expressions must have at least one and no more than three distinct lists as values, each of which must have at least two entries matching the shape shex:TripleExpression.
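The functional LIST syntax keeps the two cardinalities apart. A hypothetical Python sketch of that reading: list_count bounds how many distinct lists appear as objects of the predicate (the outer {1,3}), and list_len bounds the length of each list (the inner {2,}). The graph representation, function names and member test are illustrative assumptions, not an existing API.

```python
import math

RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def objects(graph, subject, predicate):
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def list_members(graph, head):
    """Members of the RDF collection at head, or None if it is malformed."""
    members, seen, node = [], set(), head
    while node != RDF_NIL:
        firsts, rests = objects(graph, node, RDF_FIRST), objects(graph, node, RDF_REST)
        if node in seen or len(firsts) != 1 or len(rests) != 1:
            return None  # cycle, or a cell without exactly one rdf:first/rdf:rest
        seen.add(node)
        members.append(firsts[0])
        node = rests[0]
    return members


def check_list_constraint(graph, focus, predicate, member_ok,
                          list_len=(0, math.inf), list_count=(1, 1)):
    """Hypothetical reading of `predicate LIST(@Shape{a,b}){c,d}`:
    list_len = (a, b) bounds each list's length; list_count = (c, d) bounds
    how many distinct lists are objects of predicate."""
    heads = objects(graph, focus, predicate)
    if not list_count[0] <= len(heads) <= list_count[1]:
        return False
    for head in heads:
        members = list_members(graph, head)
        if members is None or not list_len[0] <= len(members) <= list_len[1]:
            return False
        if not all(map(member_ok, members)):
            return False
    return True


def is_triple_expression(node):
    return node.startswith("_:tc")  # hypothetical stand-in for @shex:TripleExpression


graph = {
    ("_:e", "shex:expressions", "_:a1"), ("_:e", "shex:expressions", "_:b1"),
    ("_:a1", RDF_FIRST, "_:tc1"), ("_:a1", RDF_REST, "_:a2"),
    ("_:a2", RDF_FIRST, "_:tc2"), ("_:a2", RDF_REST, RDF_NIL),
    ("_:b1", RDF_FIRST, "_:tc3"), ("_:b1", RDF_REST, "_:b2"),
    ("_:b2", RDF_FIRST, "_:tc4"), ("_:b2", RDF_REST, RDF_NIL),
}
# LIST(@shex:TripleExpression{2,}){1,3}: two lists of two members each
print(check_list_constraint(graph, "_:e", "shex:expressions", is_triple_expression,
                            list_len=(2, math.inf), list_count=(1, 3)))  # True
```

With list_count=(1, 1) the same graph would fail, since it carries two lists.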

Personally, I think overloading nodeKind works well, and if someone really needs to talk about both the cardinality of lists and the cardinality of list elements, they can fall back to the rdf:first/rdf:rest primitives.

labra commented 7 years ago

This feature is very interesting. Regarding the grammar, I understand that

shex:expressions LIST @shex:TripleExpression{2,}

may be ambiguous to parse and understand: a parser would not know whether the {2,} refers to the shex:expressions arc or to the size of the list, so some parentheses are probably needed:

shex:expressions LIST(@shex:TripleExpression{2,}){1,3}

As I understand it, the {2,} declares the expected size of the list (in this case 2 or more elements).

If we allow any cardinality expression, should we also allow +, * and ?, so that, for example, ? would mean either a one-element list or rdf:nil?

And in the case of {0,0}, does it mean that the list must be rdf:nil (the empty list)?

This captures a common pattern and avoids repetition (following DRY), so I 👍 adding this feature to the next release of ShEx.
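If the inner cardinality is read as bounds on the list length, the usual ShEx cardinality forms translate directly into allowed lengths, with length 0 meaning rdf:nil. A small Python sketch under that assumption; the function names are hypothetical, and the {m}, {m,}, {m,n} and {m,*} range forms follow ShExC's repeat-range syntax.

```python
import math
import re

# ShExC repeat ranges: {m}, {m,}, {m,n}, {m,*}
_RANGE = re.compile(r"\{(\d+)(?:(,)(\d*|\*))?\}")


def cardinality_bounds(card):
    """Map a ShEx cardinality token to (min, max) bounds; max may be math.inf."""
    shorthand = {"?": (0, 1), "*": (0, math.inf), "+": (1, math.inf)}
    if card in shorthand:
        return shorthand[card]
    m = _RANGE.fullmatch(card)
    if m is None:
        raise ValueError(f"not a cardinality: {card!r}")
    low = int(m.group(1))
    if m.group(2) is None:  # {m} means exactly m
        return (low, low)
    high = math.inf if m.group(3) in ("", "*") else int(m.group(3))
    return (low, high)


def allowed_list_length(card, length):
    """Assumption: the inner cardinality of LIST(...) bounds the list length.
    A length of 0 corresponds to the empty list, rdf:nil."""
    low, high = cardinality_bounds(card)
    return low <= length <= high


for card in ["?", "{0,0}", "{2,}", "+"]:
    print(card, [n for n in range(4) if allowed_list_length(card, n)])
# ? [0, 1]
# {0,0} [0]
# {2,} [2, 3]
# + [1, 2, 3]
```

Under this reading, ? admits rdf:nil or a one-element list, and {0,0} admits only rdf:nil.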

jessevdam commented 7 years ago

This feature is very useful, but I would also like to suggest a shorthand, such as shex:expressions @shex:TripleExpression{1,3}~, which would be equivalent to shex:expressions LIST(@shex:TripleExpression){1,3}.

And for sequences, I would like to use shex:expressions @shex:TripleExpression{1,3}=, which would be equivalent to shex:expressions SEQ(@shex:TripleExpression){1,3}.

kcoyle commented 6 years ago

Will the list members be ordered?

ericprud commented 6 years ago

Yes, as dictated by the semantics of RDF Collections.
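The order of an RDF Collection comes from the rdf:rest chain, not from the order in which triples happen to be stored or serialized. A minimal Python sketch (the helper name is hypothetical, and it assumes a well-formed collection) that recovers member order from triples listed in arbitrary order:

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def ordered_members(triples, head):
    """Follow rdf:rest from head; the chain, not triple order, fixes member order.
    Assumes a well-formed collection (one rdf:first and one rdf:rest per cell, no cycles)."""
    first = {s: o for (s, p, o) in triples if p == RDF_FIRST}
    rest = {s: o for (s, p, o) in triples if p == RDF_REST}
    members, node = [], head
    while node != RDF_NIL:
        members.append(first[node])
        node = rest[node]
    return members


# The triples are deliberately listed out of order; the result still follows the chain.
triples = [
    ("_:c", RDF_FIRST, "ex:third"), ("_:a", RDF_REST, "_:b"),
    ("_:c", RDF_REST, RDF_NIL), ("_:b", RDF_FIRST, "ex:second"),
    ("_:a", RDF_FIRST, "ex:first"), ("_:b", RDF_REST, "_:c"),
]
print(ordered_members(triples, "_:a"))  # ['ex:first', 'ex:second', 'ex:third']
```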