shexSpec / shex

ShEx language issues, including new features for e.g. ShEx2.1

Validation of RDF* #107

Open herminiogg opened 3 years ago

herminiogg commented 3 years ago

Hello,

I don't know if you have already discussed this (I hope I'm not duplicating an existing issue). Have you planned to include support for validating RDF*? With the traction that RDF* is gaining, I think it would be nice to support it. I'm not so worried about implementation specifics as about the syntax that would enable this feature in ShEx. As you may know, I'm the main developer of ShExML (https://github.com/herminiogg/ShExML), whose syntax is heavily based on ShEx's. For example, when dealing with the generation of Datasets I followed this proposal (https://github.com/shexSpec/shex/issues/77). For RDF*, however, I didn't find anything. So, do you have any plans for how this would be implemented in ShEx, or any idea of what the syntax could look like? If not, I would be very keen to discuss the possible syntax details with you.

ericprud commented 3 years ago

Whenever I think about this, my brain turns inside out. Are we validating the stuff inside or outside the <<>>?

Data:

<< <alice> a :User >> dc:creator <#me> .
<< <alice> foaf:givenName "Alice" >> dc:creator <#you> .
<< <alice> foaf:familyName "Walker" >> dc:creator <#you> .

Would the schema be

Schema 1

<#UserShape> {
  << a [foaf:User] >> dc:creator . ;
  << foaf:givenName xsd:string >> dc:creator . ;
  << foaf:familyName xsd:string >> dc:creator . ;
}

or

Schema 2

<#UserAssertionShape> {
  dc:creator >> a [foaf:User] << ;
  dc:creator >> foaf:givenName xsd:string << ;
  dc:creator >> foaf:familyName xsd:string <<  ;
}

As far as I can tell, we'd want the former, because with the latter we'd have to invent something like variables to make sure the subjects of the triples were the same. That said, the latter is closer to how ShEx operates today. Thoughts?
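The argument for Schema 1 is that the focus node is implicitly the subject of every quoted triple, so no variables are needed. A minimal Python sketch of that reading, assuming quoted triples are modelled as nested tuples and that the `.` after `dc:creator` means "any value" as in ShEx today; the function names are hypothetical and nothing here is part of the ShEx spec or any ShEx implementation:

```python
# Hypothetical sketch of the "Schema 1" reading. Terms are plain strings;
# a quoted triple is a nested 3-tuple. rdf:type stands in for Turtle's "a".
DATA = {
    (("<alice>", "rdf:type", ":User"), "dc:creator", "<#me>"),
    (("<alice>", "foaf:givenName", '"Alice"'), "dc:creator", "<#you>"),
    (("<alice>", "foaf:familyName", '"Walker"'), "dc:creator", "<#you>"),
}


def quoted_about(graph, focus):
    """Group annotation arcs by the quoted triple they describe, keeping
    only quoted triples whose subject is the focus node."""
    groups = {}
    for s, p, o in graph:
        if isinstance(s, tuple) and s[0] == focus:
            groups.setdefault(s, []).append((p, o))
    return groups


def matches_schema1(graph, focus, predicate, object_ok, annotation):
    """Emulate a hypothetical `<< predicate valueExpr >> annotation .`:
    some quoted triple <<focus predicate o>> with object_ok(o) must carry
    at least one `annotation` arc, whatever its value."""
    for (_, p, o), arcs in quoted_about(graph, focus).items():
        if p == predicate and object_ok(o) and any(
            ap == annotation for ap, _ in arcs
        ):
            return True
    return False


def is_literal(term):
    """Crude literal test for this sketch: quoted lexical form."""
    return isinstance(term, str) and term.startswith('"')
```

Because the focus node is fixed by the shape, every lookup is keyed on it, which is exactly what spares Schema 1 from needing variables.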

herminiogg commented 3 years ago

Hi Eric,

I would say that we should validate both inside and outside the <<>>, since all of it is data that could be validated or be part of a schema. That said, I think both examples are missing a variable: o in schema 1 and s' in schema 2.

I would suggest a syntax like (following schema 1):

<#UserShape> {
  << a [foaf:User] >> dc:creator IRI . ;
  << foaf:givenName xsd:string >> dc:creator IRI . ;
  << foaf:familyName xsd:string >> dc:creator IRI . ;
}

We could also validate s p << s' p' o' >> productions, e.g.:

:employee1 :createdEntry << :alice foaf:givenName "Alice" >> .

<#UserDataEntryVerificationShape> {
  :createdEntry << @<#User> >>  . ;
}

<#User> {
  foaf:givenName xsd:string .
}

pchampin commented 3 years ago

@herminiogg and @ericprud you both seem to be assuming that a shape applying to a node should constrain the asserted triples and the embedded triples involving that node. I am not sure I agree with that. Consider the following graph:

:alice a :Person; :name "Alice" ; :birthDate '1987-06-05'^^xsd:date.
:bob :says << :alice a :Genius >>.

I can imagine that a Person shape would require Alice to have a name, or her birth date to be an xsd:date, but I don't see the point in that shape constraining who says what about her...
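To make the distinction concrete, here is a small Python sketch of which triples a Person shape on `:alice` would see under this reading, assuming the same nested-tuple encoding of quoted triples; the helper names are hypothetical and only one level of quoting is inspected:

```python
# Hypothetical sketch: asserted arcs vs. quoted-only mentions of a node.
GRAPH = {
    (":alice", "rdf:type", ":Person"),
    (":alice", ":name", '"Alice"'),
    (":alice", ":birthDate", '"1987-06-05"^^xsd:date'),
    # Only :bob's statement is asserted; the quoted triple itself is not.
    (":bob", ":says", (":alice", "rdf:type", ":Genius")),
}


def asserted_arcs_out(graph, focus):
    """Arcs a shape on `focus` would constrain under this reading:
    only asserted triples whose subject is `focus`."""
    return {(p, o) for s, p, o in graph if s == focus}


def quoted_mentions(graph, focus):
    """Quoted triples (one level deep) with `focus` as subject. Under this
    reading they stay outside the focus node's neighbourhood."""
    found = set()
    for s, _, o in graph:
        for term in (s, o):
            if isinstance(term, tuple) and term[0] == focus:
                found.add(term)
    return found
```

Here `:alice a :Genius` shows up only through `quoted_mentions`, so a Person shape built on `asserted_arcs_out` never sees it.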

On the other hand, I see value in constraining the annotated triples about a given node, i.e. triples that are both asserted and embedded as subject. This pattern emulates Property Graphs' edge properties, and Turtle-star provides a shorthand for it (see example 2 in the RDF-star CG report).

This could go like this (reusing the "annotation brackets" of Turtle-star):

:User {
  schema:name          xsd:string  ;
  schema:birthDate     xsd:date?  ;
  schema:spouse        @:User * {| 
    :from    xsd:date ;
    :until     xsd:date? ;
  |}?;
}

or even maybe

:User {
  schema:name          xsd:string  ;
  schema:birthDate     xsd:date?  ;
  schema:spouse        @:User * {| @:TemporalAnnotation |}?;
}
:TemporalAnnotation {
  :from    xsd:date ;
  :until     xsd:date? ;
}
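A rough Python sketch of how the annotation brackets might be checked: for each asserted `schema:spouse` arc, look up the arcs whose subject is the same triple in quoted form and test them against a stand-in for `:TemporalAnnotation`. It assumes the trailing `?` makes the annotation optional and skips the `@:User` check on the object; all names are hypothetical:

```python
# Hypothetical sketch of `schema:spouse @:User * {| @:TemporalAnnotation |}?`.
ANNOTATED = (":alice", "schema:spouse", ":bob")
GRAPH = {
    (":alice", "schema:name", '"Alice"'),
    (":bob", "schema:name", '"Bob"'),
    ANNOTATED,  # asserted...
    (ANNOTATED, ":from", '"2010-05-01"^^xsd:date'),  # ...and annotated
}


def annotations(graph, triple):
    """Arcs whose subject is the quoted form of `triple`."""
    return [(p, o) for s, p, o in graph if s == triple]


def is_date(term):
    return isinstance(term, str) and term.endswith("^^xsd:date")


def temporal_annotation_ok(arcs):
    """Stand-in for :TemporalAnnotation: exactly one :from and at most one
    :until, both xsd:date; other predicates are ignored (open shape)."""
    froms = [o for p, o in arcs if p == ":from"]
    untils = [o for p, o in arcs if p == ":until"]
    return len(froms) == 1 and len(untils) <= 1 and all(
        is_date(o) for o in froms + untils
    )


def spouse_annotations_ok(graph, focus):
    """An asserted spouse arc may be unannotated, but if it is annotated
    the annotation must conform."""
    for s, p, o in graph:
        if s == focus and p == "schema:spouse":
            arcs = annotations(graph, (s, p, o))
            if arcs and not temporal_annotation_ok(arcs):
                return False
    return True
```

Only triples that are both asserted and quoted as a subject are inspected, which matches the Property Graph edge-property pattern described above.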

For triples that are only embedded, we would need a dedicated node kind Triple, and probably also a dedicated node constraint, for which we could reuse the double angle brackets:

:JudgeOfCharacter {
    :says      << @:User [ rdf:type ] [ :Genius :Moron ] >>;
}
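One way to picture the proposed triple node constraint is as a check on a quoted-triple term with one constraint per position. A minimal Python sketch, assuming the nested-tuple encoding and a simplified, hypothetical stand-in for `@:User` (just "has a schema:name"):

```python
# Hypothetical sketch of `:says << @:User [ rdf:type ] [ :Genius :Moron ] >>`.
GRAPH = {
    (":alice", "schema:name", '"Alice"'),
    (":bob", ":says", (":alice", "rdf:type", ":Genius")),
    (":carol", ":says", (":alice", "rdf:type", ":Hero")),
    (":dave", ":says", ":alice"),
}


def is_user(graph, node):
    """Simplified stand-in for @:User: the node has a schema:name."""
    return any(s == node and p == "schema:name" for s, p, _ in graph)


def triple_constraint_ok(graph, term, subject_ok, predicates, objects):
    """Node kind Triple plus per-position constraints on the quoted triple."""
    if not (isinstance(term, tuple) and len(term) == 3):
        return False  # not a quoted triple
    s, p, o = term
    return subject_ok(graph, s) and p in predicates and o in objects


def judge_of_character_ok(graph, focus):
    """`:says << ... >>` with no cardinality: exactly one :says arc."""
    said = [o for s, p, o in graph if s == focus and p == ":says"]
    return len(said) == 1 and triple_constraint_ok(
        graph, said[0], is_user, {"rdf:type"}, {":Genius", ":Moron"}
    )
```

With this data, `:bob` conforms, `:carol` fails the value set, and `:dave` fails because its `:says` value is not a quoted triple at all.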
ericprud commented 3 years ago

@pchampin, it took me ages to get around to reading this, but now that I have (sort of), I like it.

For posterity's sake, this HackMD document has some early thoughts on ShEx*: https://hackmd.io/J_XhobnoT6a_tq6Cr0GK6w . Feel free to scribble in there wherever you'd like.