iptc / sport-schema

The next generation of sports data, based on IPTC’s SportsML and semantic web principles

SHACL SKOS validation #197

Open riannella opened 3 months ago

riannella commented 3 months ago

In the SHACL file: https://github.com/iptc/sport-schema/blob/main/ontologies/iptc-sport-shacl.ttl

It includes:

sport:ParticipatableThingShape
  sh:targetClass sport:ParticipatableThing ;
  # sh:closed true ;
  sh:ignoredProperties ( rdf:type rdfs:label ) ;
  sh:property [
    sh:path sport:sport ;
    sh:class skos:Concept ;
    sh:pattern "^http://cv.iptc.org/newscodes/mediatopic/" ;
    sh:flags   "i"  # Ignore case
  ] ;

Does this really enforce that the value of sport:sport must be a SKOS concept from the mediatopic vocab?

bquinn commented 3 months ago

Yes, that's the idea. For our main examples, all sports were listed as MediaTopics.

And before you ask, yes Aussie rules football, rugby league and rugby union are all listed 🙂

If there are any sports not listed that you would like to see, please let us know.

riannella commented 3 months ago

My question is more technical...

Does

sh:pattern "^http://cv.iptc.org/newscodes/mediatopic/" ;

meet the (SHACL) conformance requirement that the "values must come from the IPTC SKOS vocab", when the following string also matches the pattern?

http://cv.iptc.org/newscodes/mediatopic/foofoofoo
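To make the concern concrete, here is a minimal Python sketch of the prefix check. It assumes Python's `re` module behaves closely enough to the regex matching SHACL uses for `sh:pattern` on this simple pattern; the helper name `matches_prefix` and the `example.org` IRI are made up for illustration.

```python
import re

# Pattern and case-insensitive flag copied from the sh:pattern / sh:flags constraint.
# Assumption: Python's re is a close enough stand-in for SHACL regex matching here.
PATTERN = re.compile(r"^http://cv.iptc.org/newscodes/mediatopic/", re.IGNORECASE)


def matches_prefix(iri: str) -> bool:
    """Return True when the IRI string starts with the MediaTopic namespace."""
    return PATTERN.search(iri) is not None


candidates = [
    "http://cv.iptc.org/newscodes/mediatopic/foofoofoo",  # IRI from the question above
    "HTTP://CV.IPTC.ORG/newscodes/mediatopic/anything",   # accepted because of the "i" flag
    "http://example.org/sports/football",                 # hypothetical IRI outside the namespace
]

# Map each candidate to whether the pattern accepts it.
results = {iri: matches_prefix(iri) for iri in candidates}

for iri, ok in results.items():
    print(iri, "->", ok)
```

The pattern only inspects the text of the IRI, so any string that begins with the namespace is accepted, whether or not a concept with that IRI exists in the vocabulary.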

bquinn commented 3 months ago

I see what you mean. If we include the mediatopics in the ontology files, do you know of a simple way to validate it on a value level using SHACL?

We follow a similar regex-based match for all of our CV terms, but it would be great if there was a smarter / less error-prone way to do it.

riannella commented 3 months ago

I have created and tested the below SHACL rule.

We first define a NodeShape for valid MediaTopic concepts (in this case, they all have the skos:inScheme property set to <http://cv.iptc.org/newscodes/mediatopic/>).

Then the CompetitionShape requires a sport:sport property whose value is an instance of skos:Concept and conforms to the MediaTopicShape.

ex:MediaTopicShape a sh:NodeShape;
    sh:property [
        sh:path skos:inScheme ;
        sh:hasValue <http://cv.iptc.org/newscodes/mediatopic/> ;
        sh:maxCount 1 ;
    ] .

ex:CompetitionShape a sh:NodeShape ;
    sh:targetClass sport:Competition ;
    sh:property [
        sh:path sport:sport ;
        sh:class skos:Concept ;
        sh:minCount 1 ;
        sh:node ex:MediaTopicShape ;
    ] .

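As a rough illustration of what these two shapes check, the following Python sketch emulates the constraints over a tiny hand-written set of triples. It is not a SHACL engine: prefixed names are plain strings, `sh:class` is checked without subclass reasoning, and the `ex:comp*` nodes and the `example-sport` local name are hypothetical.

```python
SCHEME = "http://cv.iptc.org/newscodes/mediatopic/"

# Hypothetical data graph as (subject, predicate, object) triples.
triples = {
    ("ex:comp1", "rdf:type", "sport:Competition"),
    ("ex:comp1", "sport:sport", SCHEME + "example-sport"),
    (SCHEME + "example-sport", "rdf:type", "skos:Concept"),
    (SCHEME + "example-sport", "skos:inScheme", SCHEME),
    ("ex:comp2", "rdf:type", "sport:Competition"),
    ("ex:comp2", "sport:sport", SCHEME + "foofoofoo"),  # no triples describe this IRI
    ("ex:comp3", "rdf:type", "sport:Competition"),      # no sport:sport value at all
}


def objects(subject, predicate):
    """Return all objects of (subject, predicate, ?) in the data graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def conforms_to_media_topic_shape(node):
    # sh:hasValue plus sh:maxCount 1: the scheme IRI is the only skos:inScheme value.
    return objects(node, "skos:inScheme") == {SCHEME}


def competition_violations(competition):
    """List the constraints from ex:CompetitionShape that the node fails."""
    problems = []
    sports = objects(competition, "sport:sport")
    if len(sports) < 1:
        problems.append("sh:minCount")
    for sport in sorted(sports):
        if "skos:Concept" not in objects(sport, "rdf:type"):
            problems.append("sh:class")
        if not conforms_to_media_topic_shape(sport):
            problems.append("sh:node")
    return problems


for comp in ("ex:comp1", "ex:comp2", "ex:comp3"):
    print(comp, competition_violations(comp))
```

In this sketch the `foofoofoo` IRI fails because nothing in the data graph states its type or its `skos:inScheme`, which also means the MediaTopic concepts themselves have to be present in the graph being validated for the check to pass.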
bquinn commented 3 months ago

Great! We'll use that. Thanks very much for your help!

bquinn commented 3 months ago

@pauljkelly note that @riannella helped us out here and the fix works very well.

Questions based on this:

  1. Do we really want to require that every Competition has a sport:sport defined? It probably makes sense but may be too much for some requirements...? (Many of our sample data sets fail this test right now)
  2. Should we use this pattern for other IPTC NewsCodes CVs? Probably yes...?