zazuko / cube-link

Cube Schema
https://zazuko.github.io/cube-link/

set `sh:minCount` 1 for `sh:nodeKind` to prevent mixed nodeKinds #109

Closed Rdataflow closed 5 months ago

Rdataflow commented 12 months ago

See @l00mi's remark on preventing mixed nodeKinds: https://zulip.zazuko.com/#narrow/stream/40-bafu-ext/topic/foag.3A.20filtering.20dates/near/370780

giacomociti commented 12 months ago

We should consider using the shapes-to-validate-shapes (shacl-shacl) mentioned in the SHACL spec, which include:

    sh:property [
      sh:path sh:nodeKind ;
      sh:in ( sh:BlankNode sh:IRI sh:Literal sh:BlankNodeOrIRI sh:BlankNodeOrLiteral sh:IRIOrLiteral ) ;    # nodeKind-in
      sh:maxCount 1 ;                 # nodeKind-maxCount
    ] ;

Among others, there are shapes to validate lists (and in cube-link we have something very similar).

These shacl-shacl shapes are available here
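
To make the connection to this issue concrete, here is a small sketch of what the `nodeKind-maxCount` rule quoted above does and does not catch. The `ex:` namespace, the `ex:date` path and both shape names are made up for illustration. The first shape declares two node kinds and would be rejected by `sh:maxCount 1`; the second declares none and still passes that rule, which is the gap an additional `sh:minCount 1` would close.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .   # hypothetical namespace

# Two node kinds on one property shape: fails nodeKind-maxCount.
ex:MixedKinds
    a sh:PropertyShape ;
    sh:path ex:date ;
    sh:nodeKind sh:IRI, sh:Literal .

# No node kind at all: nodeKind-maxCount is satisfied, because
# sh:maxCount 1 only bounds the number of values from above.
ex:NoKind
    a sh:PropertyShape ;
    sh:path ex:date .
```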

t0b3 commented 12 months ago

@giacomociti yes of course with shacl-shacl (as it's done already in the cube-constraint-constraint.ttl) :+1:

based on your snippet we need to consider

giacomociti commented 12 months ago

So, as I understand it, our requirement is a little stronger than the basic consistency constraint provided by shacl-shacl. (Maybe I commented on the wrong issue, because the constraints asked for in https://github.com/zazuko/cube-link/issues/105 are directly covered by shacl-shacl.)

Currently, we have a constraint on sh:nodeKind within a sh:or condition: we require either a node kind or a data type (or multiple data types within another sh:or).

Maybe we could be even more precise and require either a literal node kind with some data type or an IRI node kind:

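    sh:property [
        sh:message "sh:nodeKind needs to be either sh:IRI or sh:Literal with some sh:datatype" ;
        sh:or (
            # alternative 1: the node kind is sh:IRI
            [
                sh:path sh:nodeKind ;
                sh:hasValue sh:IRI ;
            ]
            # alternative 2: the node kind is sh:Literal AND a data type is given
            [
                sh:and (
                    [
                        sh:path sh:nodeKind ;
                        sh:hasValue sh:Literal ;
                    ]
                    # each list member must be a shape, so sh:node is wrapped in one
                    [ sh:node <datatype> ]
                )
            ]
        ) ;
    ] ;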

where the <datatype> shape requires some data type (possibly more than one in sh:or)
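
As a rough illustration of what such a `<datatype>` shape could look like, the sketch below accepts either a single `sh:datatype` directly on the property shape or at least one `sh:datatype` among the members of an `sh:or` list. This is only one possible reading of the description above: the name `ex:DatatypeShape` and the `ex:` namespace are hypothetical, and the shapes actually used in cube-link may be structured differently.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace

# Hypothetical stand-in for the <datatype> shape referenced above.
ex:DatatypeShape
    a sh:NodeShape ;
    sh:or (
        # exactly one sh:datatype stated directly on the property shape
        [
            sh:path sh:datatype ;
            sh:minCount 1 ;
            sh:maxCount 1 ;
            sh:nodeKind sh:IRI ;
        ]
        # or at least one sh:datatype among the members of an sh:or list;
        # the sequence path walks sh:or / rdf:rest* / rdf:first / sh:datatype
        [
            sh:path ( sh:or [ sh:zeroOrMorePath rdf:rest ] rdf:first sh:datatype ) ;
            sh:minCount 1 ;
        ]
    ) .
```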

Rdataflow commented 12 months ago

@giacomociti yes this approach looks promising :+1: not sure on the details: would we need minCount etc.? you have the expertise in this field and you got the main point we need to ensure :smile:

nb: I just looked at shsh. in addition to our specific needs (above) this could serve to ensure the whole set of generic shacl conformity. thus we may benefit by adding this to the list of validations to check our cube:Constraints right in here https://cube.link/#the-integrity-of-the-constraints

tpluscode commented 12 months ago

> in addition to our specific needs (above) this could serve to ensure the whole set of generic shacl conformity

We discussed this and I see it as two-fold. Cube Creator would not necessarily need to use shacl-shacl, because the core of the cube shapes is generated by a reusable pipeline step. Thus, the step code should be tested so that we ensure it always produces valid SHACL.

On the other hand, data producers who do not use Cube Creator would benefit from a profile which includes shsh, or an explicit validation provided by a CLI (re https://github.com/zazuko/barnard59/issues/187) to check their shapes against shsh in addition to cube.link rules.

Rdataflow commented 12 months ago

@tpluscode sure if you already thoroughly tested cc code to fulfill shsh then there will be no need to test this twice :100:

tpluscode commented 12 months ago

More tests never hurt anyone 😎

Rdataflow commented 5 months ago

Closed by https://github.com/zazuko/cube-link/commit/7e710e3122aa98eb8848bb3a1107e24078470cd9 which requires nodeKind ( sh:IRI sh:Literal )
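
For readers who do not want to open the commit, the following is a sketch of how such a requirement could be expressed. It assumes the list `( sh:IRI sh:Literal )` is an `sh:in` constraint and combines it with the `sh:minCount 1` from the issue title; the shape name and the `ex:` namespace are hypothetical, and the actual commit may be worded differently.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .   # hypothetical namespace

ex:NodeKindCheck                      # hypothetical shape name
    a sh:NodeShape ;
    sh:property [
        sh:path sh:nodeKind ;
        sh:in ( sh:IRI sh:Literal ) ;   # only these two node kinds are allowed
        sh:minCount 1 ;                 # a node kind must be present
        sh:maxCount 1 ;                 # and node kinds must not be mixed
    ] .
```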