regen-network / regen-registry-standards

🌱 RDF and SHACL schemas for Regen Registry

feat: add script to automate shacl validations #23

Closed: wgwz closed this PR 1 year ago

wgwz commented 1 year ago

Description

This PR adds a Python script that runs SHACL validations against the live metadata in the ops folder. Eventually, it can run as a CI check before code is merged into master.

This is a zero-dependency Python 3 script (standard library only). The script decides which SHACL shapes files are run against which data files, and then calls the Apache Jena CLI tool to do the actual work of validating each data file against its SHACL shapes.
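To make the orchestration step concrete, here is a minimal sketch of how shapes files could be paired with data files and turned into Jena calls. SHAPES_TO_DATA, validation_pairs and jena_command are hypothetical names, not the script's actual code; the paths are copied from the sample output in this PR, and the command assumes the shacl wrapper script shipped with Apache Jena is on PATH.

```python
import glob
import os
from typing import Dict, List, Tuple

# Hypothetical mapping from each SHACL shapes file to a glob of the data files
# it should validate. The paths mirror the sample output; the real script may
# declare or discover these pairs differently.
SHAPES_TO_DATA: Dict[str, str] = {
    "shacl/credit-classes/C01-verified-carbon-standard-class.ttl":
        "ops/C01/credit-class-metadata/*.jsonld",
    "shacl/projects/C01-verified-carbon-standard-project.ttl":
        "ops/C01/project-metadata/*.jsonld",
    "shacl/credit-batches/C01-verified-carbon-standard-batch.ttl":
        "ops/C01/credit-batch-metadata/*.jsonld",
}


def validation_pairs(root: str = ".") -> List[Tuple[str, str]]:
    """Expand each data glob under root and return (shapes, data) pairs."""
    pairs = []
    for shapes, pattern in SHAPES_TO_DATA.items():
        # sorted() keeps the run order deterministic across file systems.
        for data in sorted(glob.glob(os.path.join(root, pattern))):
            pairs.append((shapes, os.path.relpath(data, root)))
    return pairs


def jena_command(shapes: str, data: str) -> List[str]:
    """Build the Jena CLI call that validates one data file against its shapes."""
    return ["shacl", "validate", "--shapes", shapes, "--data", data]
```

Keeping the pairing in one table means adding a new credit class only requires one new entry, while the Jena invocation stays in a single place.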

Once Apache Jena and python3 are installed, you just run this:

❯ ./shacl_validate.py         
shapes=shacl/credit-classes/C01-verified-carbon-standard-class.ttl on data=ops/C01/credit-class-metadata/C01-verified-carbon-standard-credit-class.jsonld.. CONFORMS
shapes=shacl/projects/C01-verified-carbon-standard-project.ttl on data=ops/C01/project-metadata/C01-ProjectId934-metadata.jsonld.. CONFORMS
shapes=shacl/projects/C01-verified-carbon-standard-project.ttl on data=ops/C01/project-metadata/C01-ProjectId612-metadata.jsonld.. CONFORMS
shapes=shacl/credit-batches/C01-verified-carbon-standard-batch.ttl on data=ops/C01/credit-batch-metadata/C01-20150101-20151231-005.jsonld.. CONFORMS
shapes=shacl/credit-batches/C01-verified-carbon-standard-batch.ttl on data=ops/C01/credit-batch-metadata/C01-20150101-20151231-007.jsonld.. CONFORMS
shapes=shacl/credit-batches/C01-verified-carbon-standard-batch.ttl on data=ops/C01/credit-batch-metadata/C01-20190101-20191231-001.jsonld.. CONFORMS
shapes=shacl/credit-batches/C01-verified-carbon-standard-batch.ttl on data=ops/C01/credit-batch-metadata/C01-20150101-20151231-003.jsonld.. CONFORMS
shapes=shacl/credit-batches/C01-verified-carbon-standard-batch.ttl on data=ops/C01/credit-batch-metadata/C01-20190101-20191231-006.jsonld.. CONFORMS
shapes=shacl/credit-batches/C01-verified-carbon-standard-batch.ttl on data=ops/C01/credit-batch-metadata/C01-20150101-20151231-004.jsonld.. CONFORMS
shapes=shacl/credit-batches/C01-verified-carbon-standard-batch.ttl on data=ops/C01/credit-batch-metadata/C01-20150101-20151231-008.jsonld.. CONFORMS
shapes=shacl/credit-batches/C01-verified-carbon-standard-batch.ttl on data=ops/C01/credit-batch-metadata/C01-20190101-20191231-002.jsonld.. CONFORMS

If a data file does not conform to its SHACL shapes, the script marks that file INVALID and prints the validation report from Jena:

❯ ./shacl_validate.py                                          
shapes=shacl/credit-classes/C01-verified-carbon-standard-class.ttl on data=ops/C01/credit-class-metadata/C01-verified-carbon-standard-credit-class.jsonld.. CONFORMS
shapes=shacl/projects/C01-verified-carbon-standard-project.ttl on data=ops/C01/project-metadata/C01-ProjectId934-metadata.jsonld.. CONFORMS
shapes=shacl/projects/C01-verified-carbon-standard-project.ttl on data=ops/C01/project-metadata/C01-ProjectId612-metadata.jsonld.. INVALID
@prefix dash:    <http://datashapes.org/dash#> .
@prefix geojson: <https://purl.org/geojson/vocab#> .
@prefix qudt:    <http://qudt.org/schema/qudt/> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix regen:   <http://regen.network/> .
@prefix schema:  <http://schema.org/> .
@prefix sh:      <http://www.w3.org/ns/shacl#> .
@prefix unit:    <http://qudt.org/vocab/unit/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

[ rdf:type     sh:ValidationReport ;
  sh:conforms  false ;
  sh:result    [ rdf:type                      sh:ValidationResult ;
                 sh:focusNode                  []  ;
                 sh:resultMessage              "DatatypeConstraint[xsd:integer] : Got datatype xsd:integer : Node \"foo\"" ;
                 sh:resultPath                 regen:vcsProjectId ;
                 sh:resultSeverity             sh:Violation ;
                 sh:sourceConstraintComponent  sh:DatatypeConstraintComponent ;
                 sh:sourceShape                []  ;
                 sh:value                      "foo"
               ]
] .
shapes=shacl/credit-batches/C01-verified-carbon-standard-batch.ttl on data=ops/C01/credit-batch-metadata/C01-20150101-20151231-005.jsonld.. CONFORMS
shapes=shacl/credit-batches/C01-verified-carbon-standard-batch.ttl on data=ops/C01/credit-batch-metadata/C01-20150101-20151231-007.jsonld.. CONFORMS
shapes=shacl/credit-batches/C01-verified-carbon-standard-batch.ttl on data=ops/C01/credit-batch-metadata/C01-20190101-20191231-001.jsonld.. CONFORMS
shapes=shacl/credit-batches/C01-verified-carbon-standard-batch.ttl on data=ops/C01/credit-batch-metadata/C01-20150101-20151231-003.jsonld.. CONFORMS
shapes=shacl/credit-batches/C01-verified-carbon-standard-batch.ttl on data=ops/C01/credit-batch-metadata/C01-20190101-20191231-006.jsonld.. CONFORMS
shapes=shacl/credit-batches/C01-verified-carbon-standard-batch.ttl on data=ops/C01/credit-batch-metadata/C01-20150101-20151231-004.jsonld.. CONFORMS
shapes=shacl/credit-batches/C01-verified-carbon-standard-batch.ttl on data=ops/C01/credit-batch-metadata/C01-20150101-20151231-008.jsonld.. CONFORMS
shapes=shacl/credit-batches/C01-verified-carbon-standard-batch.ttl on data=ops/C01/credit-batch-metadata/C01-20190101-20191231-002.jsonld.. CONFORMS
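One plausible way to turn Jena's output into the CONFORMS/INVALID verdict is to read the sh:conforms value from the Turtle report, as in the INVALID case above. This is a sketch under that assumption, not the script's confirmed logic; report_conforms and status_line are hypothetical helper names.

```python
import re

# Assumption: the verdict is taken from the report's `sh:conforms` triple.
# The real script may instead rely on something else, such as an exit status.
_CONFORMS = re.compile(r"sh:conforms\s+(true|false)")


def report_conforms(report: str) -> bool:
    """Return True if the Turtle validation report says the data conforms."""
    match = _CONFORMS.search(report)
    if match is None:
        raise ValueError("no sh:conforms triple found in report")
    return match.group(1) == "true"


def status_line(shapes: str, data: str, conforms: bool) -> str:
    """Format one result line in the style of the sample output."""
    status = "CONFORMS" if conforms else "INVALID"
    return f"shapes={shapes} on data={data}.. {status}"
```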

The script exits with code 0 if every data file conforms and with code 1 if any validation fails, so it can be used directly as a CI check (https://docs.github.com/en/actions/creating-actions/setting-exit-codes-for-actions).
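A minimal sketch of how that exit code could be computed, assuming each (shapes, data) pair is checked by a validator callable. jena_validate and exit_code are hypothetical names; jena_validate assumes the Jena shacl command is on PATH and that the report's sh:conforms triple decides the outcome.

```python
import re
import subprocess
from typing import Callable, Iterable, Tuple

Validator = Callable[[str, str], bool]


def jena_validate(shapes: str, data: str) -> bool:
    """Run Jena's `shacl validate` on one pair (assumes `shacl` is on PATH)."""
    result = subprocess.run(
        ["shacl", "validate", "--shapes", shapes, "--data", data],
        capture_output=True, text=True, check=False,
    )
    # Hypothetical check: treat the pair as conforming if the printed
    # report contains `sh:conforms true`.
    return re.search(r"sh:conforms\s+true\b", result.stdout) is not None


def exit_code(pairs: Iterable[Tuple[str, str]],
              validate: Validator = jena_validate) -> int:
    """Validate every pair and return 0 if all conform, otherwise 1."""
    # A list (not a generator passed to all()) so every pair is validated
    # and reported, instead of stopping at the first failure.
    results = [validate(shapes, data) for shapes, data in pairs]
    return 0 if all(results) else 1
```

Ending the script with sys.exit(exit_code(pairs)) would then make a GitHub Actions step fail whenever any file is INVALID, while still printing every failure in a single run.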

This is a definite improvement to our dev experience! 🎉


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and please add links to any relevant follow up issues.

I have...

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add your handle next to the items reviewed if you only reviewed selected items.

I have...