w3c / data-shapes

RDF Data Shapes WG repo
Expressing statistics using VOID based on SHACL shapes #154

Open tfrancart opened 10 months ago

tfrancart commented 10 months ago

(original thread at https://lists.w3.org/Archives/Public/public-shacl/2024Jan/0004.html)

I have a use case that involves documenting a dataset (not necessarily validating it) against a SHACL specification. By documenting, I mean providing a "summary" of the graph. To do this, I would like to express the statistics of the dataset against the SHACL specification, that is: the number of targets of each node shape, the number of occurrences of each property shape, and the number of distinct values of each property shape.

For this I can rely on the VOID vocabulary statistics, but only if the node shapes correspond to classes (i.e. use sh:targetClass), so that I can use void:classPartition. To generalize the partitioning approach of VOID, I could also use a more SHACL-related partitioning of a dataset, such as xx:shapePartition, with an "xx:shape" property pointing to a shape. This partition would "contain all triples that describe entities that are targets of the shape indicated with xx:shape." The statistics are then expressed on the partition entity, using void:entities, void:triples and void:distinctObjects.
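To make the three counts concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how any existing tool computes them: triples are plain tuples rather than a real RDF graph, the helper names (`Triple`, `targets_of_class`, `subjects_of`, `partition_statistics`) and the `ex:` data are made up, and subclass inference for sh:targetClass is ignored.

```python
from typing import Dict, Iterable, List, NamedTuple, Set

RDF_TYPE = "rdf:type"


class Triple(NamedTuple):
    """A triple with prefixed names as plain strings (illustration only)."""
    s: str
    p: str
    o: str


def targets_of_class(triples: Iterable[Triple], cls: str) -> Set[str]:
    """Focus nodes of a class-based target (sh:targetClass), without subclass inference."""
    return {t.s for t in triples if t.p == RDF_TYPE and t.o == cls}


def subjects_of(triples: Iterable[Triple], predicate: str) -> Set[str]:
    """Focus nodes of a non-class target, in the style of sh:targetSubjectsOf."""
    return {t.s for t in triples if t.p == predicate}


def partition_statistics(triples: List[Triple], targets: Set[str], path: str) -> Dict[str, int]:
    """Counts for one node shape (its targets) and one property shape (its path)."""
    # Values reached from the targets through the property shape's path.
    values = [t.o for t in triples if t.s in targets and t.p == path]
    return {
        "entities": len(targets),             # number of targets of the node shape
        "triples": len(values),               # occurrences of the property shape
        "distinctObjects": len(set(values)),  # distinct values of the property shape
    }


# Hypothetical sample data: ex:carol has no rdf:type, so no class partition covers her.
data = [
    Triple("ex:alice", RDF_TYPE, "ex:Person"),
    Triple("ex:bob", RDF_TYPE, "ex:Person"),
    Triple("ex:acme", RDF_TYPE, "ex:Organization"),
    Triple("ex:alice", "ex:worksFor", "ex:acme"),
    Triple("ex:bob", "ex:worksFor", "ex:acme"),
    Triple("ex:carol", "ex:worksFor", "ex:w3c"),
]

class_stats = partition_statistics(data, targets_of_class(data, "ex:Person"), "ex:worksFor")
subject_stats = partition_statistics(data, subjects_of(data, "ex:worksFor"), "ex:worksFor")

print(class_stats)    # {'entities': 2, 'triples': 2, 'distinctObjects': 1}
print(subject_stats)  # {'entities': 3, 'triples': 3, 'distinctObjects': 2}
```

The second call is the case that a class-based partition cannot describe: the targets are selected by a predicate rather than a class, so the counts only make sense on a partition defined by the shape.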

To summarize, the idea is to extend the VOID partitioning vocabulary to work with shapes instead of classes/properties.

Right now I use a dcterms:conformsTo predicate to link a partition to its corresponding shape.
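For illustration, a partition linked to its shape with dcterms:conformsTo could be rendered as Turtle roughly as follows. This is a sketch under assumptions: the function name `partition_to_turtle`, the `ex:` identifiers and the counts are hypothetical, and how the partition is attached to the dataset description is deliberately left out. It does not reproduce the exact output of SHACL Play.

```python
from typing import Dict

# Standard namespace IRIs for VoID and DCMI Metadata Terms.
PREFIXES = (
    "@prefix void: <http://rdfs.org/ns/void#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
)


def partition_to_turtle(partition: str, shape: str, stats: Dict[str, int]) -> str:
    """Render one partition: a dcterms:conformsTo link to its shape plus VoID counts."""
    pairs = [("dcterms:conformsTo", shape)]
    # Each key of `stats` is assumed to be the local name of a VoID property.
    pairs += [(f"void:{name}", str(value)) for name, value in stats.items()]
    body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
    return f"{partition}\n    {body} ."


# Hypothetical node shape partition and property shape partition.
node_partition = partition_to_turtle(
    "ex:personPartition", "ex:PersonShape", {"entities": 2}
)
property_partition = partition_to_turtle(
    "ex:worksForPartition",
    "ex:PersonShape_worksFor",
    {"triples": 2, "distinctObjects": 1},
)

print(PREFIXES)
print(node_partition)
print(property_partition)
```

The `ex:` prefix is not declared in the output, so it would need its own `@prefix` line before the result can be parsed as Turtle.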

This is implemented in the SHACL generation algorithm of SHACL Play.

See also #153

tfrancart commented 9 months ago

Very close to https://github.com/cygri/void/issues/114