zazuko / SPEX

SPEX is designed to introspect data within SPARQL endpoints, leveraging the self-describing nature of RDF-based data to enhance your comprehension of the underlying schema.
https://spex.zazuko.com
MIT License

Define proper discovery mechanism for predefined SHACL shapes and viewport definition #25

Closed ktk closed 3 years ago

ktk commented 4 years ago

IANA defines .well-known/void. That should be the entry point and in there we should follow the basic VoID and/or schema.org Dataset syntax to point to data.
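
As a minimal sketch of that entry point, assuming a client only knows the SPARQL endpoint URL: well-known URIs sit at the root of the origin, so the endpoint's path and query are dropped. The function name and the example.org host are placeholders, not part of SPEX.

```python
from urllib.parse import urlsplit, urlunsplit


def well_known_void_url(endpoint_url: str) -> str:
    """Return the /.well-known/void URL for the host serving endpoint_url."""
    parts = urlsplit(endpoint_url)
    # Well-known URIs live at the root of the origin, so the endpoint's
    # own path, query, and fragment are discarded.
    return urlunsplit((parts.scheme, parts.netloc, "/.well-known/void", "", ""))


# example.org is a placeholder host.
print(well_known_void_url("https://example.org/sparql/query?format=json"))
# prints "https://example.org/.well-known/void"
```

The document found there would then carry the VoID and/or schema.org Dataset description that points to the data.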

l00mi commented 3 years ago

Also compare https://github.com/zazuko/SPEX/discussions/32#discussioncomment-334764

martinmaillard commented 3 years ago

Some things that are not clear to me:

l00mi commented 3 years ago

The classPartition and propertyPartition look quite promising? @bergos what do you think?

martinmaillard commented 3 years ago

The issue with them is that they only allow creating a subset that contains one class or property. Not really what we want.

l00mi commented 3 years ago

I read it as: you can add multiple "partitions" to a dataset, which allows you to define one "view".

bergos commented 3 years ago

I guess we have to use http://rdfs.org/ns/void#DatasetDescription and/or http://schema.org/DataCatalog as an entry point. There can be multiple Datasets, but there should be only one DatasetDescription. But how do we attach it? I also don't see any property in VoID that looks suitable. We may also need some properties and classes for the viewports. I would propose defining a new namespace.

void:*Partition could be an alternative to my schema.org-based set proposal. But if we use it, we will have additional Datasets for the include/exclude sets. Is that OK? People and machines could lose track of what is an actual dataset and what is just a set for a SPEX viewport.

l00mi commented 3 years ago

I would rather add multiple Datasets and then use a new namespace and class to additionally make it clear that it's used as a viewport, e.g. spex:ViewPort?

l00mi commented 3 years ago

@martinmaillard

  1. Can you please try to define the structure (with an example), without any namespaces for the ViewPort use case. Then we will give input on the final solution.

  2. @bergos will propose how the SHACL shapes are attached to the .well-known/void

bergos commented 3 years ago

Maybe it's easier not to base the entry point on a class. Instead, we just define that it must be attached to /.well-known/void.

The property used to point to the shapes would be defined in the spex: namespace. spex:shape should match the SHACL shape use case, but would also work for any other data that describes the shape of the data (not planned, just in case).

That property could be used to point to the shape for each class. But it would be also possible to use one shape as a collection for all class-related shapes. With an additional type, this collection shape could be flagged as the default shape. That would allow having alternative shapes for any future use cases, like actual data vs. model.

</.well-known/void>
  spex:shape [ a sh:NodeShape, spex:DefaultShapes;
    sh:node [ a sh:NodeShape;
      sh:targetClass <https://permits.ld.admin.ch/schema/Profession>;
      sh:property [ a sh:PropertyShape;
      ], [ a sh:PropertyShape;
      ]
    ], [ a sh:NodeShape;
      sh:targetClass <https://permits.ld.admin.ch/schema/...>;
      sh:property [ a sh:PropertyShape;
      ], [ a sh:PropertyShape;
      ]
    ]
  ].

martinmaillard commented 3 years ago

I'll use the spex namespace for the prototype, but I was wondering: doesn't that introduce a really bad pattern where every tool requires specific metadata to describe a dataset?

Other thing: I'm a bit surprised that you used a surrounding sh:NodeShape and that classes are linked with sh:node. Can you explain the reasoning behind this?

Now to extend your example with viewports, I would see something like:

</.well-known/void>
  spex:shape [ 
    a sh:NodeShape, spex:DefaultShapes;
    sh:node [ a sh:NodeShape;
      sh:targetClass <https://permits.ld.admin.ch/schema/Profession>;
      sh:property [ a sh:PropertyShape;
      ], [ a sh:PropertyShape;
      ]
    ], [ a sh:NodeShape;
      sh:targetClass <https://permits.ld.admin.ch/schema/...>;
      sh:property [ a sh:PropertyShape;
      ], [ a sh:PropertyShape;
      ]
    ] ;
  ] ;
  spex:viewport [
    a spex:ViewPort ;
    schema:name "Only the Profession class" ;
    spex:includes <https://permits.ld.admin.ch/schema/Profession> ;
  ] ;
  spex:viewport [
    a spex:ViewPort ;
    schema:name "Everything but the Profession class" ;
    spex:excludes <https://permits.ld.admin.ch/schema/Profession> ;
  ] ;
.

@bergos in the discussion, you mention using one more level and hasPart for the items in the include and exclude sets. I'm not sure I understand the advantage.

bergos commented 3 years ago

> I'll use the spex namespace for the prototype, but I was wondering: doesn't that introduce a really bad pattern where every tool requires specific metadata to describe a dataset?

I agree with that, but to avoid getting lost now, let's define our structure first and check later how well it matches existing vocabularies.

> Other thing: I'm a bit surprised that you used a surrounding sh:NodeShape and that classes are linked with sh:node. Can you explain the reasoning behind this?

The additional level would allow creating bundles of shapes. The use case would be similar to branches in git: one bundle of shapes could cover the current data, and others could model possible extensions to the data model.
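
A minimal sketch of how a client might pick the default bundle among several, assuming triples are plain (subject, predicate, object) tuples with prefixed names kept as strings. A real client would use an RDF library; the blank node labels and the untyped second bundle are hypothetical.

```python
PROFESSION = "https://permits.ld.admin.ch/schema/Profession"

# Two bundles attached to the entry point; only one is flagged as default.
TRIPLES = [
    ("/.well-known/void", "spex:shape", "_:current"),
    ("/.well-known/void", "spex:shape", "_:draft"),  # hypothetical alternative bundle
    ("_:current", "rdf:type", "spex:DefaultShapes"),
    ("_:current", "sh:node", "_:professionShape"),
    ("_:draft", "sh:node", "_:professionShapeDraft"),
    ("_:professionShape", "sh:targetClass", PROFESSION),
    ("_:professionShapeDraft", "sh:targetClass", PROFESSION),
]


def objects(triples, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def default_bundle(triples, entry_point="/.well-known/void"):
    """Return the bundle typed spex:DefaultShapes, or None if there is none."""
    for bundle in objects(triples, entry_point, "spex:shape"):
        if "spex:DefaultShapes" in objects(triples, bundle, "rdf:type"):
            return bundle
    return None


def target_classes(triples, bundle):
    """Collect the sh:targetClass of every shape linked from the bundle."""
    classes = []
    for shape in objects(triples, bundle, "sh:node"):
        classes.extend(objects(triples, shape, "sh:targetClass"))
    return classes


bundle = default_bundle(TRIPLES)
print(bundle, target_classes(TRIPLES, bundle))
```

Any bundle without the spex:DefaultShapes type is simply ignored by this lookup, which is what would let alternative models live next to the current one.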

> using one more level and hasPart for the items in the include and exclude sets

My idea was to hand over the include and exclude arguments in the URL. By passing a set that the code fetches and expands, the URLs should not explode. That's the URL arguments perspective.

From a data structure perspective, I'm not sure it makes sense. We use it for variables in pipelines, where it allows combining different sets of variables, like <defaults> + <intEndpoint> and <defaults> + <prodEndpoint>. But as this will mainly be done by a UI, it is not required.
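
The set idea can be sketched roughly as follows, assuming each named set lists its members (what hasPart would express) and that combining sets means taking their union. The member values are placeholders and the function name is hypothetical.

```python
# Named sets and their members; the names mirror the pipeline example,
# the members are placeholders.
SETS = {
    "defaults": ["a", "b"],
    "intEndpoint": ["b", "c"],
    "prodEndpoint": ["d"],
}


def combine(set_names, sets):
    """Expand the referenced sets into one list, without duplicates."""
    members = []
    for name in set_names:
        for member in sets[name]:
            if member not in members:
                members.append(member)
    return members


print(combine(["defaults", "intEndpoint"], SETS))   # ['a', 'b', 'c']
print(combine(["defaults", "prodEndpoint"], SETS))  # ['a', 'b', 'd']
```

A URL would then only need to reference the set names, however many members each set expands to.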

l00mi commented 3 years ago

@martinmaillard did you implement something in this regard now? If yes, can you point us to it so that we can potentially make it official?

martinmaillard commented 3 years ago

Nothing done yet

l00mi commented 3 years ago

So do you need anything more from here to advance on SPEX?

martinmaillard commented 3 years ago

Time 😄

And if you have some inputs about the RDF structure I shared above, that's welcome too.

martinmaillard commented 3 years ago

Let's say we keep the additional nesting level for the shapes. As @bergos mentioned, it could make it easier to allow multiple versions of the schema later (or something like that).

I don't really like the use of sh:NodeShape for the top-most level and sh:node to point to shapes. I feel like it's trying really hard to use SHACL, but the result doesn't mean anything. If I try to interpret it, I think it would mean "to satisfy this shape, the target must conform to all these sub-shapes".
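
To illustrate that reading, here is a toy model (not a SHACL engine) in which each sub-shape is reduced to a set of required properties. The property names and the second shape are hypothetical.

```python
# Each sub-shape is reduced to the properties it requires.
PROFESSION_SHAPE = {"name"}
OTHER_SHAPE = {"validUntil"}  # hypothetical shape for some other class


def conforms(node: dict, required: set) -> bool:
    """A node conforms if it has every required property."""
    return required.issubset(node)


def conforms_to_bundle(node: dict, sub_shapes: list) -> bool:
    """With sh:node, the node must conform to ALL linked sub-shapes."""
    return all(conforms(node, shape) for shape in sub_shapes)


baker = {"name": "Baker"}
print(conforms(baker, PROFESSION_SHAPE))                           # True
print(conforms_to_bundle(baker, [PROFESSION_SHAPE, OTHER_SHAPE]))  # False
```

A Profession that is fine on its own fails the bundle because it does not also look like the other class, which is not what a collection of shapes is meant to express.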

Here's what I propose:

</.well-known/void> spex:shape [ 
-    a sh:NodeShape, spex:DefaultShapes;
+   a spex:DefaultShapes ;
-    sh:node [ 
+   schema:hasPart [
      a sh:NodeShape;
      sh:name "Profession" ;
      sh:targetClass <https://permits.ld.admin.ch/schema/Profession>;
    ], [ 
      a sh:NodeShape;
      sh:name "Something" ;
      sh:targetClass <https://permits.ld.admin.ch/schema/Something>;
    ] ;
  ] ;

martinmaillard commented 3 years ago

One more question: which URI should I use for the spex prefix?

l00mi commented 3 years ago

I would go for https://described.at/spex/ as the prefix URI. @sandhose @ludovicm67 where is the repo for described.at? Then we can add some triples there as well.

ludovicm67 commented 3 years ago

The repo for described.at is here: https://gitlab.zazuko.com/docker/described.at

martinmaillard commented 3 years ago

I implemented basic support for viewports. Only supporting spex:includes currently because I'm not sure what the semantics should be for includes + excludes.
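
For discussion, one possible semantics for combining the two, sketched under the assumption that includes narrows the class list first and excludes then removes from the result. This is not what SPEX implements; the function name and the Permit class IRI are hypothetical.

```python
def visible_classes(all_classes, includes=(), excludes=()):
    """Apply includes first (an empty includes means everything), then excludes."""
    selected = [c for c in all_classes if not includes or c in includes]
    return [c for c in selected if c not in excludes]


PROFESSION = "https://permits.ld.admin.ch/schema/Profession"
PERMIT = "https://example.org/schema/Permit"  # hypothetical second class
CLASSES = [PROFESSION, PERMIT]

print(visible_classes(CLASSES, includes=[PROFESSION]))  # only Profession
print(visible_classes(CLASSES, excludes=[PROFESSION]))  # everything but Profession
print(visible_classes(CLASSES, includes=[PROFESSION], excludes=[PROFESSION]))  # []
```

Under this reading, a class listed in both sets is hidden, so excludes always wins over includes.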