kg-construct / mapping-challenges

Issues for discussion about limitations of current mapping languages
Apache License 2.0

Enforce data shape #16

Closed VladimirAlexiev closed 2 years ago

VladimirAlexiev commented 3 years ago

How to ensure that a transformation produces data that conforms to some shape?

The best approach is to generate both the transformation and the shape from a single model that describes the shape of the output data, the input fields and where to use them, and shape specifics (e.g., cardinality).
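A minimal sketch of that idea, in Python. Everything here is hypothetical and for illustration only: `FieldSpec`, `MODEL`, `to_shex`, `transform` and `conforms` are invented names, the record layout is assumed, and the ShEx output is a fragment with prefix declarations omitted. It is not tied to any existing mapping language or engine; it only shows one model driving both the transformation and the shape.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """One entry of the (hypothetical) shared model."""
    predicate: str                 # predicate emitted in the output data
    source_field: str              # input field the value is read from
    datatype: str                  # datatype expected by the shape
    min_count: int = 1             # minimum cardinality
    max_count: Optional[int] = 1   # maximum cardinality, None means unbounded


# Assumed example model: one mandatory name, any number of e-mail addresses.
MODEL = [
    FieldSpec("schema:name", "name", "xsd:string"),
    FieldSpec("schema:email", "emails", "xsd:string", min_count=0, max_count=None),
]


def cardinality(spec):
    """Render the cardinality of a field as a ShEx suffix."""
    known = {(1, 1): "", (0, 1): " ?", (0, None): " *", (1, None): " +"}
    bounds = (spec.min_count, spec.max_count)
    if bounds in known:
        return known[bounds]
    upper = "" if spec.max_count is None else str(spec.max_count)
    return " {%d,%s}" % (spec.min_count, upper)


def to_shex(shape_name, model):
    """Derive a ShEx shape fragment from the model."""
    constraints = [f"  {s.predicate} {s.datatype}{cardinality(s)}" for s in model]
    return f"{shape_name} {{\n" + " ;\n".join(constraints) + "\n}"


def transform(record, model):
    """Derive (predicate, value) pairs for one input record from the same model."""
    pairs = []
    for spec in model:
        value = record.get(spec.source_field)
        if value is None:
            values = []
        elif isinstance(value, list):
            values = value
        else:
            values = [value]
        pairs.extend((spec.predicate, v) for v in values)
    return pairs


def conforms(pairs, model):
    """Check the cardinalities of the generated pairs against the model."""
    for spec in model:
        count = sum(1 for predicate, _ in pairs if predicate == spec.predicate)
        if count < spec.min_count:
            return False
        if spec.max_count is not None and count > spec.max_count:
            return False
    return True


print(to_shex("<PersonShape>", MODEL))
```

Because the shape is written down before any data is generated, a record that lacks a mandatory field fails `conforms` instead of silently producing a weaker shape.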

herminiogg commented 3 years ago

I will give a partial answer to this:

Following several questions in TPAC sessions about ShExML and whether its information could also be used for validation, I developed a new feature in ShExML (https://github.com/herminiogg/ShExML/releases/tag/v0.2.3) that generates Shape Expressions from the input ShExML script and the generated data. You can see it working here: http://shexml.herminiogarcia.com/validation/

It takes the shape information from the ShExML syntax and infers the rest (e.g., cardinality) from the data. As I said in the TPAC session, the process is a bit naive because the resulting shapes will always validate against the generated data. However, they can serve as a starting point and then be tuned for more specific features.
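To make the "naive" point concrete, here is a small Python sketch of inferring cardinality from generated data. It is an illustration under assumptions, not how ShExML implements the feature: the function names, the triple layout, and the simplification that all subjects share one shape are all hypothetical.

```python
from collections import defaultdict


def count_per_subject(triples):
    """Return (subjects, counts) where counts[predicate][subject] is a number of values."""
    subjects = set()
    counts = defaultdict(lambda: defaultdict(int))
    for subject, predicate, _ in triples:
        subjects.add(subject)
        counts[predicate][subject] += 1
    return subjects, counts


def infer_cardinalities(triples):
    """Infer (min, max) occurrences of each predicate over all subjects.

    Assumes every subject belongs to the same shape.
    """
    subjects, counts = count_per_subject(triples)
    inferred = {}
    for predicate, per_subject in counts.items():
        observed = [per_subject.get(s, 0) for s in subjects]
        inferred[predicate] = (min(observed), max(observed))
    return inferred


def validates(triples, cardinalities):
    """Check every subject against the inferred (min, max) bounds."""
    subjects, counts = count_per_subject(triples)
    for predicate, (low, high) in cardinalities.items():
        for subject in subjects:
            if not low <= counts[predicate].get(subject, 0) <= high:
                return False
    return True


# Assumed sample of generated data.
generated = [
    ("ex:a", "schema:name", "Ada"),
    ("ex:a", "schema:email", "a@example.org"),
    ("ex:a", "schema:email", "b@example.org"),
    ("ex:b", "schema:name", "Bob"),
]

inferred = infer_cardinalities(generated)
# The data the bounds were inferred from passes by construction.
assert validates(generated, inferred)
```

The inferred bounds only become informative when they are checked against other data, or tightened by hand, which is the tuning step mentioned above.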

I know this is not exactly what you described in the issue, as the data shape is produced after generation and not before. But I think it is worth discussing in order to explore further possibilities.

VladimirAlexiev commented 3 years ago

@herminiogg thanks! This is exactly what I had in mind.

But also annotate shexml with constraints to be carried over to shex, eg

Cardinality may seem trivial because you can examine the source to see what the cardinality is.

But things get more complicated when you consider real (dirty) data, eg

This may lead to the question: is it better to embed ShEx (ShEx annotations) into ShExML, or vice versa, to express ShExML as annotations in ShEx? I think the latter is the better approach, since ShEx has a large community, tooling, and extensible annotations. This way your ShExML approach may find wider use.

dachafra commented 2 years ago

Similar proposal from @DylanVanAssche and colleagues but using RML and SHACL: https://dylanvanassche.be/assets/pdf/kcap2021-rml2shacl.pdf

IMHO this is still at the research stage and not ready to be part of the specs/engines. Additionally, as there has been no discussion for more than a year, I'm going to close the issue.

DylanVanAssche commented 2 years ago

> Similar proposal from @DylanVanAssche and colleagues but using RML and SHACL: https://dylanvanassche.be/assets/pdf/kcap2021-rml2shacl.pdf

FYI: @thomas-delva wrote the paper, I contributed to it and hosted it :)