simphony / simphony-osp

A framework that aims to achieve interoperability between software such as simulation engines, databases and data repositories using a knowledge graph as the common language.
https://simphony.readthedocs.io
Other
16 stars 12 forks source link

Enh/attributes in schema validation #822

Closed MBueschelberger closed 2 years ago

MBueschelberger commented 2 years ago

Simple enhancement for checking of data properties (attributes) during schema validation in SimPhoNy < v4.

Closes #821

kysrpex commented 2 years ago

Thanks for providing your own solution, I will review it within the next few days.

MBueschelberger commented 2 years ago

@kysrpex: the CI/testing currently fails because https://xmlns.com/foaf/spec/index.rdf for pulling the foaf-ontology is down. We might rerun it later again.

kysrpex commented 2 years ago

Thank you for submitting the PR. It is good.

I requested changes only due to two minor details: a typo and deleting a TODO (see reasons on review comments). I provided the changes as "code review suggestions" so I think you can just click a button below the suggestion to apply the changes directly on GitHub.

@MBueschelberger I will take care of the failing CI once you commit the requested changes.

MBueschelberger commented 2 years ago

@kysrpex:

I was also asked for the functionality to check the actual values of the Cuds.

For my opion this makes absolutely sense if you want to write a unittest for your function which is generating Cuds with actual data.

However, my suggestion would be that this might also overwrite the cardinality with a value-keyword in the case of data properties:

Restriction to length

    city.Neighborhood:
        city.name:
            STRING:
                value: 1+

Following a similar pattern as the cardinality, this value might e.g. in case of a string restrict the length (see example above) or the actual value:

Restriction of value

    city.Neighborhood:
        city.name:
            STRING:
                value: Stühlinger

In the case of floats, bools and ints, there is no restriction of the length needed, but rather a statement towards the data range:

Restriction of range

    city.Citizen:
        city.age:
            INTEGER:
                value: 0+

... or the actual value:

Restriction of value

    city.Citizen:
        city.age:
            INTEGER:
                value: 29
        city.name:
            STRING:
                value: Matthias

In case of vectors, length, data range and value restriction might be a bit more complicated. But I think this is not a huge priority to be implemented since we are ontologizing in the EMMO-way, without any vectors as data properties.

Since you already closed this PR and deleted the branch, I can come up with a new branch with this idea.

What do you think?

kysrpex commented 2 years ago

@kysrpex:

I was also asked for the functionality to check the actual values of the Cuds.

For my opion this makes absolutely sense if you want to write a unittest for your function which is generating Cuds with actual data.

However, my suggestion would be that this might also overwrite the cardinality with a value-keyword in the case of data properties:

Restriction to length

    city.Neighborhood:
        city.name:
            STRING:
                value: 1+

Following a similar pattern as the cardinality, this value might e.g. in case of a string restrict the length (see example above) or the actual value:

Restriction of value

    city.Neighborhood:
        city.name:
            STRING:
                value: Stühlinger

In the case of floats, bools and ints, there is no restriction of the length needed, but rather a statement towards the data range:

Restriction of range

    city.Citizen:
        city.age:
            INTEGER:
                value: 0+

... or the actual value:

Restriction of value

    city.Citizen:
        city.age:
            INTEGER:
                value: 29
        city.name:
            STRING:
                value: Matthias

In case of vectors, length, data range and value restriction might be a bit more complicated. But I think this is not a huge priority to be implemented since we are ontologizing in the EMMO-way, without any vectors as data properties.

Since you already closed this PR and deleted the branch, I can come up with a new branch with this idea.

What do you think?

Sorry I was not expecting a new message :sweat_smile:

If you open a new PR and provide a working implementation for SimPhoNy v3 for what you are mentioning, I will definitely not decline to merge it. But before you put your hands on the keyboard, I again strongly advise you to carefully check if either SHACL or OTTR solve your problem.

I still did not carefully check, but it is my intention. I prefer to stick to something that exists (as far as I know there are validators for SHACL), especially if it is open-source, rather than to reinvent the wheel. The point I want to make here is that if you implement now a solution for SimPhoNy v3, unless it is better than SHACL or OTTR, I will not keep this solution in v4 or future versions. But unfortunately I do not know if those two can satisfy our needs right now or not.

MBueschelberger commented 2 years ago

Hi @kysrpex, thanks for the hint on SHACL and OTTR.

This functionality for the schema validation was originally requested for a unit test in a small use case, not for a proper pipeline in a larger framework.

On the one hand, I totally agree that SHACL offers good stability+flexiblity against our current validation-algorithm in order to make a proper analysis of our CUDS.

But on the other hand, I think that a proper integration would require a parser which is able to convert the yml-schema into the node-shapes of shacl, which might require some work.

For me, it is currently not clear in how far this might fit into the timeline of the usecase, since there is enough other work to do (as usual). On the other hand, this is of course a major benefit for SimPhoNy v3 and v4, as you already evaluated.

I think I have to discuss this internally before deciding which way to go for the time given.