Open labra opened 7 years ago
I'd be happy to work on this. I'm a Jena committer/PMC member, so I feel like I'm fairly well-equipped to do it (although my Scala skills are very much a beginner's). But having glanced at the code, I'm finding it rather difficult to dig in.
I can see that `es.weso.schema.Schema` is key and a `Schema` has to be brought together with some RDF and a `ValidationTrigger`. But I can't seem to find any explanation of what `ValidationTrigger`s do and how they work. Also, I can see that an `RDFReader` is at least one way to bring RDF to the schema, but is it the right one? It seems that there might be better options?
It would be great if you can contribute.
The `schema` module is a common module for both `ShEx` and `SHACL` which captures the notion of validating RDF data with some schema (which can be ShEx or SHACL).
`ValidationTrigger` is used to capture the different possibilities to trigger validation in Shaclex. One problem is that the reports generated by ShEx and SHACL are different. ShEx generates a result shape map while the SHACL specification declares that it must generate a ValidationReport. So I think it would be good to declare a common structure that can handle both.
In my opinion, this structure could be a result shape map as in ShEx but with extra information on the errors following the ValidationReport properties defined in SHACL. I like having a result shape map because it has information on both the positive and negative shapes associated with a node, while the SHACL validation report defined in the spec only has definitions about the errors.
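To make the idea of a common structure concrete, here is a minimal Java sketch, assuming a flat list of node/shape associations. Every name in it (`CommonResultSketch`, `Association`, `failures`) is hypothetical and not part of Shaclex; it only shows how one structure could serve both views: the full list reads like a ShEx result shape map, and the failures alone read like the entries of a SHACL validation report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types exist in Shaclex.
final class CommonResultSketch {

    /** One node/shape association, positive or negative. */
    static final class Association {
        final String node;
        final String shape;
        final boolean conforms;
        final String message; // explanation for a failure; null when conforms is true

        Association(String node, String shape, boolean conforms, String message) {
            this.node = node;
            this.shape = shape;
            this.conforms = conforms;
            this.message = message;
        }
    }

    private final List<Association> associations = new ArrayList<>();

    void add(String node, String shape, boolean conforms, String message) {
        associations.add(new Association(node, shape, conforms, message));
    }

    /** ShEx-style view: every association, conformant or not. */
    List<Association> all() {
        return associations;
    }

    /** SHACL-style view: only the failures, as a validation report would list them. */
    List<Association> failures() {
        List<Association> out = new ArrayList<>();
        for (Association a : associations) {
            if (!a.conforms) {
                out.add(a);
            }
        }
        return out;
    }

    /** Overall flag, analogous to sh:conforms on a report: true when nothing failed. */
    boolean conforms() {
        return failures().isEmpty();
    }
}
```

The design choice here is that the report is derived from the shape map rather than stored separately, which matches the direction of generating a result shape map first and converting it to a validation report later.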
At this moment, I am working to improve the SHACL implementation so it can generate a result shape map that could later be converted to a SHACL validation report. This work is in branch shacl-allResults and more information is in issue #38.
I agree entirely that a common report structure is necessary to do this ticket properly. Do you think it is worth me trying to work part of this ticket with attention paid mostly to ShEx, on the assumption that your work in #38 will let SHACL "catch up"? Or should I wait until we understand what the common report format will look like?
Because Jena has traditionally offered inference via subtypes of `Model`, I wonder if perhaps the idiomatic way to offer validation would be via a `ValidatingModel` with members for schema and trigger and a `validate` method and a `SchemaModel` (and subtypes `ShExSchemaModel` and `ShaclSchemaModel`) which can hold the schema and would be equipped with special serializations. Then the result structure could be exposed as a `Model` too, potentially with special serializations. That would let me avoid exposing any part of the Shaclex RDF impl "backwards" into Jena. A user would see only Jena `Model`s.
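The shape of that facade might look roughly like the following Java sketch. It is only an illustration under assumptions: `Graph` is a dependency-free stand-in for a Jena `Model`, the trigger is reduced to a plain string, and none of the type names are existing Jena or Shaclex API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch. "Graph" stands in for a Jena Model so the block has no
// dependencies; none of these names are existing Jena or Shaclex API.
final class ValidatingModelSketch {

    /** Placeholder for an RDF graph (a Jena Model in the proposed design). */
    static final class Graph {
        final List<String> triples = new ArrayList<>();

        Graph add(String triple) {
            triples.add(triple);
            return this;
        }
    }

    /** Holds a schema; ShEx and SHACL variants would differ in parsing and serialization. */
    interface SchemaModel {
        /** Runs validation and returns the result as another graph. */
        Graph validate(Graph data, String trigger);
    }

    /** Data plus schema plus trigger, in the way Jena layers inference over a base model. */
    static final class ValidatingModel {
        private final Graph data;
        private final SchemaModel schema;
        private final String trigger;

        ValidatingModel(Graph data, SchemaModel schema, String trigger) {
            this.data = data;
            this.schema = schema;
            this.trigger = trigger;
        }

        /** The caller only ever sees graphs going in and a graph coming out. */
        Graph validate() {
            return schema.validate(data, trigger);
        }
    }
}
```

In this arrangement the validation engine sits entirely behind `SchemaModel`, so nothing from the Shaclex RDF layer leaks through to the caller.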
Yes, that makes sense.
I agree that an idiomatic way in Jena would be to define a `ValidationModel` with those subtypes, so a user would not need to be exposed to the internals of Shaclex, and with those subtypes it is easier to work on them independently.
I wonder what the result of the `validate` method would be. It could be a `ResultShapeMap`, or it could be another model representing the result shape map (which could also contain the SHACL validation report).
More info about the current design:
One requirement for Shaclex is to be independent from Jena so it can be adapted to other libraries like rdf4j or banana-rdf in the future.
To that end, the `srdf` module represents a simple RDF interface with the methods that are needed for validation. At this moment, both the ShEx and SHACL validators only call methods from `srdf`.
The `srdfJena` module is the module that implements `srdf` (at this moment it is the only one but I am planning to add more). It contains `RDFAsJenaModel` that implements `RDFReader` and `RDFBuilder`.
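The layering described here can be illustrated with a small Java sketch. It is not the real `srdf` API: `SimpleRdf`, `InMemoryRdf`, and `hasExactlyOne` are hypothetical names, and an in-memory list plays the part that a Jena-backed implementation plays in `srdfJena`. The point is only that validator logic is written against the interface, so another backend can be dropped in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the layering only; the real srdf traits have
// different names and methods.
final class RdfAbstractionSketch {

    /** The only view of RDF the validator sees (the role srdf plays). */
    interface SimpleRdf {
        /** Objects of all triples with the given subject and predicate. */
        List<String> objectsOf(String subject, String predicate);
    }

    /** One backend (the role srdfJena plays); here an in-memory list instead of a Jena Model. */
    static final class InMemoryRdf implements SimpleRdf {
        private final List<String[]> triples = new ArrayList<>();

        InMemoryRdf add(String s, String p, String o) {
            triples.add(new String[] {s, p, o});
            return this;
        }

        @Override
        public List<String> objectsOf(String subject, String predicate) {
            List<String> out = new ArrayList<>();
            for (String[] t : triples) {
                if (t[0].equals(subject) && t[1].equals(predicate)) {
                    out.add(t[2]);
                }
            }
            return out;
        }
    }

    /** Validator logic written against the interface only, so backends can be swapped. */
    static boolean hasExactlyOne(SimpleRdf rdf, String node, String predicate) {
        return rdf.objectsOf(node, predicate).size() == 1;
    }
}
```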
I wonder what would be the best namespace to define `ValidateModel`... it should probably be added in `srdfJena` because it depends on Jena... or maybe another module called `jenaValidator` or something like that.
About your previous question, if you want to work on this ticket, great. I think it can be done in parallel.
I appreciate the need for Shaclex to remain independent from any particular RDF framework. That's non-negotiable. Seems like `banana-rdf` is a particularly great choice, because it gives you all the usual frameworks "for free".
Here is a potential tactic: what if I pursued this work from a separate codebase entirely (i.e. separate from Jena and from Shaclex), but with dependencies to both?
My thinking here is that most people who will want to use this kind of functionality (that is, to use validation with Jena) will be starting from Jena, not from Shaclex. Perhaps, as the projects proceed, we might be able to bring a potential `jena-validation` module over to Jena (of course, that would require a full discussion within and commitment from Jena), and in any event with this approach we do not need to find a place to add this stuff in Shaclex itself, which remains independent and concentrated on the real meat of the matter: validation itself (not glue code! :grin:). Does that sound reasonable? If so, I will set up a `jena-validation` project in a public GitHub repo and I can begin work.
Yes, I was also thinking about that alternative. It makes sense and, as you say, it could provide cleaner code and fewer dependencies for Shaclex.
Another possibility is that you could even develop that library entirely in Java. It would probably make more sense and most people would be more familiar with Java than Scala.
If you go ahead, let me know and I can help bridging the gap between both.
Some time ago I created a simple Java-based project to show how Shaclex could be called from Java here, but I don't actively maintain it.
Cool, I've thrown something up here. That example looks really helpful, thanks! I need to do some Jena work, but I should be able to get back to this (after studying your example) sometime this coming week.
A separate java library would be nice indeed. FYI, we have developed a Java Validation library based on Shaclex, as part of the Eclipse Lyo project. (in the process of releasing it once we have the IP process done). See https://git.eclipse.org/r/#/c/101273/6 But returning validation results as Jena models is still relevant, and would improve our library, so I hope we can help contribute here as well.
@berezovskyi @yashskhatri! Any tips on how we call Shaclex from Java? I don't believe we used https://github.com/labra/shexjava in the end.
Note on the Validation library I just mentioned... A big chunk of the contribution in the library is to marshal/unmarshal Java POJOs to Jena models, based on SHACL annotations. But nevertheless, maybe there is something in that library that can be of help?
@yashkhatri!
@ajs6f @jadelkhoury I didn't know that having a `ValidationModel` implement a Jena Model was the idiomatic way. I think we will just follow @ajs6f's lead here and `lyo-validation` should in the end rely on `jena-validation`. The way we do validation right now is quite direct: https://git.eclipse.org/r/#/c/101273/6/org.eclipse.lyo.validate/src/main/java/org/eclipse/lyo/validate/impl/ValidatorImpl.java@188
Okay, I should be able to find some time to tinker with this tomorrow. @berezovskyi; as far as "idiomaticity", I'm taking my cue (as mentioned above) from how Jena integrated inference functions. But I and the other committers are always open to new ideas!
Taking a look at this, I am trying to figure out what is and is not necessary to expose on the Java side. There are some types and methods in the Shaclex API that are undocumented and that I do not understand:
SchemaLabel
Solution
Schema.validate(rdf: RDFReader, triggerMode: String, shapeMap: Map[String, List[String]], optNode: Option[String], optShape: Option[String], nodePrefixMap: PrefixMap = PrefixMap.empty, shapesPrefixMap: PrefixMap = pm )
Do I need to just go and read the docs for ShEx and SHACL very carefully, or is it possible to give some hints about these guys? Obviously, I can just read the codebase very very carefully, but with my newbie Scala skills, that's going to take a long time... :)
As said above, I'm having a really hard time getting any grip on this without understanding `Schema.validate`. Is it possible to get some explanation of the parameters for that method? Is it to be found in the SHACL recommendation?
@labra et al., just bumping this because I have some time in the next few weeks to make progress here, but I cannot do so without some insight into what the various types in Shaclex do. I don't know which of the parameters of `Schema.validate` I should or could reexpose in Java or what some of them even mean... as I said, my Scala skills are minimal, so if I have to reverse-engineer the semantics, I'm not going to get very far very fast with it. Is there any documentation I can read?
Thanks, you are right that we need to improve the documentation. I have started creating some documentation about the architecture and the modules of the library, but it is incomplete.
These days, I was quite busy preparing the online version of the Validating RDF data book which can also be useful to have some documentation about the languages.
For example, this section describes the SHACL validation report and this chapter compares both languages.
In order to interact with the library using Jena Models, one issue is how to represent the validation results as a Jena Model.
My preference is to be able to have RDF representations of Shape Maps (used by ShEx) which can also be compatible with the RDF representations of Validation Reports (used by SHACL).
@labra Thanks so much for responding! I have been recommending Validating RDF, but I hadn't noticed that you were an author. Congrats!
In fact, I don't think we need to have results as a `Model` (for example, Jena doesn't offer SPARQL results as RDF, because that wouldn't make sense), but it would certainly be nice. I will start as you advise, by working on RDF representations of the two kinds of results. I hope to have something to offer within a few weeks, depending on many other things. :grin: Happy New Year!
I hadn't looked at SHACL very much before, but it seems to me that we could mint a new predicate `someNS:conforms` < `sh:conforms` and use it to represent shape maps, e.g. (using the example from your book):
:alice@:User,
:alice@:Employee,
:bob@:User
=>
:alice someNS:conforms :User .
:alice someNS:conforms :Employee .
:bob someNS:conforms :User .
and also use it instead of using `sh:conforms` in SHACL validation reports. Then to a first order (determining compliance or lack thereof) the same RDF is in use. It's not as clear to me how to represent the richer information in a common way, but perhaps we can find another commonality between ShEx's `reason` and SHACL's `sh:resultMessage`?
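The mapping from shape map entries to triples can be sketched in a few lines of Java. This is a rough illustration only: `ShapeMapToTriples` is a hypothetical helper, `someNS:conforms` is the placeholder predicate from the comment above rather than an existing term, and the splitting assumes simple prefixed names like those in the example. A real shape map parser would be needed for full IRIs or literals that may themselves contain `@` or commas.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; handles only "node@shape" entries built from prefixed names.
final class ShapeMapToTriples {

    // Placeholder predicate taken from the discussion, not an existing vocabulary term.
    static final String PREDICATE = "someNS:conforms";

    /** Turns one "node@shape" association into one triple line. */
    static String toTriple(String association) {
        String entry = association.trim();
        int at = entry.indexOf('@');
        if (at <= 0 || at == entry.length() - 1) {
            throw new IllegalArgumentException("expected node@shape, got: " + association);
        }
        String node = entry.substring(0, at);
        String shape = entry.substring(at + 1);
        return node + " " + PREDICATE + " " + shape + " .";
    }

    /** Splits a comma-separated shape map and converts each association. */
    static List<String> toTriples(String shapeMap) {
        List<String> triples = new ArrayList<>();
        for (String part : shapeMap.split(",")) {
            if (!part.trim().isEmpty()) {
                triples.add(toTriple(part));
            }
        }
        return triples;
    }
}
```

Calling `ShapeMapToTriples.toTriples(":alice@:User, :alice@:Employee, :bob@:User")` yields the three triples listed in the example. Note that this covers only positive associations; non-conformance and the richer `reason` / `sh:resultMessage` information would need additional predicates.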
As suggested by @jadelkhoury one possibility to improve Java interoperability (see issue #17) is to limit the interaction to the usage of Jena RDF Models.
Java programmers would provide Jena Models as input and the validation process would return the result encoded in Jena models.
Providing Jena RDF models as input is already supported for SHACL and even for ShEx, using the recent ShEx RDF encoding based on JSON-LD.
The only missing piece is to convert the validation result to RDF and then return the corresponding model, which seems an easy task.