weso / shaclex

SHACL/ShEx implementation
http://weso.github.io/shaclex
MIT License

Add support to interact with the library using Jena Models #28

Open labra opened 7 years ago

labra commented 7 years ago

As suggested by @jadelkhoury one possibility to improve Java interoperability (see issue #17) is to limit the interaction to the usage of Jena RDF Models.

Java programmers would provide Jena Models as input and the validation process would return the result encoded in Jena models.

Providing Jena RDF models as input is already supported for SHACL, and even for ShEx through the recent RDF encoding of ShEx based on JSON-LD.

The only missing piece is to convert the validation result to RDF and then return the corresponding model, which seems an easy task.

ajs6f commented 7 years ago

I'd be happy to work on this. I'm a Jena committer/PMC member, so I feel like I'm fairly well-equipped to do it (although my Scala skills are very much a beginner's). But having glanced at the code, I'm finding it rather difficult to dig in.

I can see that es.weso.schema.Schema is key and a Schema has to be brought together with some RDF and a ValidationTrigger. But I can't seem to find any explanation of what ValidationTriggers do and how they work. Also, I can see that an RDFReader is at least one way to bring RDF to the schema, but is it the right one? It seems that there might be better options?

labra commented 7 years ago

It would be great if you can contribute.

One problem is that the reports generated by ShEx and SHACL are different. ShEx generates a result shape map while the SHACL specification declares that it must generate a ValidationReport. So I think it would be good to declare a common structure that can handle both.

In my opinion, this structure could be a result shape map as in ShEx, extended with information about the errors that follows the ValidationReport properties defined in SHACL. I like the result shape map because it records both the positive and the negative shapes associated with a node, while the SHACL validation report defined in the spec only describes the errors.
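To make the idea concrete, here is a minimal Java sketch of such a common structure. It is only an illustration under stated assumptions: the names ShapeResult, CommonReport, resultShapeMap, violations and conforms are hypothetical and are not part of Shaclex, ShEx or SHACL, and nodes and shapes are plain strings rather than RDF terms.

```java
import java.util.ArrayList;
import java.util.List;

/** One node/shape association: whether the node conforms, and why not if it fails. */
final class ShapeResult {
    final String node;
    final String shape;
    final boolean conforms;
    final String message; // explanation of a failure; null when conforms is true

    ShapeResult(String node, String shape, boolean conforms, String message) {
        this.node = node;
        this.shape = shape;
        this.conforms = conforms;
        this.message = message;
    }
}

/** Hypothetical common report that can serve both a ShEx-style and a SHACL-style view. */
final class CommonReport {
    private final List<ShapeResult> results;

    CommonReport(List<ShapeResult> results) {
        this.results = new ArrayList<>(results);
    }

    /** ShEx-style view: every association, positive and negative. */
    List<ShapeResult> resultShapeMap() {
        return new ArrayList<>(results);
    }

    /** SHACL-style view: only the failed associations. */
    List<ShapeResult> violations() {
        List<ShapeResult> failed = new ArrayList<>();
        for (ShapeResult r : results) {
            if (!r.conforms) {
                failed.add(r);
            }
        }
        return failed;
    }

    /** Overall flag in the spirit of sh:conforms: true when nothing failed. */
    boolean conforms() {
        return violations().isEmpty();
    }
}
```

The point of the sketch is that the failures-only view can be derived from the full map, but not the other way round, which is why the shape map is the more natural base structure.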

At this moment, I am working to improve the SHACL implementation so it can generate a result shape map that could later be converted to a SHACL validation report. This work is in branch shacl-allResults and more information is in issue #38.

ajs6f commented 7 years ago

I agree entirely that a common report structure is necessary to do this ticket properly. Do you think it is worth me trying to work part of this ticket with attention paid mostly to ShEx, on the assumption that your work in #38 will let SHACL "catch up"? Or should I wait until we understand what the common report format will look like?

ajs6f commented 7 years ago

Because Jena has traditionally offered inference via subtypes of Model, I wonder if perhaps the idiomatic way to offer validation would be via a ValidatingModel with members for schema and trigger and a validate method and a SchemaModel (and subtypes ShExSchemaModel and ShaclSchemaModel) which can hold the schema and would be equipped with special serializations. Then the result structure could be exposed as a Model too, potentially with special serializations. That would let me avoid exposing any part of the Shaclex RDF impl "backwards" into Jena. A user would see only Jena Models.

labra commented 7 years ago

Yes, that makes sense.

I agree that an idiomatic way in Jena would be to define a ValidationModel with those subtypes. A user would not be exposed to the internals of Shaclex, and having those subtypes makes it easier to work on them independently.

I wonder what the validate method should return. It could be a `ResultShapeMap`, or maybe another model representing the result shape map (which could also contain the SHACL validation report).

More info about the current design:

One requirement for Shaclex is to be independent from Jena so it can be adapted to other libraries like rdf4j or banana-rdf in the future.

To that end, the srdf module represents a simple RDF interface with the methods that are needed for validation. At this moment, both the ShEx and SHACL validators only call methods from srdf.

The srdfJena module is the module that implements srdf (at this moment it is the only one but I am planning to add more). It contains RDFAsJenaModel that implements RDFReader and RDFBuilder.
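The layering described here can be sketched roughly as follows. This is a Java illustration only, not the actual srdf API (which is written in Scala): SimpleRdfReader, InMemoryRdf, MinCountCheck and their methods are hypothetical names, and an in-memory list of triples stands in for the Jena-backed implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for the small RDF interface a validator calls; the method name is hypothetical. */
interface SimpleRdfReader {
    /** Objects of all triples with the given subject and predicate. */
    List<String> objectsOf(String subject, String predicate);
}

/** One possible backend. A Jena-backed class would implement the same interface. */
final class InMemoryRdf implements SimpleRdfReader {
    private final List<String[]> triples = new ArrayList<>();

    /** Adds a triple and returns this object so calls can be chained. */
    InMemoryRdf add(String subject, String predicate, String object) {
        triples.add(new String[] {subject, predicate, object});
        return this;
    }

    @Override
    public List<String> objectsOf(String subject, String predicate) {
        List<String> objects = new ArrayList<>();
        for (String[] t : triples) {
            if (t[0].equals(subject) && t[1].equals(predicate)) {
                objects.add(t[2]);
            }
        }
        return objects;
    }
}

/** Validation logic depends only on the interface, never on a concrete RDF library. */
final class MinCountCheck {
    static boolean hasAtLeast(SimpleRdfReader rdf, String node, String predicate, int min) {
        return rdf.objectsOf(node, predicate).size() >= min;
    }
}
```

Swapping the backend (Jena, rdf4j, banana-rdf) would then only mean supplying another implementation of the interface, with the validator code left untouched.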

I wonder what would be the best namespace to define ValidateModel...it should probably be added in srdfJena because it depends on Jena...or maybe another module called jenaValidator or something like that.

About your previous question, if you want to work on this ticket, great. I think it can be done in parallel.

ajs6f commented 7 years ago

I appreciate the need for Shaclex to remain independent from any particular RDF framework. That's non-negotiable. Seems like banana-rdf is a particularly great choice, because it gives you all the usual frameworks "for free".

Here is a potential tactic: what if I pursued this work from a separate codebase entirely (i.e. separate from Jena and from Shaclex), but with dependencies to both?

My thinking here is that most people who will want to use this kind of functionality (that is, to use validation with Jena) will be starting from Jena, not from Shaclex. Perhaps, as the projects proceed, we might be able to bring a potential jena-validation module over to Jena (of course, that would require a full discussion within and commitment from Jena). In any event, with this approach we do not need to find a place to add this stuff in Shaclex itself, which remains independent and concentrated on the real meat of the matter: validation itself (not glue code! :grin:). Does that sound reasonable? If so, I will set up a jena-validation project in a public GitHub repo and begin work.

labra commented 7 years ago

Yes, I was thinking about that alternative as well. It makes sense and, as you say, it could provide cleaner code and fewer dependencies for Shaclex.

Another possibility is that you could even develop that library entirely in Java. It would probably make more sense, and most people would be more familiar with Java than Scala.

If you go ahead, let me know and I can help bridge the gap between both.

Some time ago I created a simple Java-based project to show how Shaclex could be called from Java here, but I don't actively maintain it.

ajs6f commented 7 years ago

Cool, I've thrown something up here. That example looks really helpful, thanks! I need to do some Jena work, but I should be able to get back to this (after studying your example) sometime this coming week.

jadelkhoury commented 7 years ago

A separate Java library would be nice indeed. FYI, we have developed a Java validation library based on Shaclex as part of the Eclipse Lyo project (we are in the process of releasing it once the IP process is done). See https://git.eclipse.org/r/#/c/101273/6. But returning validation results as Jena models is still relevant and would improve our library, so I hope we can help contribute here as well.

@berezovskyi @yashskhatri! Any tips on how we call Shaclex from Java? I don't believe we used https://github.com/labra/shexjava in the end.

jadelkhoury commented 7 years ago

Note on the validation library I just mentioned: a big chunk of the contribution in the library is to marshal/unmarshal Java POJOs to Jena models, based on SHACL annotations. Nevertheless, maybe there is something in that library that can be of help?

jadelkhoury commented 7 years ago

@yashkhatri!

berezovskyi commented 7 years ago

@ajs6f @jadelkhoury I didn't know that having a ValidationModel implement a Jena Model was the idiomatic way. I think we will just follow @ajs6f's lead here and lyo-validation should in the end rely on jena-validation. The way we do validation right now is quite direct: https://git.eclipse.org/r/#/c/101273/6/org.eclipse.lyo.validate/src/main/java/org/eclipse/lyo/validate/impl/ValidatorImpl.java@188

ajs6f commented 7 years ago

Okay, I should be able to find some time to tinker with this tomorrow. @berezovskyi, as far as "idiomaticity" goes, I'm taking my cue (as mentioned above) from how Jena integrated inference functions. But I and the other committers are always open to new ideas!

ajs6f commented 7 years ago

Taking a look at this, I am trying to figure out what is and is not necessary to expose on the Java side. There are some types and methods in the Shaclex API that are undocumented and that I do not understand:

Do I need to just go and read the docs for ShEx and SHACL very carefully, or is it possible to give some hints about these guys? Obviously, I can just read the codebase very very carefully, but with my newbie Scala skills, that's going to take a long time... :)

ajs6f commented 7 years ago

As said above, I'm having a really hard time getting any grip on this without understanding Schema.validate. Is it possible to get some explanation of the parameters for that method? Is it to be found in the SHACL recommendation?

ajs6f commented 6 years ago

@labra et al., just bumping this because I have some time in the next few weeks to make progress here, but I cannot do so without some insight into what the various types in shaclex do. I don't know which of the parameters of Schema.validate I should or could reexpose in Java, or what some of them even mean. As I said, my Scala skills are minimal, so if I have to reverse-engineer the semantics, I'm not going to get very far very fast with it. Is there any documentation I can read?

labra commented 6 years ago

Thanks, you are right that we need to improve the documentation. I have started creating some documentation about the architecture and the modules of the library, but it is incomplete.

Lately I have been quite busy preparing the online version of the Validating RDF data book, which can also be useful as documentation about the languages.

For example, this section describes the SHACL validation report and this chapter compares both languages.

In order to interact with the library using Jena Models, one issue is how to represent the validation results as a Jena Model.

My preference is to be able to have RDF representations of Shape Maps (used by ShEx) which can also be compatible with the RDF representations of Validation Reports (used by SHACL).

ajs6f commented 6 years ago

@labra Thanks so much for responding! I have been recommending Validating RDF, but I hadn't noticed that you were an author. Congrats!

In fact, I don't think we need to have results as a Model (for example, Jena doesn't offer SPARQL results as RDF, because that wouldn't make sense), but it would certainly be nice. I will start as you advise, by working on RDF representations of the two kinds of results. I hope to have something to offer within a few weeks, depending on many other things. :grin: Happy New Year!

ajs6f commented 6 years ago

I hadn't looked at SHACL very much before, but it seems to me that we could mint a new predicate someNS:conforms < sh:conforms and use it to represent shape maps e.g. (using the example from your book):

:alice@:User,
:alice@:Employee,
:bob@:User

=>

:alice someNS:conforms :User .
:alice someNS:conforms :Employee .
:bob someNS:conforms :User .

and also use it instead of using sh:conforms in SHACL validation reports. Then to a first order (determining compliance or lack thereof) the same RDF is in use. It's not as clear to me how to represent the richer information in a common way, but perhaps we can find another commonality between ShEx's reason and SHACL's sh:resultMessage?
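The mapping in the example above is mechanical enough to sketch in a few lines of Java. This is a deliberately naive illustration: ShapeMapTriples and toTriples are hypothetical names, someNS:conforms is the placeholder predicate from this comment (no such namespace is defined anywhere), and the parsing only handles simple comma-separated node@shape entries with prefixed names, not the full shape map syntax.

```java
import java.util.ArrayList;
import java.util.List;

final class ShapeMapTriples {
    // Placeholder predicate from the discussion; not a real vocabulary term.
    static final String CONFORMS = "someNS:conforms";

    /** Converts entries such as ":alice@:User" into "node predicate shape ." lines. */
    static List<String> toTriples(String shapeMap) {
        List<String> triples = new ArrayList<>();
        for (String entry : shapeMap.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing comma or blank input
            }
            int at = trimmed.indexOf('@');
            if (at <= 0 || at == trimmed.length() - 1) {
                throw new IllegalArgumentException("Not a node@shape entry: " + trimmed);
            }
            String node = trimmed.substring(0, at).trim();
            String shape = trimmed.substring(at + 1).trim();
            triples.add(node + " " + CONFORMS + " " + shape + " .");
        }
        return triples;
    }

    public static void main(String[] args) {
        // Prints one triple per line for the shape map used in the example.
        for (String triple : toTriples(":alice@:User,\n:alice@:Employee,\n:bob@:User")) {
            System.out.println(triple);
        }
    }
}
```

A real implementation would build the triples as Jena statements rather than strings, and would need a way to express non-conformance as well, which this sketch leaves out.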