Add experimental interfaces for RDF*

rubensworks commented 4 years ago

This PR includes an initial draft for adding RDF* support as discussed in #162.

bergos commented 4 years ago

The goal of my proposal to allow undefined in the termType was to use only the Quad interface and not require a QuadTerm interface. RDF is simple and RDF* is almost as simple. If possible I would like to keep the number of interfaces low. My thoughts from two perspectives:

From developers that use the interfaces:

Easy to understand because of the small number of interfaces that match almost exactly the RDF model 👍
RDF* can be found via search or maybe even a separate section 👍
No need to go into details and understand why there is a QuadTerm interface besides the Quad interface 👍
RDF* and the fact that Quad implements Term is a little bit hidden 👎

From developers that implement the interface:

They are experts, they know what to look for and they have to understand all details 👍
I guess they explicitly decide to support RDF* or decide against it, so no need to have two separate factories 👍
Many use cases support only a subset of the valid values of termType and therefore must validate the value already. That happens outside the data model implementation. e.g. Variable shows up at a serializer 👍

I only see one drawback, but a separate section about RDF* could cover that point.
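
The point about validating termType outside the data model can be illustrated with a minimal sketch. It assumes a hypothetical, heavily simplified serializer that only accepts three term types; the `Term` shape, the `serializeTerm` name, and the naive literal output are all assumptions for illustration, not part of the spec.

```typescript
// Hypothetical, simplified term shape; not the actual RDF/JS typings.
interface Term {
  termType: string;
  value: string;
}

// Term types this illustrative serializer is willing to write out.
const SERIALIZABLE = new Set(["NamedNode", "BlankNode", "Literal"]);

// Validation happens in the consumer, not in the data model itself.
function serializeTerm(term: Term): string {
  if (!SERIALIZABLE.has(term.termType)) {
    throw new Error(`Cannot serialize term of type ${term.termType}`);
  }
  switch (term.termType) {
    case "NamedNode":
      return `<${term.value}>`;
    case "BlankNode":
      return `_:${term.value}`;
    default:
      // Literal: escaping, language tags, and datatypes are omitted here.
      return `"${term.value}"`;
  }
}
```

A consumer written this way would reject a Variable today, and would reject a Quad used as a term in the same manner unless it opts in.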

rubensworks commented 4 years ago

use only the Quad interface and not require a QuadTerm interface

I don't immediately see any downsides of that approach. I'll wait for @RubenVerborgh's review before modifying this PR.

I only see one drawback, but a separate section about RDF* could cover that point.

A separate section dedicated to RDF* is a good idea, will add.

RubenVerborgh commented 4 years ago

I reviewed, but my comments seem orthogonal to @bergos'. I'm also not sure I fully follow; could we write out some of the differences for better comparison?

retog commented 4 years ago

I would prefer composition rather than inheritance, i.e. having QuadTerm be a Term with the quad as its value and "quad" as termType. This approach would also not require the ugly undefined and would make the design generally less complex.
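
A minimal sketch of what this composition idea might look like in TypeScript. The interface shapes and the `namedNode` and `quadTerm` helpers are simplified assumptions for illustration; they are not the typings proposed in this PR.

```typescript
// Hypothetical, simplified shapes; not the actual RDF/JS typings.
interface Term {
  termType: string;
  value: unknown;
}

interface NamedNode extends Term {
  termType: "NamedNode";
  value: string;
}

interface Quad {
  subject: Term;
  predicate: Term;
  object: Term;
}

// Composition: the quad is carried as the value of a dedicated term.
interface QuadTerm extends Term {
  termType: "quad";
  value: Quad;
}

function namedNode(value: string): NamedNode {
  return { termType: "NamedNode", value };
}

function quadTerm(value: Quad): QuadTerm {
  return { termType: "quad", value };
}

const inner: Quad = {
  subject: namedNode("http://example.org/bob"),
  predicate: namedNode("http://xmlns.com/foaf/0.1/age"),
  object: { termType: "Literal", value: "23" },
};

// The wrapped quad can now be used wherever a Term is expected.
const wrapped: QuadTerm = quadTerm(inner);
```

With this shape a plain Quad never needs a termType of its own; only the wrapper does.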

rubensworks commented 4 years ago

Thank you all for your comments.

I played around with different possibilities in TypeScript, and it felt like a composition-based approach led to the most straightforward solution (thanks for the reminder @retog).

I have pushed a reworked version of the spec, and here you can find the corresponding typings (which can be pushed once this proposal PR is accepted): https://github.com/rubensworks/DefinitelyTyped/commit/955b974ce991440d651ad8bf3ffd7534705f2e15 Please let me know what you think.

bergos commented 4 years ago

@rubensworks I think the interfaces don't match with the RDF model. The Quad is the Term and not the value of a QuadTerm. We had a similar discussion about the NamedNode interface. A NamedNode is a Term that has an IRI as the value. The discussion can be found in #50. I think also here we should stay consistent with the rest of the interfaces and the RDF model.

As @RubenVerborgh proposed I created two PRs for better comparison. #164 defines a new interface, #165 extends the Quad interface.

I still favor the one with the extended Quad interface. It's smaller, simpler, and easier to understand for developers that use the interfaces. As I mentioned earlier, we can have two different perspectives on the specification: developers that use the interface or developers that implement the interface. As I don't expect there will be a separate primer, it would be good to have a low entry barrier. #165 would only require knowing the RDF model, not the RDF* model, to understand the basic concept of all interfaces.

rubensworks commented 4 years ago

Thanks for the additional PRs @bergos.

The Quad is the Term and not the value of a QuadTerm

I may be misunderstanding #50, but this feels like a different problem. I see no issue in making Quad the value of a QuadTerm. This is similar to how the datatype of a Literal is defined.

The additional datafactory method in this PR can be seen as an advantage IMO, as it makes RDF* a clear opt-in. In any case, I don't have a strong preference, I would also be fine with #165. (#164 seems a bit confusing).

rubensworks commented 4 years ago

@rdfjs/data-model-spec @retog Any other comments on this PR and #164+#165? As said before, I have no strong preference.

I hope we can decide on an approach soon, as we have some students starting this month, and I hope to let them work on some RDF* tooling, which would benefit all of us.

tpluscode commented 4 years ago

I think I'd be in favour of #165.

<<:bob foaf:age 23>> ex:certainty 0.9

This appears to closely match the "Quad is a Term" standpoint:

const certaintyQuad = {
  subject: {
    subject: bob,
    predicate: foaf.age,
    object: 23
  },
  predicate: ex.certainty,
  object: 0.9
}

bergos commented 4 years ago

@rubensworks

I may be misunderstanding #50, but this feels like a different problem. I see no issue in making Quad the value of a QuadTerm. This is similar to how the datatype of a Literal is defined.

I mentioned it in the context of keeping the structures of the RDF model also in the RDF/JS structures. The RDF* model makes the Quad a Term and doesn't introduce a new kind of Term with a Quad as the value. Like the Literal, a Quad has properties with known names.

The additional datafactory method in this PR can be seen as an advantage IMO, as it makes RDF* a clear opt-in.

Libraries are always free to export different factories. On purpose, the spec doesn't cover that part.

RubenVerborgh commented 4 years ago

I think this is the way to go; it also aligns with Jena: https://jena.apache.org/documentation/rdfstar/

Proposal: let's start with this option, marked as experimental, and get implementer feedback.

tpluscode commented 4 years ago

Let's compare from a usage perspective. Here is the code to create a quad like the one above:

<<:bob foaf:age 23>> ex:certainty 0.9 .

With this approach:

import { namedNode, literal, quad, quadTerm } from '@rdfjs/data-model'

const bobAgeQuad = quad(
  namedNode(bob),
  foaf.age,
  literal('23', xsd.int)
)

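// Named certaintyQuad so it does not shadow the imported quad factory
const certaintyQuad = quad(
  quadTerm(bobAgeQuad),
  ex.certainty,
  literal('0.9', xsd.decimal)
)

console.log(`
  <<
    ${certaintyQuad.subject.value.subject} ${certaintyQuad.subject.value.predicate} ${certaintyQuad.subject.value.object}
  >> ${certaintyQuad.predicate} ${certaintyQuad.object}`
)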

Compared with changes proposed in #165

-import { namedNode, literal, quad, quadTerm } from '@rdfjs/data-model'
+import { namedNode, literal, quad } from '@rdfjs/data-model'

const bobAgeQuad = quad(
  namedNode(bob),
  foaf.age,
  literal('23', xsd.int)
)

const certaintyQuad = quad(
- quadTerm(bobAgeQuad),
+ bobAgeQuad,
  ex.certainty,
  literal('0.9', xsd.decimal)
)

console.log(`
  <<
-  ${certaintyQuad.subject.value.subject} ${certaintyQuad.subject.value.predicate} ${certaintyQuad.subject.value.object}
+  ${certaintyQuad.subject.subject} ${certaintyQuad.subject.predicate} ${certaintyQuad.subject.object}
  >> ${certaintyQuad.predicate} ${certaintyQuad.object}`
)

IMO the simplicity speaks in favour of #165

RubenVerborgh commented 4 years ago

Simplicity can be added in layers above; RDF/JS is about low-level library interop, where we need the most solid model.

However, can we have the best of both worlds? Why don't we make a quad always a term? I don't like the optional fields in #165; they make things messy, as per my first sentence here.

tpluscode commented 4 years ago

RDF/JS is about low-level library interop

I don't think this is true. RDF(/JS) is the lingua franca of all tools, both high and low level. I find myself using the data factory all the time, in different places of the applications I implement. It matters to me whether it will require a new quadTerm import and an additional .value to access the RDF*-reified subjects.

Why don't we make a quad always a term?

From #165

-    interface Quad {
+    interface Quad : Term {

"always term", no?

RubenVerborgh commented 4 years ago

RDF(/JS) is the lingua franca of all tools

Lingua franca, which means that any abstraction built on top of RDF/JS will work with any of the libraries.

Hence, the goal is to find a solid and minimal interoperable set, not necessarily the simplest interface. That's up to other libraries.

tpluscode commented 4 years ago

Lingua franca, which means that any abstraction built on top of RDF/JS will work with any of the libraries. (...) That's up to other libraries.

I may not have been precise enough.

It is not only in high-level tools that you use the RDF/JS APIs. The RDF model easily surfaces in all layers, regardless of the libraries used. I would compare what we do here to attempting to change JSON itself in non-RDF land. I opt for the least impact, and #165 proposes the least invasive changes.

the goal is to find a solid and minimal interoperable set

Could you elaborate on what you don't like about #165? (Other than the optional termType and value, on which I agree with you and have commented.)

Here's my take, expanding on what @bergos laconically mentions above. In RDF* a quad can be the subject of a quad. Simple as that: the quad itself is a term. This is most accurately represented in that PR. It is not a term whose value is a quad.

RubenVerborgh commented 4 years ago

other than the optional termType and value which I agree with you and commented)

That seems to be integral to the design; so yes, that.

bergos commented 4 years ago

I think this is the way to go; it also aligns with Jena: https://jena.apache.org/documentation/rdfstar/

That's just the SPARQL result, it's way more complicated in the Jena API and we should not try to align to it...

Why don't we make a quad always a term?

I don't have a problem with it, I would even implement it that way. I had the impression that some people want to be able to explicitly say a Quad is not a Term if they don't use RDF*, but I don't see any drawbacks to always making a Quad a Term. The only benefit of keeping them separate would be backward compatibility, but then there are libraries implementing version 1.0 of the spec and others 1.1. That should not be a problem.

rubensworks commented 4 years ago

To recap, this PR initially looked like #165, but I changed this to be based on composition (following @retog's suggestion) after playing around with it in TS. The composition approach felt more robust, as it did not have the problem of optional fields on terms.

With this composition approach, you for example do not have any problems if you want to create RDF* statements based on quads that were obtained from some source that is out of your control, and makes use of an old datafactory (not supporting RDF*):

const quads: RDF.Quad[] = obtainQuadsFromSomewhere();
const quadTerm = myDataFactory.createQuadTerm(quads[1]);
myDataFactory.quad(quadTerm, ...);

With the approach from #165, this would not be possible without re-creating the quad.
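
For comparison, re-creating a quad under a "Quad is a Term" design might look like the sketch below. The `LegacyQuad` and `StarQuad` names, the `upgradeQuad` helper, and the fixed termType and value are assumptions made for illustration; they are not taken from #165.

```typescript
// Hypothetical, simplified shapes; not the actual RDF/JS typings.
interface Term {
  termType: string;
  value: string;
}

// A quad as produced by an older factory: no termType, no value.
interface LegacyQuad {
  subject: Term;
  predicate: Term;
  object: Term;
  graph: Term;
}

// A quad that is itself a Term. The termType string "Quad" and the
// empty value are assumptions made for this sketch.
interface StarQuad extends LegacyQuad, Term {
  termType: "Quad";
  value: "";
}

// Re-creates a legacy quad as a term-like quad (shallow copy: the
// existing subject, predicate, object and graph terms are reused).
function upgradeQuad(quad: LegacyQuad): StarQuad {
  return {
    termType: "Quad",
    value: "",
    subject: quad.subject,
    predicate: quad.predicate,
    object: quad.object,
    graph: quad.graph,
  };
}

const legacy: LegacyQuad = {
  subject: { termType: "NamedNode", value: "http://example.org/bob" },
  predicate: { termType: "NamedNode", value: "http://xmlns.com/foaf/0.1/age" },
  object: { termType: "Literal", value: "23" },
  graph: { termType: "DefaultGraph", value: "" },
};

const upgraded: StarQuad = upgradeQuad(legacy);
```

The copy allocates one new object per quad, which is the same order of cost as allocating one wrapper per quad in the composition approach.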

That's just the SPARQL result, it's way more complicated in the Jena API and we should not try to align to it...

Jena's Model API for RDF* on the bottom of that page doesn't look that complicated to me:

    Statement Resource.getStatement()
    Resource Model.createResource(Statement)
    Resource ResourceFactory.createStatement

Alignment to this composition approach definitely makes sense to me.

tpluscode commented 4 years ago

With this composition approach, you for example do not have any problems if you want to create RDF* statements based on quads that were obtained from some source that is out of your control

Why is this a problem exactly? The quads are immutable so it doesn't really make much of a difference.

And from a pure memory consumption/allocation standpoint it is also hardly relevant. Either way the RDF* factory will have to create new objects for every reified quad. For every quad subject it will be either a QuadTerm or a re-packaged Quad.

Unless we count down to individual allocated bytes, I'd say that the practical difference is negligible.

bergos commented 4 years ago

The composition approach felt more robust, as it did not have the problem of optional fields on terms.

As I mentioned before, we can also make them mandatory. termType is fixed to Quad and value is fixed to an empty string, just like we defined it for the DefaultGraph.
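
A minimal sketch of that idea, assuming simplified interface shapes. The `isQuad`, `showTerm`, and `showStatement` helpers and the prefixed-name strings are hypothetical and exist only for illustration.

```typescript
// Hypothetical, simplified shapes; not the actual RDF/JS typings.
interface Term {
  termType: string;
  value: string;
}

// The DefaultGraph already has a fixed termType and an empty value.
interface DefaultGraph extends Term {
  termType: "DefaultGraph";
  value: "";
}

// The same pattern applied to Quad: both fields mandatory and fixed.
interface Quad extends Term {
  termType: "Quad";
  value: "";
  subject: Term;
  predicate: Term;
  object: Term;
  graph: Term;
}

// With a mandatory termType, consumers can discriminate on it.
function isQuad(term: Term): term is Quad {
  return term.termType === "Quad";
}

// Debug rendering: nested quads use the << >> notation from this thread.
function showTerm(term: Term): string {
  return isQuad(term)
    ? `<< ${showTerm(term.subject)} ${showTerm(term.predicate)} ${showTerm(term.object)} >>`
    : term.value;
}

function showStatement(quad: Quad): string {
  return `${showTerm(quad.subject)} ${showTerm(quad.predicate)} ${showTerm(quad.object)} .`;
}

const defaultGraph: DefaultGraph = { termType: "DefaultGraph", value: "" };

const bobAge: Quad = {
  termType: "Quad",
  value: "",
  subject: { termType: "NamedNode", value: ":bob" },
  predicate: { termType: "NamedNode", value: "foaf:age" },
  object: { termType: "Literal", value: "23" },
  graph: defaultGraph,
};

const certainty: Quad = {
  termType: "Quad",
  value: "",
  subject: bobAge, // the quad itself is used as a term
  predicate: { termType: "NamedNode", value: "ex:certainty" },
  object: { termType: "Literal", value: "0.9" },
  graph: defaultGraph,
};

console.log(showStatement(certainty));
// prints "<< :bob foaf:age 23 >> ex:certainty 0.9 ."
```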

With this composition approach, you for example do not have any problems if you want to create RDF* statements based on quads that were obtained from some source that is out of your control, and makes use of an old datafactory (not supporting RDF*):

That's the reason why we pass around factories.

Jena's Model API for RDF* on the bottom of that page doesn't look that complicated to me:

I went down the rabbit hole and it looks like everything is stored in a property called label. It is way more complicated, and the factory concept is different from RDF/JS. Why should we align with a typed language if we are free to align more closely with the actual model? I see that as contrary to the initial idea of having an idiomatic API for JavaScript.

rubensworks commented 4 years ago

Closing in favor of #165.

rdfjs / data-model-spec

Add experimental interfaces for RDF* #163