Closed rubensworks closed 4 years ago
The goal of my proposal to allow undefined in the termType was to use only the Quad interface and not require a QuadTerm interface. RDF is simple and RDF* is almost as simple. If possible I would like to keep the number of interfaces low. My thoughts from two perspectives:
From developers that use the interfaces:
QuadTerm interface besides the Quad interface 👍
Quad implements Term is a little bit hidden 👎
From developers that implement the interface:
termType and therefore must validate the value already. That happens outside the data model implementation. e.g. Variable shows up at a serializer 👍
I only see one drawback, but a separate section about RDF* could cover that point.
use only the Quad interface and not require a QuadTerm interface
I don't immediately see any downsides of that approach. I'll wait for @RubenVerborgh's review before modifying this PR.
I only see one drawback, but a separate section about RDF* could cover that point.
A separate section dedicated to RDF* is a good idea, will add.
I reviewed, but my comments seem orthogonal to @bergos'. I'm also not sure I fully follow; could we write out some of the differences for better comparison?
I would prefer composition rather than inheritance, i.e. having QuadTerm be a Term with the quad as its value and "quad" as termType. This approach would also not require the ugly undefined and would make the design generally less complex.
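To make the composition idea concrete, here is a minimal TypeScript sketch. The interfaces are simplified, hypothetical stand-ins and not the actual RDF/JS typings: in particular, value is widened to unknown so that a QuadTerm can carry a quad, and the quadTerm, namedNode and literal helpers are illustrative only.

```typescript
// Simplified, hypothetical stand-ins for the RDF/JS interfaces.
interface Term {
  termType: string;
  value: unknown; // assumption: widened so a QuadTerm can hold a quad
}

interface NamedNode extends Term {
  termType: 'NamedNode';
  value: string;
}

// In this variant a quad is a plain structure, not a Term.
interface Quad {
  subject: Term;
  predicate: Term;
  object: Term;
}

// Composition: the wrapper is the Term, the quad is its value.
interface QuadTerm extends Term {
  termType: 'quad';
  value: Quad;
}

// Illustrative helpers, not a real data factory.
const namedNode = (iri: string): NamedNode => ({ termType: 'NamedNode', value: iri });
const literal = (value: string): Term => ({ termType: 'Literal', value });
const quadTerm = (quad: Quad): QuadTerm => ({ termType: 'quad', value: quad });

const inner: Quad = {
  subject: namedNode('http://example.org/bob'),
  predicate: namedNode('http://xmlns.com/foaf/0.1/age'),
  object: literal('23'),
};

// The wrapped quad is reached through .value.
const wrapped = quadTerm(inner);
console.log(wrapped.termType, wrapped.value.subject.value);
```

With this shape, no field on any term is optional, at the cost of one extra .value hop when reading the nested quad.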
Thank you all for your comments.
I played around with different possibilities in TypeScript, and it felt like a composition-based approach led to the most straightforward solution (thanks for the reminder @retog).
I have pushed a reworked version of the spec, and here you can find the corresponding typings (which can be pushed once this proposal PR is accepted): https://github.com/rubensworks/DefinitelyTyped/commit/955b974ce991440d651ad8bf3ffd7534705f2e15
Please let me know what you think.
@rubensworks I think the interfaces don't match with the RDF model. The Quad is the Term and not the value of a QuadTerm. We had a similar discussion about the NamedNode interface. A NamedNode is a Term that has an IRI as the value. The discussion can be found in #50. I think also here we should stay consistent with the rest of the interfaces and the RDF model.
As @RubenVerborgh proposed, I created two PRs for better comparison. #164 defines a new interface, #165 extends the Quad interface.
I still favor the one with the extended Quad interface. It's smaller, simpler, and easier to understand for developers that use the interfaces. As I mentioned earlier, we can have two different perspectives on the specification: developers that use the interface or developers that implement the interface. As I don't expect there will be a separate primer, it would be good to have a low entry barrier. #165 would only require knowing the RDF model, but not the RDF* model, to understand the basic concept of all interfaces.
Thanks for the additional PRs @bergos.
The Quad is the Term and not the value of a QuadTerm
I may be misunderstanding #50, but this feels like a different problem. I see no issue in making Quad the value of a QuadTerm. This is similar as to how the datatype of a Literal is defined.
The additional datafactory method in this PR can be seen as an advantage IMO, as it makes RDF* a clear opt-in. In any case, I don't have a strong preference, I would also be fine with #165. (#164 seems a bit confusing).
@rdfjs/data-model-spec @retog Any other comments on this PR and #164+#165? As said before, I have no strong preference.
I hope we can decide on an approach soon, as we have some students starting this month, and I hope to let them work on some RDF* tooling, which would benefit all of us.
I think I would be in favour of #165.
<<:bob foaf:age 23>> ex:certainty 0.9
This appears to closely match the Quad being a Term standpoint
// Sketch only: bob, foaf and ex are assumed to be defined elsewhere.
const certaintyQuad = {
  subject: {
    subject: bob,
    predicate: foaf.age,
    object: 23
  },
  predicate: ex.certainty,
  object: 0.9
}
@rubensworks
I may be misunderstanding #50, but this feels like a different problem. I see no issue in making Quad the value of a QuadTerm. This is similar as to how the datatype of a Literal is defined.
I mentioned it in the context of keeping the structures of the RDF model also in the RDF/JS structures. The RDF* model makes the Quad a Term and doesn't introduce a new kind of Term with a Quad as the value. Like the Literal, a Quad has properties with known names.
The additional datafactory method in this PR can be seen as an advantage IMO, as it makes RDF* a clear opt-in.
Libraries are always free to export different factories. On purpose, the spec doesn't cover that part.
I think this is the way to go; it also aligns with Jena: https://jena.apache.org/documentation/rdfstar/
Proposal: let's start with this option, marked as experimental, and get implementer feedback.
Let's compare from the proposed usage perspective. Code to create a quad like the one above:
<<:bob foaf:age 23>> ex:certainty 0.9 .
With this approach:
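import { namedNode, literal, quad, quadTerm } from '@rdfjs/data-model'
// bob, foaf, ex and xsd are assumed to be defined elsewhere.
const bobAgeQuad = quad(
  namedNode(bob),
  foaf.age,
  literal('23', xsd.int)
)
// Named certaintyQuad so that it does not shadow the imported quad factory.
const certaintyQuad = quad(
  quadTerm(bobAgeQuad),
  ex.certainty,
  literal('0.9', xsd.decimal)
)
console.log(`
<<
  ${certaintyQuad.subject.value.subject} ${certaintyQuad.subject.value.predicate} ${certaintyQuad.subject.value.object}
>> ${certaintyQuad.predicate} ${certaintyQuad.object}`
)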
Compared with changes proposed in #165
-import { namedNode, literal, quad, quadTerm } from '@rdfjs/data-model'
+import { namedNode, literal, quad } from '@rdfjs/data-model'
 const bobAgeQuad = quad(
   namedNode(bob),
   foaf.age,
   literal('23', xsd.int)
 )
 const certaintyQuad = quad(
-  quadTerm(bobAgeQuad),
+  bobAgeQuad,
   ex.certainty,
   literal('0.9', xsd.decimal)
 )
 console.log(`
 <<
-  ${certaintyQuad.subject.value.subject} ${certaintyQuad.subject.value.predicate} ${certaintyQuad.subject.value.object}
+  ${certaintyQuad.subject.subject} ${certaintyQuad.subject.predicate} ${certaintyQuad.subject.object}
 >> ${certaintyQuad.predicate} ${certaintyQuad.object}`
 )
IMO the simplicity speaks in favour of #165
Simplicity can be added in layers above; RDF/JS is about low-level library interop, where we need the most solid model.
However, can we have the best of both worlds? Why don't we make a quad always a term? I don't like the optional fields in #165; they make things messy, as per my first sentence here.
RDF/JS is about low-level library interop
I don't think this is true. RDF(/JS) is the lingua franca of all tools, both high and low level. I find myself using the data factory every time in different places of the applications I implement. It matters to me whether it will require a new quadTerm import and an additional .value to access the RDF*-reified subjects.
Why don't we make a quad always a term?
From #165
- interface Quad {
+ interface Quad : Term {
"always term", no?
RDF(/JS) is the lingua franca of all tools
Lingua franca, which means that any abstraction built on top of RDF/JS will work with any of the libraries.
Hence, the goal is to find a solid and minimal interoperable set, not necessarily the simplest interface. That's up to other libraries.
Lingua franca, which means that any abstraction built on top of RDF/JS will work with any of the libraries. (...) That's up to other libraries.
I may not have been precise enough.
It is not only in high-level tools that you use the RDF/JS APIs. The RDF model easily surfaces to all layers regardless of the libraries used. What we do here I would compare to attempting to change JSON itself in non-RDF land. I opt for the least impact. #165 proposes the least invasive changes.
the goal is to find a solid and minimal interoperable set
Could you elaborate what you don't like about #165? (other than the optional termType and value which I agree with you and commented)
Here's my take, expanding on what @bergos laconically mentions above. In RDF* a quad can be the subject of a quad. Simple as that, the quad itself is a term. This is most accurately represented in that PR. Not a term whose value is a quad.
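As a rough illustration of the "quad itself is a term" view, the following TypeScript sketch nests one quad directly as the subject of another and renders it with the << >> notation. The interfaces and helpers are simplified assumptions made for this example (including the 'Quad' termType tag, the empty-string value, and the use of a single node helper for IRIs and literals); they are not the text of #165.

```typescript
// Simplified, hypothetical stand-ins for the RDF/JS interfaces.
interface Term {
  termType: string;
  value: string;
}

// Assumption: a quad is itself a Term, tagged with termType 'Quad'.
interface Quad extends Term {
  termType: 'Quad';
  subject: Term;
  predicate: Term;
  object: Term;
}

// Type guard that tells nested quads apart from other terms.
const isQuad = (term: Term): term is Quad => term.termType === 'Quad';

// Illustrative helpers; node() stands in for IRIs and literals alike.
const node = (value: string): Term => ({ termType: 'NamedNode', value });
const quad = (subject: Term, predicate: Term, object: Term): Quad =>
  ({ termType: 'Quad', value: '', subject, predicate, object });

// Render a term; nested quads are written between << and >>.
function render(term: Term): string {
  return isQuad(term)
    ? `<< ${render(term.subject)} ${render(term.predicate)} ${render(term.object)} >>`
    : term.value;
}

const bobAge = quad(node(':bob'), node('foaf:age'), node('23'));
// The inner quad is passed directly as the subject, with no wrapper.
const certainty = quad(bobAge, node('ex:certainty'), node('0.9'));

const rendered =
  `${render(certainty.subject)} ${render(certainty.predicate)} ${render(certainty.object)}`;
console.log(rendered);
```

Consumers that do not care about RDF* can keep reading subject, predicate and object as before; only code that needs to descend into nested quads has to check the termType.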
other than the optional termType and value which I agree with you and commented)
That seems to be integral to the design; so yes, that.
I think this is the way to go; it also aligns with Jena: https://jena.apache.org/documentation/rdfstar/
That's just the SPARQL result, it's way more complicated in the Jena API and we should not try to align to it...
Why don't we make a quad always a term?
I don't have a problem with it, I would even implement it that way. I had the impression that some people want to be able to explicitly say a Quad is not a Term, if they don't use RDF*, but I don't see any drawbacks to always making a Quad a Term. The only benefit would be to be backward compatible, but then there are libraries implementing version 1.0 of the spec and others 1.1. Should not be a problem.
To recap, this PR initially looked like #165, but I changed this to be based on composition (following @retog's suggestion) after playing around with it in TS. The composition approach felt more robust, as it did not have the problem of optional fields on terms.
With this composition approach, you for example do not have any problems if you want to create RDF* statements based on quads that were obtained from some source that is out of your control, and makes use of an old datafactory (not supporting RDF*):
const quads: RDF.Quad[] = obtainQuadsFromSomewhere();
const quadTerm = myDataFactory.createQuadTerm(quads[1]);
myDataFactory.quad(quadTerm, ...);
With the approach from #165, this would not be possible without re-creating the quad.
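The difference between the two approaches for such a legacy quad can be sketched as follows. Everything here is a simplified assumption for illustration: LegacyQuad stands for a quad produced by an older factory that does not set termType or value, and quadTerm and recreate are hypothetical helpers, not part of any spec or library.

```typescript
// Simplified, hypothetical stand-ins for the RDF/JS interfaces.
interface Term {
  termType: string;
  value: unknown;
}

// A quad from an older factory: it has no termType or value of its own.
interface LegacyQuad {
  subject: Term;
  predicate: Term;
  object: Term;
}

// Composition: wrap the legacy quad unchanged.
interface QuadTerm extends Term {
  termType: 'quad';
  value: LegacyQuad;
}
const quadTerm = (quad: LegacyQuad): QuadTerm => ({ termType: 'quad', value: quad });

// Quad-as-Term: the legacy quad has to be rebuilt with the extra fields.
interface QuadAsTerm extends Term, LegacyQuad {
  termType: 'Quad';
  value: '';
}
const recreate = (quad: LegacyQuad): QuadAsTerm => ({
  termType: 'Quad',
  value: '',
  subject: quad.subject,
  predicate: quad.predicate,
  object: quad.object,
});

const term = (termType: string, value: string): Term => ({ termType, value });
const legacyQuad: LegacyQuad = {
  subject: term('NamedNode', 'http://example.org/bob'),
  predicate: term('NamedNode', 'http://xmlns.com/foaf/0.1/age'),
  object: term('Literal', '23'),
};

const wrappedLegacy = quadTerm(legacyQuad); // same quad object, one new wrapper
const recreated = recreate(legacyQuad);     // new quad object, same inner terms
console.log(wrappedLegacy.value === legacyQuad, recreated.subject === legacyQuad.subject);
```

In both cases exactly one new object is allocated per nested quad; what differs is whether the original quad object is reused or copied.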
That's just the SPARQL result, it's way more complicated in the Jena API and we should not try to align to it...
Jena's Model API for RDF* on the bottom of that page doesn't look that complicated to me:
Statement Resource.getStatement()
Resource Model.createResource(Statement)
Resource ResourceFactory.createStatement
Alignment to this composition approach definitely makes sense to me.
With this composition approach, you for example do not have any problems if you want to create RDF* statements based on quads that were obtained from some source that is out of your control
Why is this a problem exactly? The quads are immutable so it doesn't really make much of a difference.
And from pure memory consumption/allocation it is also hardly relevant. Either way the RDF* factory will have to create new objects for every reified quad. For every quad subject it will be either a QuadTerm or a re-packaged Quad.
Unless we count down to individual allocated bytes, I'd say that the practical difference is negligible.
The composition approach felt more robust, as it did not have the problem of optional fields on terms.
As I mentioned before, we can also make them mandatory. termType is fixed to Quad and value is fixed to an empty string, just like we defined it for the DefaultGraph.
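A minimal TypeScript sketch of that idea, with mandatory fixed fields on the quad mirroring how DefaultGraph uses a fixed termType and an empty-string value. The interfaces are simplified stand-ins and the factory helpers are hypothetical, written only for this illustration.

```typescript
// Simplified, hypothetical stand-ins for the RDF/JS interfaces.
interface Term {
  termType: string;
  value: string;
}

// DefaultGraph: fixed termType and an empty string as value.
interface DefaultGraph extends Term {
  termType: 'DefaultGraph';
  value: '';
}

// Same pattern for the quad: both fields are mandatory and fixed.
interface Quad extends Term {
  termType: 'Quad';
  value: '';
  subject: Term;
  predicate: Term;
  object: Term;
  graph: Term;
}

// Illustrative factory helpers, not a real data factory.
const defaultGraph = (): DefaultGraph => ({ termType: 'DefaultGraph', value: '' });
const namedNode = (value: string): Term => ({ termType: 'NamedNode', value });
const quad = (
  subject: Term,
  predicate: Term,
  object: Term,
  graph: Term = defaultGraph(),
): Quad => ({ termType: 'Quad', value: '', subject, predicate, object, graph });

const q = quad(
  namedNode('http://example.org/bob'),
  namedNode('http://xmlns.com/foaf/0.1/knows'),
  namedNode('http://example.org/alice'),
);

// Because a Quad is always a Term, it can be used directly as a subject.
const outer = quad(q, namedNode('http://example.org/certainty'), namedNode('http://example.org/high'));
console.log(q.termType, JSON.stringify(q.value), q.graph.termType);
```

Since neither field is optional, code that switches on termType handles a quad the same way as any other term.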
With this composition approach, you for example do not have any problems if you want to create RDF* statements based on quads that were obtained from some source that is out of your control, and makes use of an old datafactory (not supporting RDF*):
That's the reason why we pass around factories.
Jena's Model API for RDF* on the bottom of that page doesn't look that complicated to me:
I went down the rabbit hole and it looks like everything is stored in a property called label. It is way more complicated and the factory concept is different from RDF/JS. Why should we align with a typed language if we are more free to align it to the actual model? I see that as contrary to the initial idea of having an idiomatic API for JavaScript.
Closing in favor of #165.
This PR includes an initial draft for adding RDF* support as discussed in #162.