rdfjs / N3.js

Lightning fast, spec-compatible, streaming RDF for JavaScript
http://rdf.js.org/N3.js/
Other
714 stars 130 forks source link

feat/rdfstar support #311

Closed jeswr closed 1 year ago

jeswr commented 1 year ago

This PR adds full support for RDF-star with all spec tests passing.

Closes #272 Closes #304 Related to #256 (What I consider lacking for that to be closed is specialised indexing for nested triples, and a match method that allows you to match based on patterns in nested triples).

jeswr commented 1 year ago

@RubenVerborgh this is ready for review but I don't have permissions to mark you as a reviewer.

jeswr commented 1 year ago

Thanks for the catch @TallTed - this PR essentially is changing over from implementing RDF* to RDF-star. I'd done all but a ctr+f to rename at the end :).

Also to summarise the contents of this PR:

What is not done:

TallTed commented 1 year ago

Update the store to handle deeply nested quads.

Note that, as of this writing, the RDF-star extension to RDF (and its serializations) is focused on triples, not quads, so this update may need reconsideration.

jeswr commented 1 year ago

Note that, as of this writing, the RDF-star extension to RDF (and its serializations) is focused on triples, not quads, so this update may need reconsideration.

My use of the quad terminology arises from the fact that the primitive in RDF/JS is a quad and not a triple. So in this PR the convention is that any nested triple (as referred to by the spec), is emmitted as a nested RDF/JS quad where the graph term MUST be the default graph.

This is also enforced in the parsing changes in this PR where nested graph terms cause errors. However given that the store currently already supports things outside of the spec (e.g. literals as subjects/predicates); I have made the design decision to not enforce this requirement on store operations; that is, nested quads in the store may contain graph terms that are not the default graph.

As I have already discussed with @rubensworks the main thing that needs to be done is to align in the RDFJS standard on a way of representing nested triples. In my view there are 3 ways of going about this:

  1. Make the graph term optional in RDFJS quads, and do not include it in quoted triples (this would be breaking) or allow the graph term to be set to null (also breaking).
  2. Agree that all quoted triples should be represented as RDFJS quads with the DefaultGraph set for the graph term
  3. Agree that all quoted triples should be represented as RDFJS quads with the graph term the same as that of the top level quad.

To give a concrete example then if we have the following nquads statement

<<:a :b :c>> :p :o :g

would, in RDFJS, become

under option 1

quad(
   quad(
      namedNode('http://example.org/a'),
      namedNode('http://example.org/b'),
      namedNode('http://example.org/c'),
      null,
   ),
  namedNode('http://example.org/p'),
  namedNode('http://example.org/o'),
  namedNode('http://example.org/g'),
)

under option 2 (and as is currently implemented in this PR)

quad(
   quad(
      namedNode('http://example.org/a'),
      namedNode('http://example.org/b'),
      namedNode('http://example.org/c'),
      DEFAULT_GRAPH,
   ),
  namedNode('http://example.org/p'),
  namedNode('http://example.org/o'),
  namedNode('http://example.org/g'),
)

under option 3

quad(
   quad(
      namedNode('http://example.org/a'),
      namedNode('http://example.org/b'),
      namedNode('http://example.org/c'),
      namedNode('http://example.org/g'),
   ),
  namedNode('http://example.org/p'),
  namedNode('http://example.org/o'),
  namedNode('http://example.org/g'),
)
TallTed commented 1 year ago

given that the store currently already supports things outside of the spec

Building to what "[a] store ... supports" rather than to what a spec requires tends to do horrible things for interop. Further, to my mind, I can't see how not building to the RDF-star spec (even as it's still a draft) cannot leave something to be desired in what you're calling "full support for RDF-star".

But perhaps I'm missing something and/or these trade-offs are OK with the community around "[this] store". I suppose time will tell.

jeswr commented 1 year ago

But perhaps I'm missing something and/or these trade-offs are OK with the community around "[this] store".

I think https://github.com/rdfjs/types/issues/34#issuecomment-1331408661 summarizes it nicely. That is, the data is already validated when we do the data exchange (in particular when we are parsing it before adding it to the store, or serializing to send it somewhere else); so there isn't a need to apply additional validation at this processing/storage stage.

A good example of where enforcing such interfaces during data processing is problematic is in https://github.com/rdfjs/N3.js/pull/296 where the reasoner is working directly with the internal index and has zero knowledge of what types of terms the id represents.

Building to what "[a] store ... supports" rather than to what a spec requires tends to do horrible things for interop.

This store correctly implements the DatasetCore interface with the default generic parameters according to the RDFJS spec (https://github.com/rdfjs/types/blob/183bda795f57a9464ddf95deac45a0c4a48879cf/dataset.d.ts#L7).

woutermont commented 1 year ago

[I]n this PR the convention is that any nested triple (as referred to by the spec), is emmitted as a nested RDF/JS quad where the graph term MUST be the default graph. This is also enforced in the parsing changes in this PR where nested graph terms cause errors.

@jeswr, are there specific reasons why this should be enforced, i.e. why we cannot simply ignore the graph term of quoted triples/quads without erroring?

jeswr commented 1 year ago

are there specific reasons why this should be enforced, i.e. why we cannot simply ignore the graph term of quoted triples/quads without erroring?

There are specific test cases (e.g. https://w3c.github.io/rdf-star/tests/turtle/syntax/#turtle-star-bad-8) forbidding this in the spec tests. We could opt-into allowing graph-terms and be non-compliant; but I don't see the value in allowing & maintaining such a feature.

My stance is also that since parsing and serialising are part of the data-exchange process they should be completely compliant with whatever spec they are targeting in order to maintain interoperability between systems.

tpluscode commented 1 year ago

Ah, I see that the https://github.com/rdfjs/types/issues/34#issuecomment-1333452852 I just made does not align with the discussion here.

We do not want to allow graph in quoted triples after all?

woutermont commented 1 year ago

I do, however, and as I believe @rubensworks and @jeswr with me. In this PR, as I understand it, they are just constricted to the DEFAULT_GRAPH.

(Edit: Rewrote first sentence in to be more specific as to who supports this.)

jeswr commented 1 year ago

We do not want to allow graph in quoted triples after all?

When dealing with valid RDF-star statements; correct.

We do. In this PR, as I understand it, they are just constricted to the DEFAULT_GRAPH.

I am just implementing option 2 in https://github.com/rdfjs/types/issues/34#issue-1467371879. That is; we set the graph term in the quad to DEFAULT_GRAPH so it is a valid RDFJS quad, but we should just ignore the graph term when reading the nested/quoted triple/quad since it is not actually part of the quoted term in a semantic sense.

TallTed commented 1 year ago

@woutermont

We do. In this PR, as I understand it, they are just constricted to the DEFAULT_GRAPH.

I suggest you change that from We do to I do, because you certainly do not speak for all participants in this thread, nor have you presented any evidence that you are speaking for anyone other than yourself.

woutermont commented 1 year ago

I suggest you change that from We do to I do, because you certainly do not speak for all participants in this thread, nor have you presented any evidence that you are speaking for anyone other than yourself.

Yet @tpluscode does speak for everyone? Or do you have a bias towards me in your nitpicking, @TallTed?

TallTed commented 1 year ago

[@woutermont] Yet @tpluscode does speak for everyone? Or do you have a bias towards me in your nitpicking, @TallTed?

No, @tpluscode doesn't speak for everyone, either. (And if you'll peruse my activity history, even just within this PR, you should discover that my nitpicking is applied in many directions, pretty much all the time.) However, @tpluscode has generally been reflecting what others in this thread (or the related and connected other thread(s)) have also/already said, while so far as I've observed, you've been in a chorus of one.

(I do wish this were being hashed out in one place, preferably an issue under rdf-star, perhaps even RDF-star#275, the results of which discussion could then be applied to the implementations in N3.js and elsewhere.)

woutermont commented 1 year ago

@tpluscode has generally been reflecting what others in this thread (or the related and connected other thread(s)) have also/already said, while so far as I've observed, you've been in a chorus of one.

Then I'm afraid one of us is just reading the concensus in related threads wrongly 🤷‍♂️ Because I'm saying the same thing as @rubensworks and @jeswr.

Edit: For clarity, however, I'll change that sentence to a singular pronoun, and hide out interaction about it later tonight.

jeswr commented 1 year ago

But that could be an internal function then indeed! (c.f. https://github.com/rdfjs/N3.js/pull/311#discussion_r1065776281)

That's exactly what the follow up PR https://github.com/rdfjs/N3.js/pull/318 does :)

jeswr commented 1 year ago

Question: did the addition of getSplits surface any new bugs? I've been debating with myself for ages whether or not to get this; so far, I've relied on manual additions of cases. I do like the idea and I wonder how impactful it is.

It blew up with dozens of errors on the {| case you had already pointed out; and iirc it also complained at the first attempt at fixing it due to the behavior of a fall-through case.

There were not bugs beyond this - but having it pass on everything else definitely gives me confidence that the lexer is unlikely to be missing edge cases related to chunking.


I did also discover that the parser blocks on the following case (and there is a commented out test for this in N3Parser-test). I am not sure whether this is intended behavior or not.

it('should parse no chunks (i.e. onEnd called immediately)',
    shouldParseChunks([]));
jeswr commented 1 year ago

Some minor things in the lexer which I'll sort out myself; then this is good to merge.

Great! Let me know if there is any further work required for this PR on my end :)

jeswr commented 1 year ago

@RubenVerborgh - when you get around to revisiting this, it is probably worth setting https://github.com/rdfjs/N3.js/blob/520054a9fb45ef48b5b58851449942493c57dace/src/N3Parser.js#L31 to true unless there is a parameter explicitly opting out. Otherwise the parser doesn't play nice with tools like rdf-parse which do not recognise -star or -* content types.

benjaminaaron commented 1 year ago

I've used it like this and it looks perfect in the Turtle output:

const quad1 = quad(
    namedNode('http://example.org/a'),
    namedNode('http://example.org/b'),
    namedNode('http://example.org/c'),
)

const quad2 = quad(
    quad1,
    namedNode('http://example.org/d'),
    namedNode('http://example.org/e'),
)

Maybe it'd be worth adding a RDF-star example to the Readme? It might not be immediately clear that the way to do it is to use one quad within another one 🤔

jeswr commented 1 year ago

@benjaminaaron - I've added an example to the writing section :). Feel free to PR any other examples you want.

jeswr commented 1 year ago

@RubenVerborgh I've refactored the lexer and enabled rdfStar parsing by default.

Given that this has been open for a few months - I'd be inclined to merge as is and any other comments can be applied as patch releases? Same goes for https://github.com/RubenVerborgh/SPARQL.js/pull/160.