w3c / json-ld-bp

JSON-LD 1.1 Best Practices Note
https://w3c.github.io/json-ld-bp/

Add note on streaming #3

Closed rubensworks closed 4 years ago

rubensworks commented 5 years ago

Following https://github.com/w3c/json-ld-bp/issues/4, we can add guidelines for achieving streaming JSON-LD, and any restrictions it may bring. Here is a summary of the issues discussed in https://github.com/w3c/json-ld-bp/issues/4.

Guidelines:

  1. The structure of a JSON-LD document to enable efficient streaming parsing to RDF.
    1. If there is an @context in a node, it should be the first key.
    2. If there is an @type in a node, and its value indicates a type-scoped context, it should come right after an @context (if there is one), or be the first key.
    3. If there is an @id in a node, it should be the first key if there is no @context or @type, or the second (or third) key if there is an @context or @type.
  2. The order in which RDF triples/quads should appear to enable efficient streaming serialization to JSON-LD.
    1. Quads with equal graphs should be grouped (Achieves grouping of @graph blocks).
    2. Quads with graph corresponding to the subject of triples/quads should be grouped (Achieves grouping of @graph and @id blocks).
    3. Triples with equal subjects should be grouped (Achieves grouping of @id blocks).
    4. Triples with equal predicates should be grouped (Achieves grouping of predicate arrays).

(triple stores may already do these kinds of grouping automatically)

Restrictions:

  1. Parsing
    • The spec allows @context to appear anywhere in the document. As such, a strict parser may have to buffer large portions of the stream. However, since most real-world JSON-LD documents place @context as the first element, streaming parsers may reject any JSON-LD document that has an out-of-order @context.
  2. Serialization
    • RDF lists are not converted to @list arrays, as you can only be certain that @list can be used once all triples have been read, which requires keeping the whole stream in-memory.
    • No deduplication of triples, as this would also require keeping the whole stream in-memory.
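
The parsing restriction could be modelled roughly as follows. Python's standard `json` module is not a streaming parser, so this sketch only imitates the check by inspecting the key order of every object via `object_pairs_hook`; the names `parse_strict` and `OutOfOrderContext` are made up for the example.

```python
import json


class OutOfOrderContext(ValueError):
    """Raised when @context is present but is not the first key of its object."""


def _check_pairs(pairs):
    # pairs is the list of (key, value) tuples in document order.
    keys = [key for key, _ in pairs]
    if "@context" in keys and keys[0] != "@context":
        raise OutOfOrderContext(f"@context found after {keys[0]!r}")
    return dict(pairs)


def parse_strict(text):
    """Parse JSON text, rejecting objects whose @context is not the first key."""
    # The hook runs for every object, including nested ones.
    return json.loads(text, object_pairs_hook=_check_pairs)
```

A document such as `{"name": "Alice", "@context": {}}` would be rejected by this sketch even though the spec allows it, which is exactly the trade-off described in the restriction.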

rubensworks commented 5 years ago

Following a comment from @BigBlueHat, scoped @context entries are also allowed, as long as each one appears as the first key inside its object/scope.

gkellogg commented 5 years ago

I think it's only @type which would be an issue for ordering and scoped contexts. Individual property's scoped contexts are handled when they are processed, and all context information will already have been processed.

So, the suggested ordering within an object (actually, node object or graph object), would be the following:

  1. @context – if any
  2. @id or alias – if any
  3. @type or alias – if any

No other properties (including @graph) need be ordered.
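
On the writer side, the ordering above could be applied with a small helper like the one below. This is just a sketch: `order_node_keys` is a hypothetical name, the aliases have to be supplied by the caller (the helper does not read them from the context), and it only reorders the top-level keys of a single node.

```python
def order_node_keys(node, id_keys=("@id",), type_keys=("@type",)):
    """Return a copy of node with @context, then @id, then @type keys first.

    id_keys and type_keys list the keyword plus any aliases the context
    defines for it; the caller has to supply the aliases.
    """
    leading = ["@context", *id_keys, *type_keys]
    ordered = {key: node[key] for key in leading if key in node}
    # Remaining keys keep their original relative order.
    ordered.update((key, value) for key, value in node.items() if key not in ordered)
    return ordered
```

For example, calling it with `id_keys=("@id", "id")` and `type_keys=("@type", "type")` on a node whose keys are `name`, `type`, `@context`, `id` gives the key order `@context`, `id`, `type`, `name`. Since Python dicts preserve insertion order, `json.dumps` writes the keys out in that order.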

rubensworks commented 5 years ago

I don't think @type should be order-dependent. @type essentially expands to rdf:type, which makes it processable like any other property/predicate (as long as @id comes first).

Unless I'm missing something @gkellogg?

gkellogg commented 5 years ago

Scoped contexts can be triggered on @type as well as an individual property. If you don’t see @type early, you may misinterpret the properties.
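
A toy model of the problem, assuming a base context that maps `name` to one IRI and a type-scoped context on `Person` that maps it to another. This is not the JSON-LD context processing algorithm, and the `example.org` IRIs and the names `BASE_TERMS`, `TYPE_SCOPED`, and `expand_in_order` are invented for the illustration.

```python
# Toy model of term resolution; NOT the JSON-LD context processing algorithm.
BASE_TERMS = {"name": "http://example.org/vocab#name"}
TYPE_SCOPED = {"Person": {"name": "http://example.org/people#name"}}


def expand_in_order(pairs):
    """Resolve property terms one key at a time, as a streaming parser would."""
    active = dict(BASE_TERMS)
    expanded = []
    for key, value in pairs:
        if key == "@type":
            # The type-scoped context only takes effect from this point on.
            active.update(TYPE_SCOPED.get(value, {}))
        else:
            expanded.append((active[key], value))
    return expanded


type_first = expand_in_order([("@type", "Person"), ("name", "Alice")])
type_last = expand_in_order([("name", "Alice"), ("@type", "Person")])
```

Here `type_first` resolves `name` through the type-scoped context, while `type_last` has already emitted it with the base mapping by the time `@type` shows up. A non-streaming processor can look at `@type` before the other properties, so it would not be affected by the key order.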

rubensworks commented 5 years ago

Ah I see, I wasn't aware of that, thanks for clearing that up! (Link to example for reference)

ajs6f commented 4 years ago

Is this issue distinct from https://github.com/w3c/json-ld-bp/issues/4?

rubensworks commented 4 years ago

I would say this is the same as #4. (#4 was moved from another repo)

iherman commented 4 years ago

This issue was discussed in a meeting.

rubensworks commented 4 years ago

Everything mentioned here has now been written here: https://github.com/w3c/json-ld-streaming

So I suggest closing this issue.

BigBlueHat commented 4 years ago

Works for me! Thanks, @rubensworks!