Blank node as the name of a streamed graph

keski commented 6 years ago

From 2.3 Timestamped Graphs:

There is exactly one named graph pair <n, G> in the RDF Dataset
(where G is an RDF graph, and n is an IRI or blank node).

This means that the name of the graph in the streamed element/dataset (I'll call it an event from here) can be represented as a blank node, e.g. in TriG:

_:b { :John :isIn :Room1 } .
_:b :observedAt "2017-08-16T16:35:00Z" .

However, blank nodes are always locally scoped to the file or RDF store (or in this case the streamed element), which effectively means that a stream using blank nodes can't contain references to other events in the stream, e.g. if the intention is:

_:b0 { :John :isIn :Room1 } .
_:b0 :observedAt "2017-08-16T16:35:00Z" .

_:b1 { :John :isIn :Room2 } .
_:b1 :observedAt "2017-08-16T16:35:05Z" .
_:b1 :after _:b0 .

_:b2 { :John :isIn :Room3 } .
_:b2 :observedAt "2017-08-16T16:35:10Z" .
_:b2 :after _:b1 .

but each element is streamed separately, the blank node labels don't carry over between elements, so _:b0 in one element is not the same node as _:b0 in the next. I'm not saying that we should remove the option of having a blank node as the name of a graph, but I'm not sure we've covered the implications of actually doing so. For example, from 3.3.2 Immutability and Event Derivation in the RSP Requirements Design Document:
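To make the scoping problem concrete, here is a minimal Python sketch. It is not an RDF parser: `BlankNode` and `parse_element` are hypothetical helpers that only mimic the usual rule that every parsed document gets its own label-to-node mapping. Triples are plain tuples of strings.

```python
class BlankNode:
    """Opaque node: equality is object identity, the label is only a local hint."""

    def __init__(self, label):
        self.label = label


def parse_element(triples):
    """Parse one streamed element using a label scope private to that element."""
    scope = {}

    def resolve(term):
        # Blank node labels are mapped to nodes created for this element only.
        if term.startswith("_:"):
            if term not in scope:
                scope[term] = BlankNode(term)
            return scope[term]
        return term

    return [tuple(resolve(term) for term in triple) for triple in triples]


# Two elements streamed (and therefore parsed) separately.
element_1 = parse_element([
    ("_:b0", ":observedAt", '"2017-08-16T16:35:00Z"'),
])
element_2 = parse_element([
    ("_:b1", ":observedAt", '"2017-08-16T16:35:05Z"'),
    ("_:b1", ":after", "_:b0"),
])

b0_in_first = element_1[0][0]
b0_in_second = element_2[1][2]

# Same label, different node: the ":after" link does not reach the first event.
same_node = b0_in_first is b0_in_second
```

Under these assumptions `same_node` is false even though both nodes carry the label `_:b0`, which is exactly why the `:after` reference cannot be resolved by a consumer that treats each element as its own scope.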

For RSP this means: (1) create a new (unique) graph for the derived event and (2) possibly
link back to the base event(s) thus enabling drill-down or root cause / provenance analysis
of the derived event.

Is (2) possible under the assumption that the streamed event is referenced using a blank node?

keski commented 6 years ago

After reading the RDF 1.1 Concepts and Abstract Syntax I feel inclined to leave the definition as it is:

An RDF dataset is a collection of RDF graphs. All but one of these graphs have an associated IRI
or blank node. They are called named graphs, and the IRI or blank node is called the graph name.
The remaining graph does not have an associated IRI, and is called the default graph of the RDF
dataset.

An approach to solve (2) above would be to have a query "rename" the events to produce a stream of (derived?) copies of the events that do have IRI identifiers and use that stream instead (similar to an event enrichment/decoration step). Another approach would be to go the reasoning route and use e.g. owl:sameAs... Maybe we could simply do:

_:b0 { :John :isIn :Room1 } .
_:b0 :observedAt "2017-08-16T16:35:00Z" .
_:b0 owl:sameAs :my-unique-event-id .

since this would not go against the idea of event immutability (due to the produced blank node having a different graph name)?
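The "rename" approach could look roughly like the sketch below. It is only an illustration of the idea: `rename_event`, the `http://example.org/event/` base IRI and the use of a random UUID for minting are all assumptions, not something defined by RSP-QL. It is similar in spirit to replacing blank nodes with IRIs (skolemization) as described in RDF 1.1 Concepts.

```python
import uuid


def rename_event(graph_name, graph_triples, metadata,
                 base="http://example.org/event/"):
    """Return a derived copy of an event whose graph name is a minted IRI.

    The input event is left untouched, so the original stays immutable.
    """
    if not graph_name.startswith("_:"):
        # Already named by an IRI: nothing to rename.
        return graph_name, list(graph_triples), list(metadata)

    iri = "<" + base + str(uuid.uuid4()) + ">"
    # Rewrite every occurrence of the blank graph name in the metadata triples.
    new_metadata = [
        tuple(iri if term == graph_name else term for term in triple)
        for triple in metadata
    ]
    return iri, list(graph_triples), new_metadata


original_triples = [(":John", ":isIn", ":Room1")]
original_metadata = [("_:b0", ":observedAt", '"2017-08-16T16:35:00Z"')]

new_name, new_triples, new_metadata = rename_event(
    "_:b0", original_triples, original_metadata
)
```

Later events in the derived stream can then point at `new_name` directly, since an IRI keeps its meaning across elements.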

lisp commented 6 years ago

did you see the discussion of surfaces later in the document?

keski commented 6 years ago

Thanks. I had missed the surface discussion; it really clarifies blank node scoping. So, each stream will be considered to be on a single surface, and more than one stream can be considered to exist on the same surface (i.e., streams can share blank nodes).

Just for clarification, in the note:

An RDF stream is viewed as being on a single "RDF surface" (see [BLOGIC]), so that blank nodes
may be shared between any graphs in the stream.

Here "graph" refers to a timestamped graph, right? So the names of the graphs are included in this mapping. That clarifies things for me.
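As a rough illustration of how I read the note, the sketch below keeps one label-to-node mapping per surface instead of per element. `Surface` and `parse_on_surface` are hypothetical names made up for this example, and how far a surface actually extends is up to the implementation.

```python
class Surface:
    """One blank node scope shared by everything placed on this surface."""

    def __init__(self):
        self._nodes = {}

    def node(self, label):
        # The same label always yields the same node within one surface.
        if label not in self._nodes:
            self._nodes[label] = object()
        return self._nodes[label]


def parse_on_surface(triples, surface):
    """Resolve blank node labels (graph names included) against a surface."""
    return [
        tuple(surface.node(term) if term.startswith("_:") else term
              for term in triple)
        for triple in triples
    ]


stream_surface = Surface()
first = parse_on_surface(
    [("_:b0", ":observedAt", '"2017-08-16T16:35:00Z"')], stream_surface
)
second = parse_on_surface(
    [("_:b1", ":after", "_:b0")], stream_surface
)

# On a shared surface the reference from the second element reaches the first.
linked = second[0][2] is first[0][0]

# A different surface gives the same label a different node.
other = parse_on_surface([("_:b1", ":after", "_:b0")], Surface())
unlinked = other[0][2] is first[0][0]
```

With a shared surface `linked` is true, while the element parsed on a separate surface stays unconnected.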

(Out of curiosity: Has there been any work on serialization formats for the RDF surface distinction?)

lisp commented 6 years ago

to my knowledge nothing which would tend towards a standard.

we have conventions in our service as to how far a surface extends and i expect that other stores have their own, but i know of nothing which proposes any standard gsp or sparql query extension.

streamreasoning / RSP-QL

Blank node as the name of a streamed graph #84