streamreasoning / RSP-QL

A home of RSP-QL syntax and semantics discussion
Apache License 2.0

Querying RDF graphs in windows #61

Closed keski closed 8 years ago

keski commented 8 years ago

Maybe this has been clarified somewhere, but I feel there is an assumption that, although RDF streams consist of graphs with some timestamp or similar, the graphs themselves may not be accessible in the windows. We've touched on this previously, and my understanding was that some argued against making the graphs available and instead proposed that all streamed graphs be put in "a default graph" that represents a window. To me, though, the arguments for this view were not well motivated, other than "in this and this example we can manage without it". So, if possible, I would like it clarified whether a query like the one below, which filters a stream, would be valid:

# Assume that the graphs in this particular stream have more than one timestamp.
# The timestamp property used by the engine is :generatedAt. Now we wish to create
# a substream based on a specific time property (:observedAt) excluding all non-
# applicable events in the stream.

PREFIX : <http://examplel.org#>
REGISTER STREAM :filteredEventStream AS

CONSTRUCT ISTREAM {
   GRAPH ?g { ?subj1 ?prop1 ?obj1 . }
   ?g :observedAt ?obj2 .
   ?g ?prop3 ?obj3 .
}
FROM NAMED WINDOW :w ON :fullEventStream [RANGE PT10S]
WHERE {
   WINDOW :w {
      GRAPH ?g { ?subj1 ?prop1 ?obj1 . }
      ?g :observedAt ?obj2 .
      OPTIONAL { ?g ?prop3 ?obj3 . }
   }
}

Are there any arguments for why this query should not be valid? How would this query be expressed in a general way if the graphs cannot be referenced?

dellaglio commented 8 years ago

Hi Robin, I think we haven't taken any final decision yet, as we are first finalising the work on defining RDF streams and their semantics.

I'd like to understand your query a bit better. In particular: 1) What is "?g1"? It does not appear in the WHERE clause, so I am wondering how it can be bound. 2) Does ?g mean "any graph appearing in the window :w"? 3) Over which data should "?subj2 :observedAt ?obj2" match? (The same question applies to the content of the OPTIONAL clause.)

keski commented 8 years ago

Hi Daniele, 1) and 3) There were some typos because I changed my example at the last second. Fixed. 2) Yes, "any graph appearing in the window :w".

Now, the query would match all graphs in :w (and all triples in each). The pattern "?g :observedAt ?obj2" would filter the stream to include only those graphs that have that property set. Finally, the OPTIONAL would catch any other metadata (such as other timestamps). Basically, it would copy the events (graphs) in the stream that match the filter (reusing the graph IDs, since they are the same events) and include all the metadata about the events.
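The intended behaviour of the filter can be sketched outside of any query syntax. The Python snippet below is only an illustration under stated assumptions: the window is modelled as a dict of named graphs plus a default graph of annotations, and the names `Triple`, `filter_window`, and the sample data are hypothetical, not part of RSP-QL or of any engine.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


# Hypothetical in-memory model of one window: named graphs hold the event
# content, the default graph holds annotations whose subject is a graph name.
named_graphs = {
    ":g1": [Triple(":sensor1", ":temp", "21")],
    ":g2": [Triple(":sensor2", ":temp", "19")],
}
default_graph = [
    Triple(":g1", ":generatedAt", "2016-04-15T10:00:01Z"),
    Triple(":g1", ":observedAt", "2016-04-15T10:00:00Z"),
    Triple(":g2", ":generatedAt", "2016-04-15T10:00:02Z"),
]


def filter_window(named_graphs, default_graph, required_prop=":observedAt"):
    """Keep the graphs annotated with required_prop, plus all their annotations."""
    # Graph names that carry the required annotation and exist in the window.
    keep = {
        t.s for t in default_graph
        if t.p == required_prop and t.s in named_graphs
    }
    # Copy the matching graphs unchanged, reusing their names.
    out_named = {g: list(ts) for g, ts in named_graphs.items() if g in keep}
    # Copy every annotation of the matching graphs (the OPTIONAL part).
    out_default = [t for t in default_graph if t.s in keep]
    return out_named, out_default


out_named, out_default = filter_window(named_graphs, default_graph)
print(sorted(out_named))   # names of the graphs that passed the filter
print(len(out_default))    # number of annotations copied along with them
```

In this sample only `:g1` carries `:observedAt`, so it is the only graph copied, together with both of its timestamps. Note that the function never inspects the triples inside a graph, which mirrors the point that the graph structure of the stream does not need to be known.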

keski commented 8 years ago

In a similar way we could define a merge (join) as:

# Join of two streams s1 and s2
PREFIX : <http://examplel.org#>
REGISTER STREAM :joinedStream AS

CONSTRUCT ISTREAM {
   GRAPH ?g { ?subj1 ?prop1 ?obj1 . }
   ?g ?prop2 ?obj2 .
}
FROM NAMED WINDOW :w1 ON :s1 [RANGE PT10S]
FROM NAMED WINDOW :w2 ON :s2 [RANGE PT10S]
WHERE {
   {WINDOW :w1 {
      GRAPH ?g { ?subj1 ?prop1 ?obj1 . }
      OPTIONAL { ?g ?prop2 ?obj2 . }
   }}
   UNION
   {WINDOW :w2 {
      GRAPH ?g { ?subj1 ?prop1 ?obj1 . }
      OPTIONAL { ?g ?prop2 ?obj2 . }
   }}
}

In this query there are some details to be ironed out but this would at least be the principle in my mind. The neat thing about these queries is that the underlying graph structure in the streams doesn't have to be known since we can simply match everything in each graph.

Update: In a merge scenario we would also have to make sure that the ordering among streamed items is maintained, i.e. synchronise the streams if one lags behind the other by, e.g., a few seconds.
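The ordering requirement can be illustrated with a small Python sketch. It assumes that each input stream is already ordered by its own timestamp and that items are `(timestamp, graph name)` pairs; both the item shape and the function name `merge_streams` are made up for this example.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# Hypothetical item shape: (timestamp in seconds, graph name).
Item = Tuple[int, str]


def merge_streams(s1: Iterable[Item], s2: Iterable[Item]) -> Iterator[Item]:
    """Lazily merge two streams that are each already ordered by timestamp."""
    # heapq.merge only ever looks at the head of each input, so the output
    # is globally ordered as long as each input is ordered on its own.
    return heapq.merge(s1, s2, key=lambda item: item[0])


s1 = [(1, ":a1"), (4, ":a2"), (9, ":a3")]
s2 = [(2, ":b1"), (3, ":b2"), (10, ":b3")]

merged = list(merge_streams(s1, s2))
print(merged)
```

Because the merge needs the next item of both inputs before it can emit anything, a lagging stream holds back the output. That is one simple form of synchronisation; a real engine would presumably need a timeout or watermark so that a stalled stream does not block the merge indefinitely.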

dellaglio commented 8 years ago

It's much clearer now, thank you. Given that at the moment we are not focusing on the syntax of the query language, I think it would be great to move the discussion to a requirements/features level.

If I understood correctly, you are asking for some features, e.g.

  • it should be possible to access/query the annotations over the data stream items
  • it should be possible to query the data items whose annotations satisfy some constraints

In this context, I suggest having a look at the requirements document to check whether they are already covered and, if not, to refine and add them.

greenTara commented 8 years ago

This discussion is also quite relevant for the abstract syntax and semantics document, because query results should arise from entailments (following the example of SPARQL). So, to perform the sort of queries you describe, the named graph structure must be supported by the semantics of the abstract syntax, which defines the entailments.

I have written the semantics of the abstract syntax so that the named graph structure is retained, within an RDF dataset, and the entailments follow from a particular specialization of RDF dataset semantics. So from this perspective, we keep the opportunity to perform the sort of queries that you describe above.
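As a rough illustration of why retaining the named graph structure matters for such queries, the Python sketch below contrasts a window kept as a dataset (graph name to triples) with the same window flattened into a single default graph. The data and the helper `graphs_containing` are assumptions made for the example and say nothing about how the semantics document formalises this.

```python
# Two events with the same shape; names and values are made up for illustration.
window = {
    ":g1": {(":obs", ":value", "21")},
    ":g2": {(":obs", ":value", "19")},
}

# Option A: merge every streamed graph into one default graph for the window.
# The triples survive, but nothing records which event each one came from.
flattened = set().union(*window.values())


# Option B: keep the dataset structure, so a graph variable can still be bound.
def graphs_containing(dataset, triple):
    """Return the names of the graphs in the dataset that contain the triple."""
    return sorted(g for g, triples in dataset.items() if triple in triples)


print(len(flattened))
print(graphs_containing(window, (":obs", ":value", "21")))
```

With the flattened set, a pattern such as `GRAPH ?g { ... }` has nothing to bind `?g` to, whereas the dataset form can still answer which event a triple belongs to.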

Tara

On Fri, Apr 15, 2016 at 1:45 AM, dellaglio notifications@github.com wrote:

It's much clearer now, thank you. Given that at the moment we are not focusing on the syntax of the query language, I think it would be great to move the discussion to a requirements/features level.

If I understood correctly, you are asking for some features, e.g.

  • it should be possible to access/query the annotations over the data stream items
  • it should be possible to query the data items whose annotations satisfy some constraints

In this context, I suggest having a look at the requirements document to check whether they are already covered and, if not, to refine and add them.
