streamreasoning / RSP-QL

A home of RSP-QL syntax and semantics discussion
Apache License 2.0

include time zone information in timestamps in examples #24

Open lisp opened 8 years ago

lisp commented 8 years ago

the current RGN_Location_TempC_Minute_Merged.json includes, for example, the following observations.

  "@graph": [
    {  "@id": "source:Berlin_1",  "observedAt": "2015-01-01T01:01:00"  },
    {  "@id": "source:Madrid_1", "observedAt": "2015-01-01T01:01:00"  },
    {  "@id": "source:Paris_1",  "observedAt": "2015-01-01T01:01:00"  },
... ]

it is important to know whether a processor located in one of those cities would interpret this to be the "same" stream as one which contained

  "@graph": [
    {  "@id": "source:Berlin_1",  "observedAt": "2015-01-01T01:01:00"  },
    {  "@id": "source:Madrid_1", "observedAt": "2015-01-01T01:01:00+01:00"  },
    {  "@id": "source:Paris_1",  "observedAt": "2015-01-01T00:01:00Z"  },
... ]

greenTara commented 8 years ago

James- I agree this is important to clarify, and I'm glad you brought it up. A "discussion" comment in the Semantics document so far (https://github.com/streamreasoning/RSP-QL/blob/master/Abstract%20Syntax%20and%20Semantics%20Document/AbstractSyntaxAndSemantics.html) states "The predicate |p| should be drawn from a community agreed vocabulary" and refers to

https://github.com/streamreasoning/RSP-QL/issues/10

Part of the acceptance of a timestamp predicate into the community-agreed vocabulary must be the definition of the datatype associated with it.

We don't yet have this community-agreed vocabulary, so in these examples, I am using a placeholder timestamp predicate, with a (tacit) assumption that the datatype is xsd:dateTime (https://www.w3.org/TR/xmlschema-2/#dateTime). According to that datatype, timezoned and non-timezoned values are not comparable (within each of those two sets, there is a total order established by mapping to decimal numbers). So the graph you show below where some timezones have been added would not be the same graph as the original.

Your comment suggests to me that my example should be improved by changing the timestamps so that all values are timezoned values, and this should be a requirement of the predicate. This is especially true since this example is intended to represent the merger of data from different geographic locations, which might be in different timezones (although in this case they happen to be all the same). Then it will be possible to say that changing from 2015-01-01T01:01:00+01:00 to 2015-01-01T00:01:00Z gives a stream that is equivalent to the original stream.
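
The distinction between the lexical form of a timestamp and its value can be illustrated with a minimal Python sketch. It assumes the placeholder predicate really carries xsd:dateTime literals; the helper `parse_xsd_datetime` is hypothetical and handles only the simple lexical forms used in this thread. Python's own comparison rules stand in for the datatype's order here and are not a faithful implementation of it.

```python
from datetime import datetime


def parse_xsd_datetime(lexical: str) -> datetime:
    """Parse the simple xsd:dateTime lexical forms used in this thread (hypothetical helper)."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical)


madrid = "2015-01-01T01:01:00+01:00"
paris = "2015-01-01T00:01:00Z"
berlin = "2015-01-01T01:01:00"  # no timezone

# Term identity: the two literals are different strings.
same_term = madrid == paris

# Value equality: both literals denote the same instant.
same_value = parse_xsd_datetime(madrid) == parse_xsd_datetime(paris)

# Python refuses to order a timezoned value against a non-timezoned one.
try:
    parse_xsd_datetime(berlin) < parse_xsd_datetime(madrid)
    mixed_order = "comparable"
except TypeError:
    mixed_order = "not comparable"

print(same_term, same_value, mixed_order)
```

The two timezoned literals differ as terms but agree as values, which is the sense in which the modified stream could be called equivalent to the original without being the same stream.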

However, I would not go so far as to say it is the "same" stream or, more formally, an "isomorphic" stream. That would not be consistent with the concept of identity of RDF graphs that has been in play since the first RDF specification.

I intend to define a concept of "stream-isomorphism" that will be an extension of the concepts of "isomorphism" (called "equivalence" in RDF 1.0) and "dataset-isomorphism" regarding RDF graphs and RDF datasets, resp. (https://www.w3.org/TR/rdf11-concepts/#section-dataset-isomorphism).

Would this clarification (regarding the datatype of the predicate and definition of isomorphism) and modification (regarding the example) alleviate your concerns on this issue?

The definition of stream-isomorphism needs to be part of the abstract syntax. There is also a need for concepts corresponding to (logical) equivalence, entailment, inconsistency, and entailment regimes for streams, extending these concepts for RDF graphs and datasets (https://www.w3.org/TR/rdf11-concepts/#entailment) which are properly part of the semantics of RDF streams. I am starting to think that this document needs two large sections, one for the Abstract Syntax and one for the Semantics, in keeping with the way RDF separates Concepts and Abstract Syntax (http://www.w3.org/TR/rdf11-concepts/) from Semantics (https://www.w3.org/TR/2014/REC-rdf11-mt-20140225/).

Tara

On 1/23/16 4:38 AM, james anderson wrote:

the current RGN_Location_TempC_Minute_Merged.json includes, for example, the following observations.

  "@graph": [
    {  "@id": "source:Berlin_1",  "observedAt": "2015-01-01T01:01:00"  },
    {  "@id": "source:Madrid_1", "observedAt": "2015-01-01T01:01:00"  },
    {  "@id": "source:Paris_1",  "observedAt": "2015-01-01T01:01:00"  },
... ]

it is important to know whether a processor located in one of those cities would interpret this to be the "same" stream as one which contained

  "@graph": [
    {  "@id": "source:Berlin_1",  "observedAt": "2015-01-01T01:01:00"  },
    {  "@id": "source:Madrid_1", "observedAt": "2015-01-01T01:01:00+01:00"  },
    {  "@id": "source:Paris_1",  "observedAt": "2015-01-01T00:01:00Z"  },
... ]


lisp commented 8 years ago

Would this clarification (regarding the datatype of the predicate and definition of isomorphism) and modification (regarding the example) alleviate your concerns on this issue?

to the extent that the standard would limit itself to "stream isomorphism" where that were to restrict itself to term identity, not entirely.

If either $arg1 or $arg2 has no timezone component, the effective value of the argument is obtained by substituting the implicit timezone from the dynamic evaluation context.

  • your description, above, correctly distinguished between graph identity and equivalence (according to whatever regime). in order for the standard to support useful implementations of one of the most primitive operations - stream merge, it will have to permit the processors to set the comparison criteria for timestamps. for example, to recognize equality of non-identical zulu and time-zoned time values. for this it should be possible to communicate that information either in-band or out-of-band in order that two communicating processors can work with each other.
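
the implicit-timezone rule quoted above, together with a processor-chosen comparison criterion, could look roughly like the following python sketch. the names `effective_value` and `timestamps_equal` are hypothetical, and passing the implicit timezone as an argument is only one assumed way to communicate it out-of-band; nothing here is taken from the rsp documents.

```python
from datetime import datetime, timedelta, timezone


def parse_xsd_datetime(lexical: str) -> datetime:
    """Parse the simple xsd:dateTime lexical forms used in this thread (hypothetical helper)."""
    # Normalize "Z" so that fromisoformat() also works before Python 3.11.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical)


def effective_value(value: datetime, implicit_tz: timezone) -> datetime:
    """Substitute the implicit timezone when the value has no timezone component."""
    if value.tzinfo is None:
        return value.replace(tzinfo=implicit_tz)
    return value


def timestamps_equal(a: str, b: str, implicit_tz: timezone) -> bool:
    """Compare two timestamps by value, under a processor-supplied implicit timezone."""
    left = effective_value(parse_xsd_datetime(a), implicit_tz)
    right = effective_value(parse_xsd_datetime(b), implicit_tz)
    return left == right


CET = timezone(timedelta(hours=1))  # assumed implicit timezone of one processor

berlin = "2015-01-01T01:01:00"   # no timezone
paris = "2015-01-01T00:01:00Z"   # zulu

equal_under_cet = timestamps_equal(berlin, paris, CET)
equal_under_utc = timestamps_equal(berlin, paris, timezone.utc)

print(equal_under_cet, equal_under_utc)
```

two processors which do not share the implicit timezone reach different answers for the same pair of literals, which is why that setting has to be communicated between them.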

the recognition, that operations on timestamp values must allow for more than just identity, lends support to my suspicion, that the merge operation is not a matter of identity only. if some form of reasoning is required for the time values, why is it not permitted for the predicate terms? the discussion for #10 indicates, if nothing else, that the role of the predicate is better declared for a given stream than specified in a standard. given this, i remain concerned, that even a definition for graph equivalence which allowed for just value entailment regimes would not suffice.

lisp commented 8 years ago

for an example of the kind of problem which will arise during development, application and support of an rdf-based system, if the definitions are not in terms of value domains and/or those domains are not coherent, please see a coincidental note on the jena mailing list.

jpcik commented 8 years ago

Apart from the isomorphism and equality issues, I think this particular issue can be solved by adding the timezone to each timestamp right?

lisp commented 8 years ago

except for that, the manner in which the equality semantics contributes to isomorphism is exactly the issue. to just add the time zones, in particular, to make them agree, just avoids the issue by reducing the examples to the degenerate case where identity suffices. that approach neither comprehends the standard semantics for the dateTime value domain, nor adequately represents the issues which arise with that domain in actual use.

greenTara commented 8 years ago

@lisp The link you provide to the jena mailing list goes to a post about concatenating strings leading to something that is mistaken for a typed literal, as far as I can tell. I don't see what that has to do with abstract syntax and semantics; that seems to be an issue at the level of concrete syntax. Perhaps you can explain in more detail what you want us to learn from that link.

Since the required temporal aspect is the key feature that distinguishes the concept of RDF stream from an ordinary RDF dataset, it is critical that we get these temporal issues right. As I understand the xsd:dateTime value space (https://www.w3.org/TR/xmlschema11-2/#dateTime), it is the union of two disjoint sets of "temporal entities", dateTimes with timezone and dateTimes without timezones. There appears to be an intuitive partial order in this value space, even between these two sets (as you pointed out earlier). For example, 2012-01-01T00:00:01 is clearly before 2016-01-01T00:00:01Z, even though one has an explicit timezone and the other doesn't. So it would be possible to clarify the partial order by saying that two xsd:dateTime values t1, t2, satisfy t1 <= t2 iff this temporal relation holds in every timezone.
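
One possible reading of "holds in every timezone" can be sketched in Python as follows. The sketch assumes that a value without a timezone may sit at any offset between -14:00 and +14:00, so that it covers an interval of possible instants, and that two values that both lack a timezone share the same unspecified timezone and are compared directly. The function names are hypothetical, and this is an illustration of the idea rather than a proposed definition.

```python
from datetime import datetime, timedelta, timezone

MAX_OFFSET = timedelta(hours=14)  # assumed bound on timezone offsets


def parse_xsd_datetime(lexical: str) -> datetime:
    """Parse the simple xsd:dateTime lexical forms used in this thread (hypothetical helper)."""
    # Normalize "Z" so that fromisoformat() also works before Python 3.11.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical)


def utc_bounds(value: datetime):
    """Return the earliest and latest UTC instants the value could denote."""
    if value.tzinfo is not None:
        instant = value.astimezone(timezone.utc)
        return instant, instant
    as_utc = value.replace(tzinfo=timezone.utc)
    return as_utc - MAX_OFFSET, as_utc + MAX_OFFSET


def before_or_equal(t1: str, t2: str) -> bool:
    """True iff t1 <= t2 holds whichever timezone the non-timezoned values are in."""
    a, b = parse_xsd_datetime(t1), parse_xsd_datetime(t2)
    if a.tzinfo is None and b.tzinfo is None:
        # Assumption: both values share the same unspecified timezone.
        return a <= b
    latest_a = utc_bounds(a)[1]
    earliest_b = utc_bounds(b)[0]
    return latest_a <= earliest_b


clearly_before = before_or_equal("2012-01-01T00:00:01", "2016-01-01T00:00:01Z")

# The Berlin and Paris values from the original example are not ordered either way.
berlin_le_paris = before_or_equal("2015-01-01T01:01:00", "2015-01-01T00:01:00Z")
paris_le_berlin = before_or_equal("2015-01-01T00:01:00Z", "2015-01-01T01:01:00")

print(clearly_before, berlin_le_paris, paris_le_berlin)
```

Under this reading the 2012 and 2016 values are ordered, while a timezoned and a non-timezoned value that lie within 14 hours of each other are incomparable, so the order is partial.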

greenTara commented 8 years ago

Sorry, accidentally hit the close button.

lisp commented 8 years ago

I don't see what that has to do with abstract syntax and semantics, that seems to be an issue at the level of concrete syntax. Perhaps you can explain in more detail what you want us to learn from that link.

that situation is one where the designers left the users with a muddled semantics - one in which things which are intuitively comparable and for which self-evident use cases would require a comparison semantics, are defined not to be comparable, with ensuing complexity, miscomprehension and wasted effort. the lesson for rsp is that temporal comparisons must be defined on domain values and must permit entailment.