RDF Stream Profile: Time Series

greenTara commented 8 years ago

Certain use cases or application domains do not need the full generality of the RDF stream definition, and so may be able to implement more efficient reasoning methods when the input is confined to some subclass of RDF streams. It is common to call such subclasses "profiles" (e.g., the OWL 2 profiles EL, QL, and RL). A new section of the Abstract Syntax and Semantics document should be devoted to defining and naming some important profiles.

greenTara commented 8 years ago

It appears to me that a "Time Series" profile, where each RDF stream has exactly one timestamp predicate and the range of that predicate is xsd:dateTime, is a commonly occurring special case.

dellaglio commented 8 years ago

Can we consider the totally ordered time series as another fragment, or is it just a subclass of time series?


greenTara commented 8 years ago

What property do you have in mind for a totally ordered time series? Possibilities are:

1. The timescale of the timestamp predicate is totally ordered. This is the current meaning of "totally-ordered RDF stream". It could be a refinement of the Time Series profile above, provided the range of the predicate is restricted to, say, timezoned values of xsd:dateTime.
2. The ordering of the sequence is deterministic. That is, the timestamps in the stream are never repeated, and they are all comparable. Such a stream is S-isomorphic only to itself.

There is some overlap: if the first property holds, then all timestamps are comparable, but the requirement that timestamps are not repeated is an extra condition. As far as I am concerned, both properties could be required for the Time Series profile itself, if desired.
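To illustrate why the first possibility needs the range restricted to timezoned values, here is a minimal Python sketch of the order relation that XML Schema defines on dateTime: a value without a timezone is compared against its earliest (+14:00) and latest (-14:00) possible instants, and the result can be indeterminate. Modelling untimezoned values as naive datetime objects and timezoned values as aware ones is an assumption of this sketch, and xsd_compare is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Extremes used by XML Schema for a value that carries no timezone:
# its local time could be anywhere from UTC+14:00 to UTC-14:00.
EARLIEST = timezone(timedelta(hours=14))
LATEST = timezone(timedelta(hours=-14))


def xsd_compare(p: datetime, q: datetime) -> Optional[str]:
    """Return '<', '=', '>' or None (indeterminate).

    Assumption: aware datetimes model timezoned xsd:dateTime values,
    naive datetimes model values without a timezone.
    """
    p_tz = p.tzinfo is not None
    q_tz = q.tzinfo is not None
    if p_tz == q_tz:
        # Both timezoned (or both untimezoned): an ordinary comparison.
        return "<" if p < q else ">" if p > q else "="
    if not p_tz:
        # Swap so the timezoned value comes first, then invert the result.
        return {"<": ">", ">": "<"}.get(xsd_compare(q, p))
    if p < q.replace(tzinfo=EARLIEST):
        return "<"   # p precedes even the earliest reading of q
    if p > q.replace(tzinfo=LATEST):
        return ">"   # p follows even the latest reading of q
    return None      # the two values are incomparable
```

Every xsd:dateTimeStamp value carries a timezone, so all comparisons between such values take the first branch and never return None; that is what makes the timescale totally ordered.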

greenTara commented 8 years ago

Considering the algebraic properties of merge and union, these would be defined for Time Series that have the same timestamp predicate, so it would be a kind of sorted algebra. The first kind of "totally ordered" above would not affect this behavior. The second kind would rule out some merge/unions, in the case when two time series have a timestamp in common.
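As a rough sketch of this sorted algebra, the Python below merges two time series only when they share a timestamp predicate, interleaving their elements by timestamp; in the stricter variant (no repeated timestamps) the merge is refused when the inputs have a timestamp in common. The Element/TimeSeries data model and the merge function are hypothetical illustrations, and the inputs are assumed to be already sorted by timestamp.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Element:
    """Hypothetical stream element: a graph name and its timestamp."""
    graph: str
    timestamp: datetime


@dataclass
class TimeSeries:
    predicate: str            # IRI of the single timestamp predicate
    elements: List[Element]   # assumed sorted by timestamp


def merge(a: TimeSeries, b: TimeSeries, distinct: bool = False) -> TimeSeries:
    # Merge is only defined within one "sort": the same timestamp predicate.
    if a.predicate != b.predicate:
        raise ValueError("time series use different timestamp predicates")
    if distinct:
        # Assuming each input already has no repeated timestamps, the merge
        # is undefined exactly when the two inputs share a timestamp.
        shared = {e.timestamp for e in a.elements} & {e.timestamp for e in b.elements}
        if shared:
            raise ValueError(f"shared timestamps: {sorted(shared)}")
    # Stable k-way merge by timestamp; ties keep elements of `a` first.
    merged = list(heapq.merge(a.elements, b.elements, key=lambda e: e.timestamp))
    return TimeSeries(a.predicate, merged)
```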

greenTara commented 8 years ago

Another note regarding merge/union. It may be useful to distinguish:

- merge/union of streams, which would be needed for joint windowing or direct retransmission;
- merge/union of the RDF datasets of streams, which would be needed for joint querying (e.g., after individually windowing each stream, then merge/union for querying).
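To contrast with merging the streams themselves, the following minimal Python sketch shows the dataset-level variant: each stream is windowed on its own, and the resulting RDF datasets are then combined for querying. The half-open window [start, end), the simplified element shape, and the rule that named graphs sharing a name are unioned are all assumptions for illustration; window and dataset_union are hypothetical helpers, and timestamp triples are dropped for simplicity.

```python
from datetime import datetime
from typing import Dict, FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]
# Simplified stream element: (graph name, timestamp, triples of that graph).
Item = Tuple[str, datetime, FrozenSet[Triple]]
Dataset = Dict[str, FrozenSet[Triple]]   # graph name -> triples


def window(stream: Iterable[Item], start: datetime, end: datetime) -> Dataset:
    """Collect the named graphs whose timestamp falls in [start, end)."""
    return {g: triples for g, ts, triples in stream if start <= ts < end}


def dataset_union(*datasets: Dataset) -> Dataset:
    """Union of datasets; named graphs with the same name are merged."""
    result: Dict[str, FrozenSet[Triple]] = {}
    for ds in datasets:
        for g, triples in ds.items():
            result[g] = result.get(g, frozenset()) | triples
    return result
```

Note that, once windowed, the timestamps no longer constrain anything, which is why this union does not need the sorted-algebra restrictions that apply to merging streams.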

dellaglio commented 8 years ago

I was thinking of the case where the timestamp predicate is a one-to-one relation between stream items and time (i.e., it is bijective) and the range of the relation is totally ordered. If I am correct, this means both of the conditions you described. I think condition 2 is too strong for the general time series class (while 1 makes sense to me), but it still denotes a relevant (sub)class of streams.


beortner commented 8 years ago

I agree on the first profile, a single time series (totally ordered timestamps). But for a stream that results from multiple streams through stream operations (merge/union), I think we should relax the bijective condition to a surjective function, or relax the total order to a partial order. (Maybe this is a second profile/subclass.)

greenTara commented 8 years ago

(Edited)

OK, so here is an updated proposal. An "RDF time series" is an RDF stream that:

1. uses exactly one timestamp predicate;
2. has a timestamp predicate whose range is xsd:dateTimeStamp;
3. has no two elements with the same graph name.

An "RDF distinct time series" is an RDF time series in which no two elements have the same timestamp.
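A minimal Python sketch of how a validator for these two definitions might look. The Stream model (declared rdfs:range per timestamp predicate, plus elements as graph name, predicate, timestamp triples) and the function names are hypothetical. The range check inspects the declared range, as the note below asks, rather than the values that happen to occur.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple

XSD_DATETIMESTAMP = "http://www.w3.org/2001/XMLSchema#dateTimeStamp"


@dataclass
class Stream:
    # Hypothetical model: declared rdfs:range of each timestamp predicate,
    # and the elements as (graph name, timestamp predicate, timestamp).
    ranges: Dict[str, str]
    elements: List[Tuple[str, str, datetime]]


def is_time_series(s: Stream) -> bool:
    predicates = {p for _, p, _ in s.elements}
    graphs = [g for g, _, _ in s.elements]
    return (
        len(predicates) == 1                                   # one timestamp predicate
        and s.ranges.get(next(iter(predicates))) == XSD_DATETIMESTAMP  # declared range
        and len(graphs) == len(set(graphs))                    # graph names unique
    )


def is_distinct_time_series(s: Stream) -> bool:
    # Aware datetimes compare and hash by instant, so 16:00Z and 17:00+01:00
    # count as the same timestamp.
    stamps = [t for _, _, t in s.elements]
    return is_time_series(s) and len(stamps) == len(set(stamps))
```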

Note: when I talk about the "range of the timestamp predicate", I am referring to the definition of the timestamp predicate (as in the Turtle statement ":p a rsp:TimestampPredicate ; rdfs:range xsd:dateTime ."), not to the set of timestamps that occur in a particular stream. It is important to consider all possible values, because we want to characterize the merge/union operations over all possible streams of a particular subclass. In the case of RDF time series, we should be able to say that any two RDF time series that use the same timestamp predicate and have disjoint graph names can be merged to give another RDF time series. For this to be possible, the entire range of the timestamp predicate must be totally ordered, not just the timestamps that occur in a particular stream.

Regarding surjective/bijective mappings: item 3 of the time series definition makes the relation from graph names to timestamps functional (i.e., a mapping). The extra condition in the definition of distinct time series makes that mapping injective. As for surjectivity, I am not sure: over what set of values might one want a time series to be surjective (onto)? It is trivially surjective onto its own set of timestamps. A further subclass might be distinct time series with a fixed duration between consecutive timestamps (regular time series).

lisp commented 8 years ago

"the range of the timestamp predicate is timezoned value objects of xsd:dateTime"

This reads as if it could simply require xsd:dateTimeStamp (https://www.w3.org/TR/xmlschema11-2/#dateTimeStamp).

greenTara commented 8 years ago

Thanks, I didn't know about that derived datatype. Perfect.

I edited the original.

greenTara commented 8 years ago

See pull request https://github.com/streamreasoning/RSP-QL/pull/59

greenTara commented 8 years ago

In the pull request, some subprofiles are defined that impose additional properties: no duplicate timestamps in the stream, and equally spaced timestamps. We should agree on the terminology for these subprofiles.

greenTara commented 8 years ago

Regarding names of the profiles, this is relevant. https://www.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.tms.doc/ids_tms_018.htm

greenTara commented 8 years ago

Here is a usage of "synchronous data streams" that concerns the relationship between streams rather than a property of a single stream, but it is still related to the modifier "synchronous": http://www.seas.upenn.edu/~sudipto/mypapers/kdd.pdf

greenTara commented 8 years ago

Searching Google finds no relevant results for "distinct time series", which has the advantage that the phrase has no established meaning.

lisp commented 8 years ago

Was the second link intended to point to something other than the same IBM Knowledge Center page as the first one?

I also find only insubstantial uses of the term "synchronous". One significant characteristic is that the effective timestamps are synchronized with some external time source. This means that, in this context, the notion of a synchronous relation between two otherwise autonomous streams cannot apply, since that notion, taken to the extreme, requires no absolute time location and no quantitatively defined interval.

greenTara commented 8 years ago

Indeed, the second link was meant to be something else. I have edited.

lisp commented 8 years ago

That second paper is not useful: it uses "synchronous" to characterise streams that are both correlated and synchronized with an external clock, but takes no care to set the two features apart, even though its early text makes clear that its principal concern is the correlation.

greenTara commented 8 years ago

Agreed, "synchronous" would seem to be available for us to use with a meaning that suits us. We could use it instead of "distinct" to indicate that each element of the stream must strictly follow the previous one, as in a synchronous computation, if that is preferred.

"Regular" seems to me to be best suited for the case when the timestamp can be serialized as an integer based on the assumption of a reference time and a time offset. It would still be okay to have an arbitrary number of stream elements associated with each integer, which allows the case of repetition of time stamp and also the important case of missing elements.

"Regular synchronous" would almost be the final subprofile that you describe, except that it would still allow missing values.

How about "perfect time series" for the case when there is exactly one stream element for each time stamp on a regular grid?
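A minimal Python sketch of the "regular" and "perfect" readings proposed above. The reference time ref and the grid step are hypothetical parameters the proposal leaves open, and reading "exactly one element for each time stamp on the grid" as covering every grid point between the first and last timestamps is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Optional


def grid_indices(stamps: List[datetime], ref: datetime,
                 step: timedelta) -> Optional[List[int]]:
    """Integer offsets of each timestamp from ref, or None if any is off-grid."""
    indices = []
    for t in stamps:
        offset = t - ref
        if offset % step:              # non-zero remainder: not on the grid
            return None
        indices.append(offset // step)
    return indices


def is_regular(stamps: List[datetime], ref: datetime, step: timedelta) -> bool:
    # Repeated and missing grid points are both allowed.
    return grid_indices(stamps, ref, step) is not None


def is_perfect(stamps: List[datetime], ref: datetime, step: timedelta) -> bool:
    # Exactly one element at every grid point from the first to the last.
    idx = grid_indices(stamps, ref, step)
    if not idx:
        return False
    counts = Counter(idx)
    return all(counts[i] == 1 for i in range(min(idx), max(idx) + 1))
```

With a 15-minute step, timestamps at offsets 0, 1, 1 and 3 steps are regular but not perfect (one repetition, one missing point), while offsets 0, 1 and 2 are perfect.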


lisp commented 8 years ago

These provide useful insight into the customary meanings of these words.

They indicate that "synchronous" is not entirely free for us to use, and that it implies correlation.

It is difficult to compare this usage to that in earlier comments, because the original concerns "streams" in which there can be just one event at any given location at both the abstract and the concrete levels, while RDF streams permit distinct events at the concrete level to correspond to the same abstract location. Notwithstanding this, one possible interpretation, at the abstract level, is:

- "isochronous asynchronous" corresponds to "regular": events occur regularly, but with no external correlation
- "isochronous synchronous" corresponds to "perfect": regular and correlated
- "anisochronous synchronous" corresponds to "regular synchronous": irregular, as some events may be absent, but correlated
- "anisochronous asynchronous" corresponds to the unrestricted stream form: irregular, without correlation

greenTara commented 8 years ago

On further reflection, I think "synchronous" is not the best term, because of the potential for confusion with synchronization across multiple streams. Is the original proposal of "distinct" (analogous to the SQL DISTINCT operation, which eliminates duplicates) an acceptable alternative?

jpcik commented 8 years ago

I agree that 'synchronous' can be a bit misleading

greenTara commented 8 years ago

In order to have a concrete proposal for the call, I am going to now make a commit where the terms "distinct" and "regular" are used.
