Open nsteins opened 4 years ago
This looks very useful, although I wonder if EventSeries could just be a special case of TimeSeries. Using your example, each service request might be represented as a TimeSeries with two points.
service_call_event = traces.TimeSeries(default=0)
service_call_event[pd.Timestamp('2019-07-17 11:56:40')] = 1
service_call_event[pd.Timestamp('2019-07-30 13:14:54')] = 0
Suppose you have all service calls in a list named service_call_list, where each event is a TimeSeries with 2 points. Then your cumsum function might be the same as a merge operation:
active_events = traces.TimeSeries.merge(service_call_list, operation=sum)
All that said, I guess that this way of processing the data would be far less efficient than your method.
I have a device that flashes according to a timetable. It reports a "commencement" event when it starts flashing and a "cessation" event when it stops. I'm looking into a method to represent the state on a timeline by creating a TimeSeries for that state and adding a value of 1 for each commencement and a value of 0 for each cessation. I'm also trying to represent the device's timetable as a time series for the desired state, with a value of 1 for when it should start flashing and 0 for when it should stop flashing. With this method I can use an XOR operation to generate a plottable time series of all the times that the desired state didn't equal the actual state.
I like your time_lag function because I want to work out the total amount of time that my actual flashing state didn't match the desired state. However, now that I have a TimeSeries where y=1 for any time that the actual state didn't match the desired state, maybe that function can be performed by an existing operation as well. @devs, does Histogram.total() calculate the area under the curve?
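The XOR-then-integrate idea above can be sketched in plain Python, independent of the traces API. The helper names value_at and mismatch_duration are hypothetical, and each series is assumed to be a sorted list of (time, value) pairs holding 0 or 1, with the value held constant until the next point.

```python
from datetime import datetime, timedelta

def value_at(points, t, default=0):
    """Value of a step function (sorted (time, value) pairs) at time t."""
    current = default
    for when, value in points:
        if when > t:
            break
        current = value
    return current

def mismatch_duration(actual, desired, start, end):
    """Total time in [start, end) during which the two 0/1 step functions differ."""
    # Every time either series changes is a possible boundary of a mismatch.
    cuts = sorted({start, end} | {t for t, _ in actual + desired if start < t < end})
    total = timedelta(0)
    for left, right in zip(cuts, cuts[1:]):
        # XOR is 1 exactly when the two states differ on this interval.
        if value_at(actual, left) ^ value_at(desired, left):
            total += right - left
    return total

day = datetime(2019, 7, 17)
desired = [(day.replace(hour=8), 1), (day.replace(hour=10), 0)]
actual = [(day.replace(hour=8, minute=15), 1), (day.replace(hour=10), 0)]
late = mismatch_duration(actual, desired, day, day + timedelta(days=1))
print(late)  # 0:15:00
```

Because the XOR series only takes the values 0 and 1, the area under it is the same thing as the total mismatched time, which is why a single sum over intervals is enough here.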
You are correct that you could represent this as a TimeSeries, and in fact, that was my first approach to modeling this kind of data. It's just slow because traces.TimeSeries.merge iterates through the entire SortedDict on every insertion.
Ah. Understood.
I feel like event_series is just a list of events, rather than something that fits into the library.
A faster way to build a timeseries could be
ts = traces.TimeSeries(default=0)
# mark each open with +1 and each close with -1
for created in df['CREATED_DATE'].dropna():
    ts[created] = 1
for closed in df['CLOSED_DATE'].dropna():
    ts[closed] = -1
A cumulative sum function could be an awesome addition to the API
cumsum_trace = traces.TimeSeries(default=0)
cumsum = 0
for k, v in ts.items():
    cumsum += v
    cumsum_trace[k] = cumsum
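One caveat with assigning +1 and -1 directly is that two requests sharing a timestamp overwrite each other. A minimal sketch of the same running-total idea that accumulates the deltas first is below; it uses only the standard library, and active_counts is a hypothetical helper name, not part of traces.

```python
from collections import Counter
from itertools import accumulate

def active_counts(intervals):
    """Return sorted (time, active_count) pairs for (opened, closed) intervals.

    closed may be None for a request that is still open.
    """
    deltas = Counter()
    for opened, closed in intervals:
        deltas[opened] += 1      # accumulate, so tied timestamps are not overwritten
        if closed is not None:
            deltas[closed] -= 1
    times = sorted(deltas)
    running = accumulate(deltas[t] for t in times)
    return list(zip(times, running))

requests = [(1, 5), (1, 3), (4, None)]
print(active_counts(requests))
# [(1, 2), (3, 1), (4, 2), (5, 1)]
```

The resulting pairs could then be written into a TimeSeries one key at a time, so each timestamp is inserted only once.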
As for feature requests, it could be cool if there was a function get_events(self, start_signal, end_signal) that returned a list of "events". Given (key, value) pairs in a time series, each event will have a start (key when value == start_signal) and an end (key when value == end_signal).
I think that EventSeries fits in with Traces because it tries to follow a similar design and API to TimeSeries. There are obviously many ways to accomplish this, but I often found myself frustrated trying to accomplish this with pure pandas, and unable to do a lot of the things I wanted to with TimeSeries.
The main difference is that TimeSeries are designed around a model of an irregularly sampled continuous signal. I'm not sure what physical quantity a cumulative sum function would correspond to for a general TimeSeries.
Could you explain the get_events(self, start_signal, end_signal) request a bit more?
I think it could be nice to have a function that transforms a timeseries into a list of periods (each with a start and end time or a start time and duration) based on the values.
You can then answer questions like "provide a list of periods where a light was switched on" or, using the shopping cart example from the docs, "provide a list of periods where the user had apples in their cart".
start_signal and end_signal could be functions so that it works on non-numeric traces.
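To make the request concrete, here is a rough sketch of what such a function might do, written against a plain sorted list of (time, value) pairs rather than a real TimeSeries. The name get_periods and its predicate arguments is_start and is_end are assumptions for illustration only.

```python
def get_periods(items, is_start, is_end):
    """Turn sorted (time, value) pairs into a list of (start, end) periods.

    A period opens at the first time whose value satisfies is_start and closes
    at the next time whose value satisfies is_end. A period still open at the
    end of the data is returned with end=None.
    """
    periods = []
    start = None
    for when, value in items:
        if start is None and is_start(value):
            start = when
        elif start is not None and is_end(value):
            periods.append((start, when))
            start = None
    if start is not None:
        periods.append((start, None))
    return periods

light = [(0, "off"), (2, "on"), (3, "on"), (7, "off"), (9, "on")]
periods = get_periods(light, lambda v: v == "on", lambda v: v == "off")
print(periods)  # [(2, 7), (9, None)]
```

Passing predicates instead of literal signal values is what lets the same function handle the shopping cart case, for example with is_start checking whether "apples" is in the cart and is_end checking that it is not.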
Hey @nsteins, coming here from #227. Are you working on this? The feedback was short but I think this would be a great addition to the library, as an EventSeries equally falls into the task traces tries to solve: Handling time series. The fact that there are these two main classes makes EventSeries quite logical. @stringertheory came to the same conclusion in #227
Any timeline for this or questions you still want to discuss? I guess that would be easiest managed in a preliminary PR.
Proposing a new class for Traces, EventSeries, for handling data that is a series of timestamps denoting the occurrence of discrete events. For example, this collection of 311 requests in Chicago, where each record is a request that has a timestamp for when it was opened and when it was closed. This is a fit for Traces because it is another example of unevenly spaced time series and can use traces.TimeSeries for certain calculations.
An example of how the API might look
Event series could tell you the number of events that occurred between two arbitrary timestamps
EventSeries would also have a cumulative sum function which returns a TimeSeries of the cumulative number of events that have occurred since the first record
For events that have an "open" and "close" timestamp, EventSeries can calculate the number of active open cases
Finally, EventSeries can calculate the inter-event arrival times and create visualizations for analysis
I am already working on implementing this, but I would appreciate feedback and suggestions on API or features. Particularly interested if this can be extended to support the use case outlined in this issue https://github.com/datascopeanalytics/traces/issues/227
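For readers who want something to experiment with, the operations listed above can be approximated with a sorted list and the standard bisect module. This is only a sketch: the class name EventSeriesSketch and its method names are hypothetical and do not reflect the proposed or actual traces API.

```python
from bisect import bisect_left, bisect_right

class EventSeriesSketch:
    """Hypothetical stand-in for the proposed EventSeries; not the traces API."""

    def __init__(self, timestamps):
        self._times = sorted(timestamps)

    def events_between(self, start, end):
        """Number of events with start <= t <= end."""
        return bisect_right(self._times, end) - bisect_left(self._times, start)

    def cumulative_sum(self):
        """(time, events so far) pairs; simultaneous events share one entry."""
        totals = {}
        for count, t in enumerate(self._times, start=1):
            totals[t] = count  # a repeated timestamp keeps the larger count
        return list(totals.items())

    def interarrival_times(self):
        """Differences between consecutive event times."""
        return [b - a for a, b in zip(self._times, self._times[1:])]

es = EventSeriesSketch([5, 1, 1, 8])
print(es.events_between(1, 5))   # 3
print(es.cumulative_sum())       # [(1, 2), (5, 3), (8, 4)]
print(es.interarrival_times())   # [0, 4, 3]
```

Integers are used as timestamps to keep the example short; with datetime values the same code works and the inter-arrival times come back as timedelta objects.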