timescale / timescaledb-toolkit

Extension for more hyperfunctions, fully compatible with TimescaleDB and PostgreSQL 📈
https://www.timescale.com

Counter Reset / Rate Aggs #9

Closed davidkohn88 closed 2 years ago

davidkohn88 commented 3 years ago

What's the functionality you would like to add

The counter reset problem outlined in #4 and the rate/delta problem outlined in #8 can be solved, at least in part, by choosing an intermediate aggregate that allows us to do part of the work before having access to the full data set.

There are two parts to the solution: 1) aggregates with properly designed partial forms that can at least be partialized for disjoint time ranges (ones specific to counters and gauges), and 2) a set of functions designed to work on these partial forms to return the rate/delta and the other functions outlined in issue #8.

Proposed aggregates:

  1. counter_agg(value, time)
  2. gauge_agg(value, time)
  3. saturating_counter_agg(value, time, reset_val)
  4. each of these aggs with the form counter_agg(counter_agg_partial) (implicitly performing the combine function on the aggregate, for re-aggregation purposes)

Proposed functions to act on these aggregates:

n.b.: last_val is adjusted for resets if it is a counter_agg (and not if it is a gauge_agg)

  1. rate(counter_agg(value, time)) -> the simple observed rate, i.e. (last_val - first_val)/(last_time - first_time) (see the sketch after this list)
  2. delta(counter_agg(value, time)) -> the simple observed delta of the value: last_val - first_val
  3. time_delta(counter_agg(value, time)) -> the simple observed delta of the time: last_time - first_time (useful for determining how well sampled a time range is)
  4. extrapolated_rate(counter_agg(value, time), range, method) -> the rate, but with our values extrapolated out to somewhere near the edges of our bucket (the range provided), extrapolating using the method provided
  5. extrapolated_delta(counter_agg(value, time), range, method) -> the extrapolated delta, analogous to the above
  6. instantaneous_rate(counter_agg(value, time), edge) -> the instantaneous rate, (val(n) - val(n-1))/(time(n) - time(n-1)), at either the leading or trailing edge of the range (defaulting to trailing as that's Prom's behavior, though it seems a bit odd to pair that with time_bucket, which labels each bucket with its start, and then provide the value at its end)
  7. instantaneous_delta(counter_agg(value, time), edge) -> the instantaneous delta, analogous to the above
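
For concreteness, (1) and (2) are what you could already compute by hand over raw data, ignoring counter resets (the part these aggregates actually add). A minimal sketch, assuming a table bar(time timestamptz, id text, foo double precision) and TimescaleDB's first()/last() aggregates, with t1/t2 as placeholders as in the examples further down:

-- Observed delta and per-second rate over a range, with NO reset handling;
-- counter_agg's accessors would layer the reset adjustment on top of this.
SELECT last(foo, time) - first(foo, time) AS observed_delta,
       (last(foo, time) - first(foo, time))
         / NULLIF(extract(epoch FROM max(time) - min(time)), 0) AS observed_rate
FROM bar
WHERE time >= t1 AND time < t2 AND id = 'x';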

Technical details on partials

The partial we create would be an intermediate form that stores:

  1. The first 2 and last 2 adjusted values within a time range (two values at each end so that we can do the instantaneous calculations)
  2. the sum of any counter reset values within the range (where the reset_value = val_before_reset + val_after_reset, which is also equal to the adjusted counter value at the last reset in the range)

i.e. for the example above:

first: 2020-12-02 21:47:50+00, 10 
second: 2020-12-02 21:57:50+00, 120
penultimate: 2020-12-02 22:37:50+00, 8950
last: 2020-12-02 22:47:50+00 , 11960
reset_sum: 8950 // 3045 + 205 + 5200 + 500 
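
That makes the partial nine values in all. As a rough illustration only (the names are hypothetical, not the extension's actual internal representation), it could be modeled as a composite type:

-- Hypothetical model of the partial: the first/second/penultimate/last
-- (time, adjusted value) pairs plus the running reset sum.
CREATE TYPE counter_agg_partial AS (
    first_time  timestamptz, first_val  double precision,
    second_time timestamptz, second_val double precision,
    penult_time timestamptz, penult_val double precision,
    last_time   timestamptz, last_val   double precision,
    reset_sum   double precision
);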

We need the reset sum stored in order to determine whether there was a counter reset when combining adjacent partials: if we were to combine this partial with another, we would use last - reset_sum as the raw_last value to compare to the first value of the next partial, in order to determine whether we had a counter reset at the partial boundary.

As long as the ranges are disjoint and don't have gaps, you can then order the partials by the first value of their range and calculate the full counter reset by essentially doing the normal counter reset calculation between the first value of the current partial and the last value of the preceding one, and then summing the reset_sums.
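
A hedged sketch of that boundary check, built on the hypothetical composite type above (my reading of the combine step, not necessarily the implementation's):

-- Combine the reset sums of two adjacent partials a (earlier) and b (later).
-- a's raw last value is recovered by removing the reset adjustment; if b
-- starts below it, a reset happened at the boundary, and (on the reading
-- that reset_sum accumulates the value lost at each reset) the raw last
-- value is added into the combined sum.
CREATE FUNCTION combined_reset_sum(a counter_agg_partial, b counter_agg_partial)
RETURNS double precision LANGUAGE sql AS $$
    SELECT (a).reset_sum + (b).reset_sum
         + CASE WHEN (b).first_val < (a).last_val - (a).reset_sum
                THEN (a).last_val - (a).reset_sum
                ELSE 0 END;
$$;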

This form is quite efficient, as it represents an entire range in only 9 values. With as few points as in our example above it's not a huge saving, but when you are reporting more frequently, the savings can be substantial.

The key requirements here are that the time ranges (and any other groups) must be disjoint (non-overlapping: overlapping time ranges would mean we can't get a consistent ordering, and counter resets could easily be double counted) and that there must not be gaps (i.e. you can't just take the data from last week and the data from today and expect to get the right results; you need the data in the middle or you may miss counter resets).

These requirements are met by two of our important cases: continuous aggregates and partitionwise aggregates. In each case the aggregations are guaranteed to be on disjoint sections of data: partitions are guaranteed to be disjoint as long as you group by any hash-partitioned keys (and counter resets really only make sense at the most granular level), and continuous aggregate partials cover disjoint time ranges and in fact cannot cross chunk boundaries. (The lack of gaps basically comes down to not doing weird things with your WHERE clauses, but you could create similar problems doing that on the raw data too, so we'll leave it to the side for now.)

The basic aggregate would buffer the data from a time range and then, when asked to serialize, would order the data, perform the proper calculations, and serialize this compact form, which can then be stored in a continuous aggregate or sent back to the access node as part of a partitionwise aggregate.

How would the function be used

SELECT delta(counter_agg(foo, time)) FROM bar WHERE time >= t1 AND time < t2 AND id = 'x';

WITH t as (
    SELECT counter_agg(foo, time) as foo_counter FROM bar WHERE time >= t1 AND time < t2
) SELECT delta(foo_counter), rate(foo_counter), instantaneous_rate(foo_counter, 'trailing') FROM t;

Continuous aggregate:

CREATE MATERIALIZED VIEW bar_hourly
WITH (timescaledb.continuous) AS
SELECT id,
       time_bucket(INTERVAL '1 hour', time) as bucket, 
       counter_agg(foo, time) as foo_counter,
       gauge_agg(baz, time) as baz_gauge
FROM bar
GROUP BY id, bucket;

You can then perform re-aggregation, without waiting for continuous aggs to support it natively, by running:

WITH t as (
    SELECT id,
           counter_agg(foo_counter) as foo_counter,
           gauge_agg(baz_gauge) as baz_gauge
    FROM bar_hourly
    WHERE bucket >= t1 AND bucket < t2
    GROUP BY id
)
SELECT id, rate(foo_counter) as foo_rate, rate(baz_gauge) as baz_rate, time_delta(foo_counter) as foo_td ... etc.
FROM t;

(This should also work without the WITH clause; in either case you can nest the function calls, with something like SELECT rate(counter_agg(foo, time)) as foo_rate, delta(counter_agg(foo, time)) as foo_delta, and it should only perform the aggregation once.)

Why should this feature be added? Is your feature request related to a problem? A wishlist item? Please describe.

See #4 & #8

Drawbacks

Counter aggs are not fully parallelizable, since combining them depends on the disjoint, ordered ranges described above (I'm not sure there is a fully parallelizable solution). Gauge aggs are fully parallelizable.

Separating the aggregates from the functions that evaluate them might be slightly confusing to users.

Open Questions

Regression Functions

Should we have separate aggregates that allow for regression analysis, like Prom's deriv function? i.e.:

Proposed aggregates:

  1. counter_regr(value, time)
  2. gauge_regr(value, time)
  3. saturating_counter_regr(value, time, reset_val)
  4. each of these aggs with the form counter_regr(counter_regr_partial) (implicitly performing the combine function on the aggregate, for re-aggregation purposes)

Proposed functions to act on these aggregates:

I think we can do all the ones here: https://www.postgresql.org/docs/12/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE (those that take Y, X inputs, that is) from that intermediate form.

A question is whether we want to just include these by default in our normal aggs; it takes extra space (6 double-precision numbers), but it makes the resulting intermediate form significantly more useful. We could also just define the regr forms to have all the functionality of the other forms plus the additional regression functionality...
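
If those 6 numbers are the usual streaming regression sums, i.e. n, sum(x), sum(y), sum(x*x), sum(y*y), and sum(x*y) (an assumption on my part, but it matches the count), then each of those Y, X aggregates is recoverable from the partial. For example, the least-squares slope (the deriv analogue):

-- Hypothetical helper: recover the least-squares slope from the running sums.
--   slope = (n*sum_xy - sum_x*sum_y) / (n*sum_xx - sum_x^2)
-- (sum_yy, the sixth sum, is only needed for e.g. correlation.)
CREATE FUNCTION slope_from_sums(
    n double precision, sum_x double precision, sum_y double precision,
    sum_xx double precision, sum_xy double precision
) RETURNS double precision LANGUAGE sql AS $$
    SELECT (n * sum_xy - sum_x * sum_y)
         / NULLIF(n * sum_xx - sum_x * sum_x, 0);
$$;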

Counter partials and interaction with PG aggregates

There are several items that need to be addressed as we define the aggregates for counters here as well:

Types of extrapolation

What types of extrapolation should we have? We'll probably have to have one for Prom specifically that maintains its (somewhat odd, IMO) extrapolation behavior: https://github.com/prometheus/prometheus/blob/master/promql/functions.go#L59. What else should we have? Linear regression (with counter offset)? If so, that affects whether we want to include the regression forms in the standard internal representation. N.B. the reasoning behind Prom's somewhat odd behavior is discussed in detail in this thread: https://github.com/prometheus/prometheus/pull/1161; we should understand that and some of the other choices available when deciding on other methods we might want to implement.
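
For comparison, the simplest possible method would be pure linear scaling of the observed delta out to the full range. A sketch of that naive method only (an illustration, not Prom's behavior, which is more careful about how far past the samples it will extrapolate):

-- Naive "linear" extrapolation: scale the observed delta by the ratio of
-- the requested range to the span actually covered by samples.
CREATE FUNCTION linear_extrapolated_delta(
    observed_delta  double precision,
    sampled_seconds double precision,  -- last_time - first_time, in seconds
    range_seconds   double precision   -- width of the requested range
) RETURNS double precision LANGUAGE sql AS $$
    SELECT observed_delta * range_seconds / NULLIF(sampled_seconds, 0);
$$;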

Resulting types

I think we'll want to store the finalized result of the aggs as a defined type in PG. The question here is how we store/treat the different data types that can live in both the time column and the value column. Are we going to have a combinatorial explosion? Do we treat counter and gauge types differently? (Obviously gauges have reset vals that are always zero and counters don't, but do we need separate types?) Does this have an implication for dealing with the extrapolation in the Prom reproduction, or do we just do that with a flag in the extrapolation type so that we know we're dealing with a counter? Etc.

Other Prom functionality

Alternatives

We could define the functions directly on the value and time, so something like counter_rate(foo, time). However, I think this gets more confusing, especially once you have a saturating counter and an extrapolated rate, both of which take parameters: it becomes less clear which parameters refer to the rate step and which to the counter. I also think it's a bit less elegant generally; the function names get weird, and it makes counters and gauges seem more different than they are. Prom does this, but I'm still not quite sure why, or whether it's really necessary; we can still have reasonable defaults and explain what things do, but it seems a bit odd.

JLockerman commented 3 years ago

(edited for code formatting and to add syntax highlighting)

JLockerman commented 3 years ago

A preliminary version of the counter aggregate was released in 0.2.0 and should be available on Forge early next week.

epgts commented 2 years ago

Split remaining work into #391, #392, and #393.