timescale / timescaledb-toolkit

Extension for more hyperfunctions, fully compatible with TimescaleDB and PostgreSQL 📈
https://www.timescale.com
Other
382 stars 47 forks source link

[WIP] `Interval` utilities and ergonomics #457

Open JLockerman opened 2 years ago

JLockerman commented 2 years ago

This is a meta-issue tracking various ergonomic improvements for Intervals we may want to add. It comes from some work I was doing analyzing interval quantiles. Something like

SELECT
  justify_interval((approx_percentile(0.10, uddsketch)::TEXT || ' seconds')::interval) "0.10",
  justify_interval((approx_percentile(0.30, uddsketch)::TEXT || ' seconds')::interval) "0.30",
  justify_interval((approx_percentile(0.50, uddsketch)::TEXT || ' seconds')::interval) "0.50",
  justify_interval((approx_percentile(0.70, uddsketch)::TEXT || ' seconds')::interval) "0.70",
  justify_interval((approx_percentile(0.90, uddsketch)::TEXT || ' seconds')::interval) "0.90"
FROM(
  SELECT
    percentile_agg(EXTRACT(EPOCH FROM interval))
  FROM intervals
);

If all the changes proposed here would be implemented, that code would look more like one of

SELECT
  approx_percentile(0.10, uddsketch)::seconds -> justify() "0.10",
  approx_percentile(0.30, uddsketch)::seconds -> justify() "0.30",
  approx_percentile(0.50, uddsketch)::seconds -> justify() "0.50",
  approx_percentile(0.70, uddsketch)::seconds -> justify() "0.70",
  approx_percentile(0.90, uddsketch)::seconds -> justify() "0.90"
FROM(
  SELECT
    percentile_agg(EXTRACT(EPOCH FROM interval))
  FROM intervals
);
Other possibilities **Mega-Function** ```SQL SELECT justified_interval(seconds => approx_percentile(0.10, uddsketch)) "0.10", justified_interval(seconds => approx_percentile(0.30, uddsketch)) "0.30", justified_interval(seconds => approx_percentile(0.50, uddsketch)) "0.50", justified_interval(seconds => approx_percentile(0.70, uddsketch)) "0.70", justified_interval(seconds => approx_percentile(0.90, uddsketch)) "0.90" FROM( SELECT percentile_agg(EXTRACT(EPOCH FROM interval)) FROM intervals ); ``` **Multiple Functions** ```SQL SELECT seconds(approx_percentile(0.10, uddsketch)) -> justify() "0.10", seconds(approx_percentile(0.30, uddsketch)) -> justify() "0.30", seconds(approx_percentile(0.50, uddsketch)) -> justify() "0.50", seconds(approx_percentile(0.70, uddsketch)) -> justify() "0.70", seconds(approx_percentile(0.90, uddsketch)) -> justify() "0.90" FROM( SELECT percentile_agg(EXTRACT(EPOCH FROM interval)) FROM intervals ); ```
Full Postfix It's an interesting thought experiment to see what this would look like if we used accessor notation for all of the operations. ```SQL SELECT approx_percentile(0.10, uddsketch) -> as_seconds() -> justify() "0.10", approx_percentile(0.30, uddsketch) -> as_seconds() -> justify() "0.30", approx_percentile(0.50, uddsketch) -> as_seconds() -> justify() "0.50", approx_percentile(0.70, uddsketch) -> as_seconds() -> justify() "0.70", approx_percentile(0.90, uddsketch) -> as_seconds() -> justify() "0.90" FROM( SELECT percentile_agg(EXTRACT(EPOCH FROM interval)) FROM intervals ); ``` Since I'm not comfortable implementing `->` on the builtin types–due to the risk of collisions and the complexities of doing type conversion in pipelines–I don't consider consider this a serious proposal yet. Though if those two issues could be solved it would be promising.

Conversion from numbers

Problem Statement

Intervals are unusual types in that they're almost numbers, but due the fact that they can be specified in variable-length units, such as months or years, they cannot quite be treated like numbers. For certain operations(such as getting interval quantiles in approximate_percentile() the best way to handle this is to convert to some well-defined unit, such as seconds, perform the math on that, then convert back to an Interval for display purposes.

Currently the canonical way to convert a number of seconds (stored in $number) to an interval is:

($number::TEXT || ' seconds')::interval)

which is both confusing to read and unintuitive to write. A direct conversion function such as would be much easier to understand and search for. There are a number of possible ways to implement such a conversion depending on which use case we want to target. The next section expands upon my preferred one in detail, and the rest are briefly summarize other in Alternatives

Type-Based Conversion

This strategy works by creating a pseudo-type for each unit we want to convert from. These types aren't meant to be useful directly, they are actually Intervals internally and can be automatically casted to Interval. They exist to provide explicit casts from the numeric types and as something for accessors to be defined on. Our example earlier of converting a number of seconds to an Interval would look like

$number::seconds

the main advantages to this over other methods is that

  1. The unit is postfix in the usual place we expect to see units of measure.
  2. It provides us with a type we can hang accessors off of such as $number::seconds -> justify() described below.

Creating an Interval from multiple units looks something like

$num1::days + $num2::hours + $num3::minutes

Which could be seen as nice, or unpleasant depending on taste.

The main disadvantages is that it's potentially abusing casts in a way they're not designed for. Also if Postgres does finally deprecate the :: syntax this would being as nice.

Though these don't seem like deal breakers

More Details For this method we'd create a slew of types, one for each of the units we want to convert to. This would most likely include `years`, `months`, `hours`, `minutes`, `seconds`, `milliseconds`, `microseconds`. Each would have an (assignment?) cast from the integer types, and possibly the float and numeric types (though how fractions are handled would need to be determined). These casts can be implemented using the current `TEXT` conversion path for a PoC. Each type would also have an implicit cast to `Interval`

Alternatives

There are two major alternatives to type-based conversion a mega-function would define a single function such as

interval_of(years=>$y, months=>$m, hours=>$h, minutes=>$m, seconds=>$s, milliseconds=>$ms, microseconds=>$us)

which can convert from any number of numbers to an Interval. In practice, few users would use all the arguments, and usage would likely look something like

interval_of(seconds=>$num)

The alternative would be to use multiple functions, one for each unit, such as

years($num)
months($num)
seconds($num)

Multiple functions would also allow us to create types per-until if we want them.

Justification and other transformations

TODO

davidkohn88 commented 2 years ago

There are already some functions that do this, but for me, I think I'd prefer something simpler as a starting point that didn't have so many damn options namely, interval_to_seconds and seconds_to_interval, partially because I think the EXTRACT(EPOCH FROM interval) syntax is kinda weird, and because I think having symmetric functions like this can be good. I'd also automatically run justify on that.

While the other stuff is cool, I think I'd probably go quick and dirty for now, this is the main thing people need, I wouldn't get overly fancy.

I'd also suggest that we should think about making two other functions: 1) round(interval, precision (seconds, hours, etc)) 2) interval_bucket() like time_bucket but for intervals, this would probably also obviate the need for the rounding function and be more versatile.