microsoft / Kusto-Query-Language

Kusto Query Language is a simple and productive language for querying Big Data.
Apache License 2.0
510 stars 97 forks source link

Allow make-series to fill empty values with the previous known value #51

Closed fgheysels closed 1 year ago

fgheysels commented 3 years ago

In KQL, there are multiple aggregation methods available like series_fill_forward, series_fill_backward, etc... These functions work on a series, so in most of the cases, you need to define the series first via make-series. make-series allows you to fill in the gaps via the default argument. As far as I know, the value that is passed as 'default' must be a const.

It would be nice if we could combine the 'interpolation' with the make-series function, so that it is allowed to do something like this:

make-series myseries = max(value) default=fill_previous on timestamp .... 
royoMS commented 3 years ago

Thanks for the suggestion - it was added as a feature request to our backlog. It is marked as low priority because it is an ease of use request (it can be achieved by using a make-series followed by a series functions)

fgheysels commented 3 years ago

Thanks.

Although, it is not very clear to me how I can easily retrieve a series of metrics, for which I sample down the values that are returned, to give me for instance the last known value per 15 minute interval, and if I don't have any sample for a specific 15-minute interval, the value of the previous interval can be used.

I came up with something like this:

Table
| where timestamp  > ago(5d) 
| where tag == "sometag" 
| summarize arg_max(timestamp, value_decimal) by bin(timestamp, 15m) 
| make-series sometag_series = max(value_decimal) default=long(null) on timestamp1 step 15m 
| project spec = series_fill_forward(sometag_series), timestamp1 
| render linechart with ( ycolumns=spec,  xcolumn=timestamp1) 

As you can see, I first use the arg_max function to get the LKV per interval of 15 minutes, but afterwards, in the make-series statement, I need to provide another aggregation method. I think I can use min, max, or even any here since I should normally have only one value per interval, but it doesn't make the statement very readable imho. (Unless there are other ways to do this ? )

royoMS commented 3 years ago

I think the summarize is redundant: make-series will take the maximal value in each 15m bin (step) with default null, the consequent series_fill_forward will fill the empty bins with the last known value. I think the following query (similar to your query, omitting the arg_max summarize) is quite readable:

Table
| where timestamp  > ago(5d) 
| where tag == "sometag" 
| make-series sometag_series = max(value) default=double(null) on timestamp step 15m 
| project spec = series_fill_forward(sometag_series), timestamp 
| render linechart with ( ycolumns=spec,  xcolumn=timestamp)
fgheysels commented 3 years ago

I don't think it is similar, as I use arg_max on timestamp in the summarize statement to get the last known value per interval. If you omit this, you get -afaik- the maximum value per interval, but the maximum value is not necesserily the last known value.

royoMS commented 3 years ago

Right - I missed that you want last in each bin and not max. Then the ask is for make-series to support arg_max() aggregation? (currently arg_max is not supported in make-series) so you could do something like the following:

Table
| where timestamp  > ago(5d) 
| where tag == "sometag" 
| make-series (ts,sometag_series) = arg_max(timestamp, value) default=double(null) on timestamp step 15m 
| project spec = series_fill_forward(sometag_series), timestamp 
| render linechart with ( ycolumns=spec,  xcolumn=timestamp)
fgheysels commented 3 years ago

Ideally, this should be possible:

make-series (ts, sometag_value) = arg_max(timestamp, value) default = fill_previous on .... 

arg_max in make-series and fill_previous functionality for the default keyword.

royoMS commented 3 years ago

Ack. Both have been added as feature requests to our backlog - they are both ease of use features (tagged with low priority). Thanks for the feedback

mattwar commented 1 year ago

I'm closing this as it is not an open issue specifically for this library.

fgheysels commented 1 year ago

Why not ? This repository is called 'Kusto-Query-Language'. this is a feature request forthe Kusto query language.

sloutsky commented 1 year ago

This github repository is for a parser. The change you've requested requires change of the Kusto service - therefore it's not the right place to track it. Please use https://aka.ms/adx.uservoice to suggest and upvote the idea.