influxdata / kapacitor

Open source framework for processing, monitoring, and alerting on time series data
MIT License

[feature request] Support bulk operations on fields #1202

Open rossmcdonald opened 7 years ago

rossmcdonald commented 7 years ago

Add the ability to perform bulk operations on field values without specifying the field key. This will be very helpful when using Kapacitor as a downsampling engine (similar to the use case discussed in the documentation here). From my perspective, supporting this effectively requires:

More concrete examples:

Numeric Data

Aggregating numeric data is fairly straightforward. For example, I would like to be able to apply mathematical operations to anonymous fields based on their type. This would be similar to running the query in InfluxQL:

SELECT sum(*), min(*), max(*), count(*) FROM /.*/ WHERE time > now() - 1m

Here InfluxQL automatically applies the operations to all numeric fields (even if string or boolean fields are present). In TICKscript, I could see this looking similar to:

var data = stream
|from().groupBy(*)
|window().every(1m).period(1m)

var count = data
// Rename 'count' result to be the same name as the original field key for all fields
|count(*)
    .as($fieldkey)

var sum = data
|sum(*)

...

Here the InfluxQL functions would accept a wildcard operator *. When the wildcard is used, type errors (for example, attempting to use sum() on string types) would either be ignored by default, or a property method would need to be set to ignore them. Once a wildcard is used, the matched field key can be referenced with $fieldkey.
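To make the intended wildcard semantics concrete, here is a minimal Python sketch of what a wildcard aggregation over one window might compute. It is only a model of the behaviour described above, not Kapacitor code: the function name `aggregate_numeric_fields` and the sample field keys are hypothetical, and it assumes the "ignore type errors" option, so string and boolean fields are skipped silently. As a simplification, it treats count like the other functions and only counts numeric fields.

```python
from numbers import Real


def aggregate_numeric_fields(points):
    """Apply sum/min/max/count to every numeric field in a window of points.

    Each point is a dict mapping field key -> value. Non-numeric values
    (strings, booleans) are skipped instead of raising a type error.
    """
    result = {}
    for fields in points:
        for key, value in fields.items():
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, Real):
                continue
            agg = result.get(key)
            if agg is None:
                # First numeric value seen for this field key.
                result[key] = {"sum": value, "min": value, "max": value, "count": 1}
            else:
                agg["sum"] += value
                agg["min"] = min(agg["min"], value)
                agg["max"] = max(agg["max"], value)
                agg["count"] += 1
    return result


# Hypothetical window of two points with mixed field types.
window = [
    {"cpu": 1.5, "mem": 10, "host_state": "ok", "healthy": True},
    {"cpu": 2.5, "mem": 30, "host_state": "degraded", "healthy": False},
]
print(aggregate_numeric_fields(window))
```

The results are keyed by the original field key, which is roughly the role $fieldkey plays in the proposed syntax.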

Non-numeric Data

This enhancement could also leverage the type checks proposed in https://github.com/influxdata/kapacitor/issues/1201, which would allow the user to specify different actions to take based on the type of the field. One fairly common request I hear is how to effectively downsample string data. For example, based on the input data:

measurement,tagkey=tagvalue mystring="test1"
measurement,tagkey=tagvalue mystring="test2"
measurement,tagkey=tagvalue mystring="test2"
measurement,tagkey=tagvalue mystring="test1"
measurement,tagkey=tagvalue mystring="test1"

Aggregating this data would produce:

measurement,tagkey=tagvalue mystring_count=3i,mystring="test1"
measurement,tagkey=tagvalue mystring_count=2i,mystring="test2" 

Doing this in TICKscript currently requires that you know the field keys ahead of time (which is not always possible). Being able to generalize this algorithm into something like:

var data = stream
|from().groupBy(*)
|window().every(1m).period(1m)

data
|where(lambda: isinstance(*, string))
// convert string field to tag
|eval(lambda: "$fieldkey")
    .keep()
    .tags($fieldkey)
// group by new tag
|groupBy(*)
// count number of fields per tag
|count(*)
    .as($fieldkey + '_count')
// remove created tag
|delete().tag($fieldkey)

Would be amazingly helpful.
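For reference, the string downsampling described above can be modelled outside Kapacitor in a few lines of Python. This is a sketch of the desired result only, under the assumption that all points belong to one window and one tag group; `count_string_values` is a hypothetical name, and nothing here reflects how Kapacitor would implement it.

```python
from collections import Counter


def count_string_values(points):
    """Count how often each string value occurs, per field key, in one window.

    Each point is a dict mapping field key -> value. Returns one output
    point per (field key, value) pair, carrying the value and its count.
    """
    counts = {}
    for fields in points:
        for key, value in fields.items():
            # Only string fields take part; field keys are discovered at runtime.
            if isinstance(value, str):
                counts.setdefault(key, Counter())[value] += 1

    out = []
    for key, counter in counts.items():
        for value, n in counter.items():
            out.append({key: value, f"{key}_count": n})
    return out


# The five input points from the example above.
window = [{"mystring": v} for v in ["test1", "test2", "test2", "test1", "test1"]]
for point in count_string_values(window):
    print(point)
```

The point of the sketch is that no field key is named anywhere in the function, which is exactly what the current TICKscript syntax cannot express.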

oplehto commented 7 years ago

This would be very useful for a use case that we are seeing: https://community.influxdata.com/t/kapacitor-template-task-vs-a-single-task-with-branches/1335/2

I can foresee more similar cases on the horizon as we expand our Kapacitor use.

davidgs commented 6 years ago

This would be super useful for the IoT gateway device use case. A small gateway device doing local data collection could simply run a single Kapacitor script that downsamples all data from a given measurement and forwards it upstream, which would be brilliant. As it is, I'm writing a Kapacitor script for each field to do this, so I'll end up with about 30 TICKscripts.