postmates / cernan

telemetry aggregation and shipping, last up the ladder

Filters have no introspection into telemetry aggregation methods #225

Open blt opened 7 years ago

blt commented 7 years ago

When issuing points to the Wavefront sink, it'd be handy in some situations to send a zero point after a configurable timeout, per a recent conversation with @bitglue. While experimenting with a filter to prototype this idea, I ran into a fun oversight in the filter API: there's no introspection or manipulation of telemetry aggregation methods. Consider this script:

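-- names of every metric seen so far
name_set = {}
-- tick counter, wraps to 0 on every 10th tick
ticks = 0

function process_metric(pyld)
   -- remember the metric's name so tick() can zero it later
   local name = payload.metric_name(pyld, 1)
   print(name)
   name_set[name] = true
end

function process_log(pyld)
end

function tick(pyld)
   ticks = (ticks + 1) % 10
   if ticks == 0 then
      -- push a zero point for each known name; there is no way
      -- to choose the aggregation method of the pushed point here
      for k in pairs(name_set) do
         payload.push_metric(pyld, k, 0)
      end
   end
end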

If a point comes in for a sparse SET time series, then after the 10s timeout a SUMMARY zero point goes out, which does not match the aggregation method of the original series. What we need to do is extend the filter API so that it's possible to both introspect the aggregation method and set it.

bitglue commented 7 years ago

I think for this to work, the zero point needs to be sent at the same interval as a transition from zero to non-zero would be sent. Otherwise, Wavefront will interpolate the series, creating non-zero values where the value should have been zero.

For example, consider a series which we sample once per second, and this actually happens:

0 0 10 0 0 0 0 0

If we send to Wavefront (N means null, i.e. no value):

0 0 10 N N N N 0

Then Wavefront will interpolate that to mean:

0 0 10 8 6 4 2 0

Which is obviously not at all what happened.
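
As a rough illustration of the arithmetic above, here is a minimal Lua sketch. It assumes plain linear interpolation between known points, which is a simplification and may not match Wavefront's actual rules. The `interpolate` function and the `n` field are names made up for this example and are not part of cernan or Wavefront.

```lua
-- Sketch only: straight-line interpolation across gaps (nil entries).
-- `series.n` holds the length, since a Lua table cannot store nil values.
local function interpolate(series)
   local out = {}
   local prev -- index of the most recent non-nil sample
   for i = 1, series.n do
      local v = series[i]
      if v ~= nil then
         if prev and i - prev > 1 then
            -- fill the gap between prev and i along a straight line
            local step = (v - series[prev]) / (i - prev)
            for j = prev + 1, i - 1 do
               out[j] = series[prev] + step * (j - prev)
            end
         end
         out[i] = v
         prev = i
      end
   end
   return out
end

-- Zero point sent late: the gap after 10 is filled with 8, 6, 4, 2.
local late = interpolate({ n = 8, 0, 0, 10, nil, nil, nil, nil, 0 })

-- Zero point sent on the very next interval: the gap is filled with zeros.
local prompt = interpolate({ n = 8, 0, 0, 10, 0, nil, nil, nil, 0 })
```

Under these assumptions, sending the zero on the interval right after the non-zero point keeps the interpolated series flat at zero, while delaying it produces the sloped values from the example.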