Closed smalenfant closed 3 years ago
Same here
👍
You want `derivative(last("read_bytes"))` (`last`, not `sum`). `sum` is going to add all the results in a time-group together. If you have one time-group that looks like [1,2] and the next one is [4,6,9], summation is going to give you values of 3 & 19, and it'll look like you have a `read_bytes` difference of 16, when really the bytes read between the two time-groups is 7 (which is 9-2).
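To make the arithmetic concrete, here is a minimal Python sketch of the two aggregations applied to the example groups. It only mimics what `sum` and `last` do per time-group; it is not InfluxDB code, and the variable names are illustrative.

```python
# Two consecutive time-groups of a cumulative counter (values from the example).
groups = [[1, 2], [4, 6, 9]]

# sum() adds every sample in a group, so the result also depends on
# how many samples happened to land in that group.
sums = [sum(g) for g in groups]
diff_of_sums = sums[1] - sums[0]

# last() keeps only the final counter reading of each group.
lasts = [g[-1] for g in groups]
diff_of_lasts = lasts[1] - lasts[0]

print(sums, diff_of_sums)    # [3, 19] 16
print(lasts, diff_of_lasts)  # [2, 9] 7
```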
I have the same problem with the `net` plugin - all the fields over there are cumulative and I'm using InfluxDB 0.9.
I can submit a PR if you guys want.
I would suggest generalizing the title of this PR, as dealing with counters is a common issue, not specific to diskio.
Currently InfluxDB (1.1) is still unable to deal with counter wraparounds - the `non_negative_derivative` transformation looks like an ugly hack to deal with this. It is also unable to provide something like 'a 5 minute maximum of a one-second counter rate'.
Delegating deriving of a counter rate to telegraf (e.g. as its aggregator plugin) would solve both InfluxDB issues. It also seems that telegraf is in a better position to recognize and handle counter wraparounds and provide a proper derivative.
> It is also unable to provide something like 'a 5 minute maximum of a one-second counter rate'.
Subqueries should be able to handle this: https://github.com/influxdata/influxdb/pull/7646
> It also seems that telegraf is in a better position to recognize and handle counter wraparounds and provide a proper derivative.

How do you figure? Telegraf is just reading a number. It doesn't know when the value is going to reset. Whatever logic telegraf uses, influxdb can use too.
I would also argue that this problem is not specific to telegraf. Thus addressing it in influxdb solves it everywhere.
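For reference, the wraparound handling being debated could look roughly like the sketch below. The function names and the `width_bits` parameter are hypothetical, and the counter width has to be assumed by the caller. Note that a reset to zero (for example after a reboot) is indistinguishable from a wrap when only two readings are available, which is the ambiguity raised above, and it applies wherever the logic runs.

```python
def counter_delta(prev, curr, width_bits=64):
    """Increase between two readings of a monotonically increasing counter.

    Assumption: a decrease means the counter wrapped exactly once at
    2**width_bits. A reset to zero looks identical from the readings alone.
    """
    if curr >= prev:
        return curr - prev
    return curr + (1 << width_bits) - prev


def rate_per_second(prev, curr, elapsed_seconds, width_bits=64):
    """Per-second rate between two readings taken elapsed_seconds apart."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return counter_delta(prev, curr, width_bits) / elapsed_seconds


print(counter_delta(100, 250))                      # 150
print(counter_delta(2**32 - 10, 5, width_bits=32))  # 15
```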
Is there a way to convert cumulative/total metrics to a 'rate' through aggregators in telegraf? I am looking at net metrics like bytes_recv and bytes_sent, and charting them looks like a mountain slope :) I am not using the InfluxDB output, so I cannot use derivative.
@vishiy This pull request would implement it #4435
Depending on whatever format you're outputting the data in, you may also use kapacitor. It can do all sorts of data processing, and then forward it on. But I don't think it supports all the destinations & formats telegraf does.
> @vishiy This pull request would implement it #4435

@danielnelson - thank you. When will this be merged and available? Do you know?
I'm not sure, I haven't had time to look at it closely yet.
I also agree with @MarkMartinec that this could be generalized as a common counter issue. It would solve a lot of problems if telegraf were able to aggregate derivatives (rates or deltas) and write already-calculated values to the output. In similar issues, Influx developers answered that writing raw counter values to the database gives more flexibility. At first glance, yes; in practice, no, because that flexibility brings a lot of limitations, especially in Grafana and Elasticsearch (in my case). For example, Elasticsearch cannot sort on a derivative aggregation. Grafana loading time is also impacted, because in the background it has to load 3-4 times more data than when just loading avg.
Is there a plan for a "processing" plugin that aggregates derivatives (rates or deltas)? Would you accept a PR for such a processor?
I developed several monitoring agents in nodejs. I used deltas for all growing metrics, but could not find anything in telegraf processor plugins.
Example in my nginx agent: https://github.com/sematext/sematext-agent-nginx/blob/master/lib/aggregator.js#L31-L37
Always-growing values are really a bit of a problem, depending on your data store and query/visualisation options. Please don't assume everybody is using InfluxDB.
I would really like to see a processor for aggregate derivatives (rates or deltas), and I would also contribute code; unfortunately my Go skills are "read-only".
Yes, there is a pull request open to do deltas https://github.com/influxdata/telegraf/pull/4435, and I am planning to take a look at it as soon as I can.
> Please don't assume everybody is using InfluxDb.
Can I ask what store/vis you are using that doesn't support counters?
I use Sematext Cloud - currently, charts would render counters with the always growing absolute value. Therefore Sematext agents calculate delta or rates. This might change one day, nevertheless, I think the possibility to calculate deltas and rates is really a handy feature for any monitoring agent. On the client-side, it needs fewer calculations at query-time or render-time and users don't even need to think of the problem.
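As an illustration of the agent-side approach described here, a minimal sketch of a stateful rate calculation follows. The `RateTracker` class and the metric key are hypothetical; this is neither Telegraf nor Sematext agent code. It assumes timestamps in seconds and treats any decrease in the counter as a reset, skipping that interval instead of guessing a value.

```python
class RateTracker:
    """Turns cumulative counter samples into per-second rates, per metric key."""

    def __init__(self):
        self._last = {}  # key -> (timestamp_seconds, counter_value)

    def update(self, key, timestamp, value):
        """Record a sample; return the rate since the previous one, or None."""
        prev = self._last.get(key)
        self._last[key] = (timestamp, value)
        if prev is None:
            return None  # first sample: nothing to compare against
        prev_ts, prev_value = prev
        elapsed = timestamp - prev_ts
        if elapsed <= 0 or value < prev_value:
            return None  # out-of-order sample or counter reset: skip interval
        return (value - prev_value) / elapsed


tracker = RateTracker()
print(tracker.update("eth0.bytes_recv", 0, 1000))   # None
print(tracker.update("eth0.bytes_recv", 10, 6000))  # 500.0
print(tracker.update("eth0.bytes_recv", 20, 100))   # None (counter went down)
print(tracker.update("eth0.bytes_recv", 30, 1100))  # 100.0
```

Emitting nothing for the first sample and for a reset interval keeps a spurious spike out of the chart, at the cost of one missing point.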
I have to aggregate reads/writes over 24 disk devices to see throughput. The problem is that InfluxDB doesn't behave well with derivative, and CQs with derivative don't work at all (up to 0.11). Having the per-second rates calculated by Telegraf directly would be great, mostly for `read_bytes` and `write_bytes` but not limited to these. The same would be applicable to the `net` module. Overall, even if derivative worked, this would make it much simpler to sum up rates, since it would require only a single query.

Example of a query used that shows a spike at the beginning and end:
SELECT derivative(sum("read_bytes"), 10s) AS "Read" FROM "diskio" WHERE "hostname" =~ /$Hostname$/ AND $timeFilter GROUP BY time(1m) fill(null)
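What the query is trying to express (total read throughput across devices) can be sketched in Python as "rate per device first, then sum", using only the last counter reading per device in each time-group. The `total_rate` helper and the device names are made up for illustration, and the sketch assumes that a device missing from the earlier snapshot, or one whose counter went down, should simply be left out of that interval.

```python
def total_rate(prev, curr, elapsed_seconds):
    """Sum of per-device rates between two snapshots.

    prev and curr map device name -> last cumulative read_bytes seen in two
    consecutive time-groups. Devices without a usable previous reading
    (missing, or counter decreased) are skipped for this interval.
    """
    total = 0.0
    for device, value in curr.items():
        if device in prev and value >= prev[device]:
            total += (value - prev[device]) / elapsed_seconds
    return total


prev = {"sda": 1000, "sdb": 5000}
curr = {"sda": 1600, "sdb": 6200, "sdc": 900}  # sdc has no previous reading
print(total_rate(prev, curr, 60))  # 30.0
```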