Interesting, but I would assume some kernel issue? We just return this from procfs.
This is a known issue with iowait in the Linux kernel. We noticed this at SoundCloud years ago, but never got anywhere digging into it. Recently, I was looking into it again and found some interesting info. It seems specifically broken for iowait due to the way the data collection is implemented in the kernel.
What we ended up doing to work around this was to break out iowait into a deriv() rule, separate from the rest of the CPU metrics. I was considering updating the example recording rules file to document this (see the sketch below).
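A minimal sketch of what that could look like, assuming the standard Prometheus 2.x rules-file format (the record names here are made up for illustration, not what the example file actually uses):

groups:
  - name: node_cpu
    rules:
      # iowait can step backwards on buggy kernels, so use deriv() instead of rate().
      - record: instance:node_cpu_iowait_seconds:deriv5m
        expr: sum by (instance) (deriv(node_cpu_seconds_total{mode="iowait"}[5m]))
      # The other modes behave as proper counters and can keep using rate().
      - record: instance_mode:node_cpu_seconds:rate5m
        expr: sum by (instance, mode) (rate(node_cpu_seconds_total{mode!="iowait"}[5m]))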
I've thought about bugging kernel people, but I'm not sure there would be any interest in fixing this, especially since it means having a lock, which is something kernel devs are very cautious about.
@discordianfish @SuperQ Thanks for your response. Should we use deriv function instead of rate, so that these spikes are not seen in the graphs?
Hrm.. I mean.. it's kinda our problem now. We shouldn't expose it as counter if it's not really a counter after all.
Fixing this upstream would be great.. Or we could add a workaround that tracks the max value, prints an error, and returns that max if the current value is lower.
As per the Prometheus docs, deriv() should only be used with gauges:
deriv(v range-vector) calculates the per-second derivative of the time series in a range vector v, using simple linear regression. deriv should only be used with gauges.
Since the metric is currently exposed as a counter, I'm not sure how the Prometheus query engine will process it. If there is no issue with using deriv() instead of irate()/rate(), then it should be fine. Otherwise, changing the query to return only results <= 100 should eliminate the spikes seen in Grafana:
sum by (instance)(irate(node_cpu_seconds_total{mode="iowait"}[5m])) * 100 <=100
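For comparison, the deriv()-based form of the same query would be something like the following (just a sketch of the alternative; deriv() fits a linear regression over the raw samples, so a small backwards step only nudges the slope, whereas rate()/irate() treat any decrease as a full counter reset, which is what produces the huge spike):

sum by (instance) (deriv(node_cpu_seconds_total{mode="iowait"}[5m])) * 100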
@discordianfish Yeah, the only thing we can do is keep track of the data coming from /proc/stat and only output data if it goes up. We could log at debug level if there's a drop in values.
The question is what to do if the list of CPUs changes.
I've seen this enough times that I think we should do the workaround for the bad kernel data. It's not typically best practice for an exporter to do this kind of stuff, but I think we need to in this case.
The question is what to do if the list of CPUs changes.
This has been stuck in my head, so I did some research. If I offline a CPU via hotplug, the relevant cpu line disappears but the other cpu names in /proc/stat don't change. However, when I online the CPU again, at least the idle and iowait counters get reset:
Before:
cpu1 157846114 580231 38791682 1157995658 2587676 0 151288 0 0 0
After:
cpu1 157847655 580231 38792001 105 0 0 151288 0 0 0
So the problem isn't the list of CPUs changing; it's an actual counter reset.
This was on 4.15.0-66-generic.
@brian-brazil Thanks. So I guess what we need is to track the list of CPUs, and if the list changes, invalidate the tracking cache.
As long as there's a scrape while the CPU is offlined.
If it's only iowait and not idle that's buggy, another approach would be to check for both going down. Plus, they can't increase by more than a second per second anyway, and I'd hope no one is toggling CPUs every scrape interval.
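As a side note, one way to see how often the iowait counter actually steps backwards on a given host is a query along these lines (just a sketch; resets() counts any decrease within the range, so on an affected kernel it will pick up these bogus drops as well as genuine resets):

sum by (instance, cpu) (resets(node_cpu_seconds_total{mode="iowait"}[1h]))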
Host operating system: output of uname -a:
Linux ddebvnf-oame-1 3.10.0-1062.7.1.el7.x86_64 #1 SMP Wed Nov 13 08:44:42 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
node_exporter version: output of node_exporter --version:
node_exporter, version 0.17.0 (branch: HEAD, revision: f6f6194a436b9a63d0439abc585c76b19a206b21) build user: root@322511e06ced build date: 20181130-15:51:33 go version: go1.11.2
node_exporter command line flags
node_exporter --collector.systemd \ --collector.systemd.unit-whitelist=^(grafana|prometheus|node_exporter|rabbitmq-server|asprom|gmond|gmetad|mariadb.|ntpd|httpd|jaeger|metrics|gen3gppxml|alertmanager|etcd|alarmagtd|keepalived|zabbix.).service$ \ --collector.textfile.directory=/opt/node_exporter/metrics
Are you running node_exporter in Docker?
NO
What did you do that produced an error?
Nothing. node_exporter is running and Prometheus is scraping the metrics; the scrape interval is 5s. When a graph is plotted for node_cpu_seconds_total, we saw a huge spike. The following is the query used:
rate(node_cpu_seconds_total{cpu="6",instance="osc1deacsdme1-oame-0",job="System",mode="iowait"}[2m])
What did you expect to see?
There should not be any huge spikes, and there should not be a dip in the node_cpu_seconds_total values.
What did you see instead?
There is a huge spike on the 9th of March at 00:26:30, as there is a dip in the node_cpu_seconds_total values.
The following is the data in Prometheus:
curl -g 'http://localhost:9090/api/v1/query?query=node_cpu_seconds_total{cpu="6",instance="osc1deacsdme1-oame-0",job="System",mode="iowait"}[2m]&time=1583693790'
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"__name__":"node_cpu_seconds_total","cpu":"6","instance":"osc1deacsdme1-oame-0","job":"System","mode":"iowait"},
"values":[[1583693670.227,"62176.51"],[1583693675.227,"62176.77"],[1583693680.227,"62176.98"],[1583693685.227,"62176.99"],[1583693690.227,"62176.99"],[1583693695.227,"62177.03"],
[1583693700.227,"62177.08"],[1583693705.228,"62177.08"],[1583693710.227,"62177.09"],[1583693715.227,"62177.09"],[1583693720.227,"62177.09"],[1583693725.227,"62177.09"],
[1583693730.227,"62177.09"],[1583693735.227,"62177.09"],[1583693740.227,"62177.09"],[1583693745.227,"62177.09"],[1583693750.227,"62177.09"],[1583693755.227,"62177.09"],
[1583693760.227,"62177.09"],[1583693765.227,"62177.09"],[1583693770.227,"62177.09"],[1583693775.227,"62177.24"],[1583693780.227,"62177.2"],[1583693785.227,"62177.2"]]}]}}
We would like to know why there is a dip in the counter value.