prometheus / node_exporter

Exporter for machine metrics
https://prometheus.io/
Apache License 2.0

Node-Exporter : memory usage too high (OOME) #1008

Closed (SpeedBlack closed this 1 year ago)

SpeedBlack commented 6 years ago

Host operating system: output of uname -a

3.10.0-862.3.2.el7.x86_64

node_exporter version: output of node_exporter --version

sh-4.2$ node_exporter --version
node_exporter, version 0.16.0 (branch: HEAD, revision: d42bd70f4363dced6b77d8fc311ea57b63387e4f)
build user: root@node-exporter-binary-3-build
build date: 20180606-16:48:15
go version: go1.10

node_exporter command line flags

--path.procfs=/host/proc --path.sysfs=/host/sys

Are you running node_exporter in Docker?

Yes, in OpenShift

Hi,

I use node-exporter (openshift/prometheus-node-exporter:v0.16.0) in OpenShift with Prometheus and Grafana. I have a problem with memory not being released: the pod is killed repeatedly with an OOME (OutOfMemory) error, and memory usage increases continuously without ever being freed.

By default, the limits were (template here):

          resources:
            requests:
              memory: 30Mi
              cpu: 100m
            limits:
              memory: 50Mi
              cpu: 200m

I tested several configurations without success. Today, the configuration is:

          resources:
            limits:
              cpu: 250m
              memory: 250Mi
            requests:
              cpu: 100m
              memory: 75Mi

Do you have any ideas? Is there an adjustment to make? Do you have recommended resource limits?

Thanks in advance for your help!

SuperQ commented 6 years ago

The easiest thing to do is get a pprof memory profile.

See https://github.com/prometheus/node_exporter/issues/859 for some more tips.

It would be useful to look at the node_exporter's go_goroutines metric over time to see if there is a leak.
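
For reference, both pieces of evidence can be grabbed directly from the exporter's port (a sketch; localhost:9100 assumes you can reach the exporter directly, e.g. from inside the pod):

# Spot-check the exporter's own goroutine count as exposed on /metrics
curl -s http://localhost:9100/metrics | grep '^go_goroutines'

# Fetch a heap profile and render it (requires the Go toolchain and Graphviz where you run this)
go tool pprof -svg http://localhost:9100/debug/pprof/heap > heap.svg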

SpeedBlack commented 6 years ago

Hi SuperQ,

Below, the memory usage of the "node-exporter" pod deployed on a node: (screenshot attached)

Below, a graph of the go_goroutines metric for the same node: go_goroutines{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",instance="xxxxxx",job="kubernetes-nodes-exporter",kubernetes_io_hostname="xxxxxx",.....} (screenshot attached)

What is this metric?

I'm looking at how to get the pprof.

Thanks, Best regards.

SuperQ commented 6 years ago

Yes, that looks like a goroutine leak. Can you post any pod logs to see if there are any errors?

SpeedBlack commented 6 years ago

Below, the pod logs:

$ oc logs -f prometheus-node-exporter-krvrb
time="2018-07-19T03:57:03Z" level=info msg="Starting node_exporter (version=0.16.0, branch=HEAD, revision=d42bd70f4363dced6b77d8fc311ea57b63387e4f)" source="node_exporter.go:82"
time="2018-07-19T03:57:03Z" level=info msg="Build context (go=go1.10, user=root@node-exporter-binary-3-build, date=20180606-16:48:15)" source="node_exporter.go:83"
time="2018-07-19T03:57:03Z" level=info msg="Enabled collectors:" source="node_exporter.go:90"
time="2018-07-19T03:57:03Z" level=info msg=" - arp" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - bcache" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - bonding" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - conntrack" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - cpu" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - diskstats" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - edac" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - entropy" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - filefd" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - filesystem" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - hwmon" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - infiniband" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - ipvs" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - loadavg" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - mdadm" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - meminfo" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - netdev" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - netstat" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - nfs" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - nfsd" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - sockstat" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - stat" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - textfile" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - time" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - timex" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - uname" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - vmstat" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - wifi" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - xfs" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg=" - zfs" source="node_exporter.go:97"
time="2018-07-19T03:57:03Z" level=info msg="Listening on :9100" source="node_exporter.go:111"

Thanks

SuperQ commented 6 years ago

Thanks, hopefully the pprof dump will give us the information we need.

SpeedBlack commented 6 years ago

Hi Superq,

I was able to run the command, but Graphviz is not installed and I can't install it in the pod:

go tool pprof -svg http://localhost:9090/debug/pprof/heap > heap.svg                                                                                                                      
Fetching profile over HTTP from http://localhost:9090/debug/pprof/heap                                                                                                                            
Could not use temp dir /pprof: mkdir /pprof: permission denied                                                                                                                                    
Saved profile in /tmp/pprof.prometheus.alloc_objects.alloc_space.inuse_objects.inuse_space.003.pb.gz                                                                                              
Failed to execute dot. Is Graphviz installed? Error: exec: "dot": executable file not found in $PATH

I have to run the command in the pod because I have an OCP auth-proxy in front of Prometheus.

But I have the raw data (the "pprof.prometheus.alloc_objects.alloc_space.inuse_objects.inuse_space.003.pb.gz" file). Do you know how I can generate the graph afterwards with Graphviz on another system? I haven't found the command yet... or the tool.

Thanks, Best regards.
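
For what it's worth, a saved profile can be rendered later on any machine that has the Go toolchain and Graphviz installed; roughly (a sketch, using the filename from the output above):

# Copy the .pb.gz out of the pod first (e.g. with 'oc cp'), then render it where Graphviz is available
go tool pprof -svg pprof.prometheus.alloc_objects.alloc_space.inuse_objects.inuse_space.003.pb.gz > heap.svg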

SpeedBlack commented 6 years ago

Sorry, it's OK:

heap (graph attached)

Is it OK?

Thanks, Best regards

SuperQ commented 6 years ago

That is a profile for Prometheus, not the node exporter. You need to use port 9100.

SpeedBlack commented 6 years ago

Oh sorry. Please find attached the pprof SVG for node-exporter.

pprof_node_exporter.zip

Thanks, Best regards

SpeedBlack commented 6 years ago

Below, the memory usage of the node-exporter since the restart of the pod: (screenshot attached)

SuperQ commented 6 years ago

It would help to take a pprof sample when it's using over 200MB of memory; that way we can more easily find the memory leak.

SpeedBlack commented 6 years ago

I took a pprof sample at 12:24:00. The node-exporter used 248MiB: (screenshot attached)

Is it better to lower the memory limit and take a pprof sample? (Currently 250Mi; by default 50Mi.)

Thanks, Best regards.

SuperQ commented 6 years ago

Strange. The pprof only shows 9.6MiB of memory in use.

I would probably eliminate the CPU limit. I have seen CPU limits in K8s cause bad performance stalls. But I haven't looked at the current implementation to see whether that's improved or not.

/cc @juliusv Do you know if there is something specific we can do to trace this goroutine leak?

SpeedBlack commented 6 years ago

Hi, I tried removing the CPU limits, but it made no difference. I'll take another pprof sample.

Thanks, Best regards

SuperQ commented 6 years ago

One additional debugging option: get a dump of the goroutines with http://localhost:9100/debug/pprof/goroutine?debug=1. Normally there would be only a few, like this.

juliusv commented 6 years ago

@SuperQ Yeah, I would recommend listing the active goroutines, as we already know it's connected to a goroutine leak. But prefer debug=2 over debug=1, so: http://localhost:9100/debug/pprof/goroutine?debug=2

SpeedBlack commented 6 years ago

Hi,

I can't execute the command with the debug=2 option:

# go tool pprof -svg http://xxxxx:9100/debug/pprof/goroutine?debug=2 > dump_goroutine_2.svg
Fetching profile over HTTP from http://xxxxx:9100/debug/pprof/goroutine?debug=2
http://xxxxx:9100/debug/pprof/goroutine?debug=2: parsing profile: unrecognized profile format
failed to fetch any profiles

Do you have any idea?

For the moment, please find attached a dump of the goroutines (debug=1 option) and a new pprof heap sample for node-exporter. dumps.zip

For information, the node-exporter pod currently uses 137MiB.

Thanks, Best regards.

SuperQ commented 6 years ago

The goroutine link doesn't require pprof, just curl. It's just a big text dump, not svg.
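
Something along these lines should be enough (a sketch; adjust the host if you are not running it from inside the pod):

curl -s 'http://localhost:9100/debug/pprof/goroutine?debug=2' > goroutines.txt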

SpeedBlack commented 6 years ago

Ok, thanks for the clarification.

Please find attached dumps of the goroutines (debug=1 & debug=2). goroutine_dump.zip

Thanks, Best regards.

SuperQ commented 6 years ago

Looks like it's getting stuck on netlink. This is used by the wifi collector; try running the node_exporter with --no-collector.wifi

/cc @mdlayher 😁

goroutine 17790 [chan receive, 501 minutes, locked to thread]:
github.com/prometheus/node_exporter/vendor/github.com/mdlayher/netlink.newSysSocket.func1(0xc420570e01, 0xc4206bc2f0, 0xc420570e40)
        /go/src/github.com/prometheus/node_exporter/vendor/github.com/mdlayher/netlink/conn_linux.go:258 +0x8a
created by github.com/prometheus/node_exporter/vendor/github.com/mdlayher/netlink.newSysSocket
        /go/src/github.com/prometheus/node_exporter/vendor/github.com/mdlayher/netlink/conn_linux.go:241 +0xa0
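
For anyone finding this later, disabling the collector only requires appending the flag to the existing command line, for example (a sketch built from the flags quoted earlier in this issue):

node_exporter --path.procfs=/host/proc --path.sysfs=/host/sys --no-collector.wifi
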
simonpasquier commented 6 years ago

@SpeedBlack also as you're using OpenShift, please note that the wifi collector should now be disabled by default in Ansible-based installations.

mdlayher commented 6 years ago

Interesting. It appears to be stuck in the locked syscall goroutine I have set up. I probably don't have time to look at this immediately, but can the OP please clarify whether they have WiFi devices in this system?

SpeedBlack commented 6 years ago

Great! Thanks for your help! I just added "--no-collector.wifi". I'll confirm tomorrow whether the problem is gone!

@mdlayher, no, I don't have WiFi devices in this system.

Thanks, Best regards

SpeedBlack commented 6 years ago

Hi,

Today, I can confirm it's OK! So I'm closing this case!

Thanks for your help! Best regards.

mdlayher commented 6 years ago

I'm leaving this open for tracking.

SuperQ commented 6 years ago

@mdlayher Maybe we should file a new issue with a cleaned up summary.

SuperQ commented 4 years ago

@mdlayher Did we ever get the upstream issue solved for this?

mdlayher commented 4 years ago

I don't believe so. I haven't touched that code in quite a while and wasn't able to easily reproduce these results.

alter commented 3 years ago

Same here, hello from 2021.

ovidiubuligan commented 3 years ago

Sorry, it's OK:

heap (graph attached)

Is it OK?

Thanks, Best regards

Hello, you have an interesting chart there. How did you generate that map?

tdudgeon commented 2 years ago

Anything new on this? First reported in 2018! It does still seem to be a real problem, and for me the more memory the node has, the more often the restarts happen (less than an hour between restarts for a node with 512GB RAM).

discordianfish commented 2 years ago

@tdudgeon Do you have the wifi collector enabled? Can you provide the pprof details as described above?

tdudgeon commented 2 years ago

@discordianfish No, it looks like it wasn't disabled. I've done so now by adding --no-collector.wifi to the spec.containers.args of the DaemonSet, and so far I've seen no restarts. So maybe that's fixed it. For Prometheus installed using the Prometheus Operator, is editing the DaemonSet the right way to do this?

tdudgeon commented 2 years ago

@discordianfish So after monitoring for a while, there are still pod restarts due to OOM, and memory usage does continually increase until the pod is restarted, but the restarts are less frequent than they were before adding --no-collector.wifi to the node exporter pod arguments. Again, this affects nodes with large amounts of RAM much more severely than ones with smaller amounts. I'll try to work out how to generate the pprof details.

tdudgeon commented 2 years ago

@discordianfish I'm struggling to work out how to generate the pprof details. It looks like I need to run go tool pprof -svg http://localhost:9100/debug/pprof/heap > heap.svg, but it looks like go isn't present in the node exporter pod. What's the way to do this?

discordianfish commented 2 years ago

@tdudgeon You can just run go tool pprof on your dev laptop as long as it can reach the node-exporter port. You could also use kubectl port-forward to forward the port.
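
Roughly like this (a sketch; the pod name is a placeholder and the namespace depends on your install):

# Forward the node-exporter port from the pod to your workstation
kubectl -n monitoring port-forward pod/node-exporter-xxxxx 9100:9100 &

# Then fetch and render the heap profile locally (needs the Go toolchain and Graphviz)
go tool pprof -svg http://localhost:9100/debug/pprof/heap > heap.svg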

tdudgeon commented 2 years ago

@discordianfish Here's the data. Hope I did it right!

heap (graph attached)

pprof.node_exporter.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz

discordianfish commented 2 years ago

This shows that node-exporter is using 3515.72kB of memory in your case. So I don't know what is restarting your node-exporter or why, but memory usage doesn't seem to be the problem, at least not when you created the profile.

tdudgeon commented 2 years ago

@discordianfish Here are dumps from just before the pod was restarted. This is the memory usage profile I see. The memory request is 30Mi and the limit is 50Mi.

(screenshot attached)

heap (graph attached)

pprof.node_exporter.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz

discordianfish commented 2 years ago

This is again only 2064.26kB in use in the Go process. There must be something else in the container using that memory. But are you sure it's actually getting OOM killed, and that it's not just the Go GC or the Linux fs cache leading to this memory usage pattern?
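
One cheap way to separate the Go heap from whatever else the container is charged for is to watch the exporter's own memory metrics over time (a sketch; these are the standard Go and process metrics the exporter already exposes):

curl -s http://localhost:9100/metrics | grep -E '^(go_memstats_heap_inuse_bytes|go_memstats_alloc_bytes|process_resident_memory_bytes)'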

tdudgeon commented 2 years ago

Yes, definitely being killed:

  containerStatuses:
    - name: node-exporter
      state:
        running:
          startedAt: '2022-04-29T13:14:57Z'
      lastState:
        terminated:
          exitCode: 137
          reason: OOMKilled
          startedAt: '2022-04-29T12:28:57Z'
          finishedAt: '2022-04-29T13:14:56Z'
          containerID: >-
            docker://eba0813c8ac002f553e0234bad1cc819714767a29e58ccd74debede6bca28c2b
      ready: true
      restartCount: 537
      image: rancher/mirrored-prometheus-node-exporter:v1.1.2
      imageID: >-
        docker-pullable://rancher/mirrored-prometheus-node-exporter@sha256:b71ec5c0a45792183d5506ac86d1cab286eed997152ca35602e58d5e75e8eba8
      containerID: >-
        docker://f2e0d2711b04d48c4dbb08412c3df87f229d171b7e74a8beabea15388889e418
      started: true
  qosClass: Burstable

What is strange is that the more RAM the node has, the more of a problem this is.

discordianfish commented 2 years ago

So either there is something else in the container causing this, or it's the node-exporter itself, which seems unlikely but of course is possible. But if the latter is the case, it should show up in pprof, and it wasn't present when you captured it. So I'd suggest either keep trying to pprof it to catch the condition, or look at what else in the container might cause it. I don't know what rancher/mirrored-prometheus-node-exporter:v1.1.2 is doing. You could also just try our official images and see if it happens there as well.
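
The official image is published as quay.io/prometheus/node-exporter (also mirrored as prom/node-exporter on Docker Hub); a minimal standalone run for comparison could look like this (a sketch, based on the upstream README's example):

docker run -d --net=host --pid=host -v /:/host:ro,rslave quay.io/prometheus/node-exporter:latest --path.rootfs=/host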

thenamehasbeentake commented 1 year ago

Amazing that node-exporter can report on itself when it OOMs.

SuperQ commented 1 year ago

There have been a number of releases since this issue was reported. The only root cause, the wifi collector, has been disabled by default for quite a while. I think we can close this.