prometheus / node_exporter

Exporter for machine metrics
https://prometheus.io/
Apache License 2.0

`NF_CONNTRACK_PROCFS` is obsolete and disabled by default in recent kernels #2491

Open Ma27 opened 1 year ago

Ma27 commented 1 year ago

Host operating system: output of uname -a

Linux carsten 5.15.70 #1-NixOS SMP Fri Sep 23 12:15:52 UTC 2022 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.3.1 (branch: unknown, revision: v1.3.1)
  build user:       nix@nixpkgs
  build date:       unknown
  go version:       go1.17.13
  platform:         linux/amd64

(Though the problem also appears to exist on master).

node_exporter command line flags

ExecStart=/nix/store/xvib5wqz29v3m3ln0266w7w3ncdkfwgg-node_exporter-1.3.1/bin/node_exporter \
  --collector.textfile \
   \
  --web.listen-address 0.0.0.0:9100 --collector.textfile.directory=/run/prometheus-node-exporter-textfiles

Are you running node_exporter in Docker?

no

What did you do that produced an error?

I updated my host's Linux kernel to a version later than 5.15.65. In these versions, the default value of NF_CONNTRACK_PROCFS was changed to n. As a result, /proc/net/stat/nf_conntrack is no longer available, so node_scrape_collector_success{collector="conntrack"} is 0. All conntrack metrics are missing except nf_conntrack_entries and nf_conntrack_entries_limit, which are collected via /proc/sys/net/netfilter/nf_conntrack_{count,max}.

The commit message suggests using conntrack(8) instead of /proc/net/stat/nf_conntrack because the latter was marked as obsolete.

What did you expect to see?

I'd expect to see all metrics of the conntrack collector being exposed.

What did you see instead?

Only _limit and _entries were exposed instead.

discordianfish commented 1 year ago

Ugh... guess we need to move this to netlink (or whatever conntrack(8) is doing).

noroutine commented 1 year ago

Hit this on Ubuntu 22.04

:~# uname -a
Linux 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

The conntrack collector silently stops working without even writing a message to the logs; it took a while to track this down.

If anyone stumbles on this, I wrote a small shim that provides (almost) all metrics with the same aggregation semantics as the conntrack collector:

https://github.com/noroutine/node-exporter-conntrack-shim

g00g1 commented 1 year ago

The problem with the new netlink interface is that it requires root privileges or CAP_NET_ADMIN. I don't think either of those is a good option to build into node_exporter's code base. Maybe we should just move this to the textfile collector? Otherwise additional dependency for libmnl (and its wrapper) will be added, as well as new permissions.

discordianfish commented 1 year ago

Since conntrack stats are IMO so fundamental, I think we should finally list our 'unprivileged' requirements.. @SuperQ wdyt?

dswarbrick commented 1 year ago

> Otherwise additional dependency for libmnl (and its wrapper) will be added, as well as new permissions.

I'm not sure why libmnl would be needed. There is a native Go package for interacting with conntrack via netlink, which exposes all the stats that we currently read from /proc/net/stat/nf_conntrack: https://pkg.go.dev/github.com/florianl/go-conntrack#CPUStat.

However this, as already acknowledged, will require running with CAP_NET_ADMIN (or root). Sadly the permissions checking in the netfilter / conntrack stuff is not granular at all, and all netlink access is gated behind a CAP_NET_ADMIN check - even "read-only" requests.