prometheus / node_exporter

Exporter for machine metrics
https://prometheus.io/
Apache License 2.0

Add neighbor stats #535

Open Cellophan opened 7 years ago

Cellophan commented 7 years ago

Need: node_exporter doesn't provide statistics about neighbors (in the ip neighbor sense): entry counts, table overflows, garbage collections, limits, ...

More info: The neighbors can be listed with ip neigh show. The raw data is in /proc/net/stat/arp_cache and /proc/net/stat/ndisc_cache (for background, see "struct neigh_table" in the kernel).
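For illustration, here is a minimal sketch of reading one counter from such a file. It assumes the layout seen on Linux: a header row of field names followed by one row of hexadecimal values per CPU. The function name neigh_stat_sum is hypothetical and not part of node_exporter.

```bash
#!/bin/sh
# Sum one per-CPU counter from a neighbor statistics file.
# Assumption: first row holds field names, every following row holds
# hexadecimal values for one CPU (as in /proc/net/stat/arp_cache).
# neigh_stat_sum is a hypothetical helper name.
neigh_stat_sum() (
    file=$1
    field=$2

    # Find the column whose header matches the field, then print that
    # column's raw hex value for every CPU row.
    values=$(awk -v f="$field" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == f) col = i
            if (!col) exit 1
            next
        }
        NF >= col { print $col }
    ' "$file") || { echo "unknown field: $field" >&2; exit 1; }

    # Convert each hex value and accumulate the total.
    total=0
    for v in $values; do
        total=$(( total + 0x$v ))
    done
    echo "$total"
)
```

Usage would look like neigh_stat_sum /proc/net/stat/arp_cache forced_gc_runs, with the field name taken from the header row of the file. Summing only makes sense for counters that are kept per CPU; the entries column appears to be a table-wide value repeated on each row, so it should be read once rather than summed.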

Use case: I'm running 5 hosts with more than 120 containers, and I discovered in a kernel log that the ARP table has overflowed for the second time. The direct impact is packet loss, and this kind of root cause is hard to find.

skottler commented 7 years ago

I've implemented https://github.com/prometheus/node_exporter/pull/540 which sounds like it should meet the requirements of this request. It collects entry counts on a per-device basis from /proc/net/arp.
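As a rough sketch of what per-device counting from /proc/net/arp involves (not the code in the pull request), the snippet below skips the header row and counts rows by the sixth column, which holds the device name. The function name and the arp_entries metric name are hypothetical.

```bash
#!/bin/sh
# Count ARP entries per device from a file laid out like /proc/net/arp.
# Assumption: one header row, then one row per entry with the device
# name in the sixth whitespace-separated column.
# arp_entries_per_device and the "arp_entries" metric name are hypothetical.
arp_entries_per_device() {
    awk '
        NR > 1 && NF >= 6 { n[$6]++ }
        END {
            for (d in n)
                printf "arp_entries{device=\"%s\"} %d\n", d, n[d]
        }
    ' "$1" | sort
}
```

Calling arp_entries_per_device /proc/net/arp prints one line per device in the text exposition format, which could be redirected into a .prom file.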

Cellophan commented 7 years ago

@skottler Thanks!

Last week I wrote a shell script that writes the data to files for the textfile collector:

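#!/bin/bash
# Write neighbor statistics and GC sysctls as .prom files for the textfile collector.

set -eu

: "${DIR:=/path/to/text-collector}"

# Dump the lnstat counters as "lnstat_<table>_<field> <value>" lines.
# The sed call collapses doubled braces in the lnstat output before jq parses it.
# Write to a temporary file first, then rename, so a partial file is never read.
LNSTAT="${DIR}/lnstat.prom"
lnstat -c 1 --json \
        | sed -e 's|{{|{|' -e 's|}}|}|' \
        | jq -r 'to_entries | .[] | "lnstat_\(.key)_\(.value | to_entries| .[] | "\(.key) \(.value)")"' \
        > "${LNSTAT}.$$"
mv "${LNSTAT}.$$" "${LNSTAT}"

# Export every value under neigh/default (including gc_thresh1..3).
SYS="${DIR}/sys.prom"
for f in /proc/sys/net/ipv4/neigh/default/*; do
        echo "sysctl_ipv4_neigh_default_$(basename "$f") $(< "$f")" \
                >> "${SYS}.$$"
done
mv "${SYS}.$$" "${SYS}"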

In this script I also added some data from /proc/sys/net/ipv4/neigh/default, because I realized that the gc_thresh{1..3} values are needed to understand the behavior of the metrics collected from /proc/net/stat/arp_cache.

So my question is: is collecting the gc_* values part of this issue or not?
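To show why the thresholds matter next to the counters, here is a small sketch that expresses the current entry count as a percentage of gc_thresh3, the hard limit of the table. It assumes that entries is the first column of the statistics file and is printed in hexadecimal; the function name is hypothetical.

```bash
#!/bin/sh
# Print the neighbor table fill level as a percentage of gc_thresh3.
# Assumptions: the statistics file has a header row, and "entries" is the
# first column of the first data row, printed in hexadecimal.
# neigh_usage_percent is a hypothetical helper name.
neigh_usage_percent() (
    stat_file=$1
    thresh_file=$2

    # Read the entry count once; it is not summed across CPU rows.
    entries_hex=$(awk 'NR == 2 { print $1; exit }' "$stat_file")
    thresh=$(cat "$thresh_file")

    # Refuse to divide by a missing or zero threshold.
    [ -n "$entries_hex" ] && [ "$thresh" -gt 0 ] || exit 1

    echo $(( 0x$entries_hex * 100 / thresh ))
)
```

It could be called as neigh_usage_percent /proc/net/stat/arp_cache /proc/sys/net/ipv4/neigh/default/gc_thresh3. A value that approaches 100 would be a warning sign before the kernel starts reporting a full table.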

Cellophan commented 7 years ago

After re-reading: you wrote /proc/net/arp. This file is more readable than /proc/net/stat/arp_cache, but I don't know exactly what data it contains.

I'm dealing with kernel messages telling me that my ARP table was full. I have increased /proc/sys/net/ipv4/neigh/default/gc_thresh3 and the problem has not come back so far, so I assume I solved it. But since gc_thresh3 was at 2048, I don't understand why /proc/net/arp has only a few entries. lnstat -c 1, at least, reports a lot more.

It looks like /proc/net/arp covers only my default environment, while /proc/net/stat/arp_cache aggregates the caches of all the containers. So I think the second file is the better source.

mweinelt commented 7 years ago

IPv6 neighbor tables and garbage collection thresholds should be considered as well.

Also the current arp collector does not expose the state (reachable|stale|failed|incomplete) as a label.
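As an illustration of what a state label could look like, the sketch below counts neighbor entries per device and state from the output of ip neigh show. It assumes that each line names the device after the word dev and ends with the state; the function name and the neigh_entries metric name are hypothetical.

```bash
#!/bin/sh
# Count neighbor entries per device and state from "ip neigh show" output
# read on standard input.
# Assumptions: the device name follows the word "dev" and the state is the
# last field of each line.
# neigh_states and the "neigh_entries" metric name are hypothetical.
neigh_states() {
    awk '
        NF == 0 { next }
        {
            dev = ""
            for (i = 1; i < NF; i++) if ($i == "dev") dev = $(i + 1)
            n[dev "," tolower($NF)]++
        }
        END {
            for (k in n) {
                split(k, p, ",")
                printf "neigh_entries{device=\"%s\",state=\"%s\"} %d\n", p[1], p[2], n[k]
            }
        }
    ' | sort
}
```

It would be used as ip neigh show | neigh_states. Since ip neigh show lists both IPv4 and IPv6 neighbors, the same approach covers both tables.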

discordianfish commented 6 years ago

@skottler Since you added the arp collector, what do you think about adding the state as label?

In general, contributions are welcome to add these things.

skottler commented 6 years ago

@discordianfish I'm happy to add the label; it'd be a nice change to the ARP collector. In a previous pull request we discussed potentially moving the parsing code for /proc/net/arp into the procfs library, and I submitted an initial version of that in https://github.com/prometheus/procfs/pull/105. Once that lands, I plan to switch node_exporter over to it and then follow up by expanding it with better state support.

How does that sound?

discordianfish commented 6 years ago

@skottler That sounds perfect, thanks!

naeimehmhm commented 4 years ago

IPv6 neighbor tables and garbage collection thresholds should be considered as well.

Also the current arp collector does not expose the state (reachable|stale|failed|incomplete) as a label.

Any update on adding these labels to arp collection? :)

eugercek commented 5 days ago

May I pick this up for adding state labels?

EDIT: Made a small PR. Maybe we can also talk about an NDP collector; I'm willing to implement it if the community wants one.