Open Cellophan opened 7 years ago
I've implemented https://github.com/prometheus/node_exporter/pull/540, which sounds like it should meet the requirements of this request. It collects entry counts on a per-device basis from /proc/net/arp.
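For readers who want to see what "entry counts on a per-device basis" amounts to, here is a minimal shell sketch. It is not the code from the pull request: the function name `arp_entries_by_device` and the metric name `arp_entries` are hypothetical, and it assumes the usual /proc/net/arp layout with one header line and the device name in the sixth column.

```shell
#!/bin/bash
# Hypothetical helper, not the collector's implementation.
# Counts rows per device in an ARP table file and prints them in the
# Prometheus text format. $1 defaults to /proc/net/arp.
arp_entries_by_device() {
  local arp_file="${1:-/proc/net/arp}"
  # Skip the header line (NR > 1); column 6 is assumed to be the device.
  awk 'NR > 1 && NF >= 6 { count[$6]++ }
       END {
         for (dev in count)
           printf "arp_entries{device=\"%s\"} %d\n", dev, count[dev]
       }' "$arp_file" | sort
}
```

Calling `arp_entries_by_device` with no argument reads the live table; passing a path makes it easy to try against a saved copy.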
@skottler Thanks!
Last week I wrote a shell script that puts the data into the textfile collector:
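#!/bin/bash
set -eu
# Directory read by the textfile collector.
: "${DIR:=/path/to/text-collector}"
# lnstat counters, written as "lnstat_<file>_<field> <value>" lines.
LNSTAT="${DIR}/lnstat.prom"
lnstat -c 1 --json \
| sed -e 's|{{|{|' -e 's|}}|}|' \
| jq -r 'to_entries | .[] | "lnstat_\(.key)_\(.value | to_entries| .[] | "\(.key) \(.value)")"' \
> "${LNSTAT}.$$"
# Rename into place so a partially written file is never read.
mv "${LNSTAT}.$$" "${LNSTAT}"
# Neighbor sysctls, including gc_thresh1..3.
SYS="${DIR}/sys.prom"
for f in /proc/sys/net/ipv4/neigh/default/*; do
  echo "sysctl_ipv4_neigh_default_$(basename "$f") $(< "$f")" \
  >>"${SYS}.$$"
done
mv "${SYS}.$$" "${SYS}"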
In this script I added some data from /proc/sys/net/ipv4/neigh/default because I realized that the gc_thresh{1..3} values are needed to understand the behavior of the metrics collected from /proc/net/stat/arp_cache.
So my question is: is collecting the gc_* values part of this issue or not?
After re-reading: you wrote /proc/net/arp. This file is more readable than /proc/net/stat/arp_cache, but I don't know what this data is.
I'm dealing with kernel messages telling me that my ARP table is full. I increased /proc/sys/net/ipv4/neigh/default/gc_thresh3 and the problem has not come back so far, so I assume that solved it. But since gc_thresh3 was at 2048, I don't understand why /proc/net/arp has only a few entries. lnstat -c 1, at least, gives me a lot more.
It looks as if /proc/net/arp covers only my default environment, while /proc/net/stat/arp_cache aggregates the caches of all the containers. So I think the second file is the better source.
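The relationship between the entry count and gc_thresh3 described above can be checked with a small sketch like the following. The function name `neigh_usage` and the metric names are hypothetical. It assumes that the first column of /proc/net/stat/arp_cache is `entries`, that the values are hexadecimal, and that the first data line already carries the table-wide entry count.

```shell
#!/bin/bash
# Hypothetical helper: report neighbor-table size relative to gc_thresh3.
# $1: stat file      (default /proc/net/stat/arp_cache)
# $2: threshold file (default /proc/sys/net/ipv4/neigh/default/gc_thresh3)
neigh_usage() {
  local stat_file="${1:-/proc/net/stat/arp_cache}"
  local thresh_file="${2:-/proc/sys/net/ipv4/neigh/default/gc_thresh3}"
  local entries_hex entries thresh
  # Line 1 is the header; line 2 holds the first CPU's counters.
  # Assumption: column 1 is "entries" and is printed in hex.
  entries_hex="$(awk 'NR == 2 { print $1 }' "$stat_file")"
  entries=$(( 16#${entries_hex} ))
  thresh="$(< "$thresh_file")"
  echo "neigh_entries ${entries}"
  echo "neigh_gc_thresh3 ${thresh}"
  # Integer percentage of the hard limit currently in use.
  echo "neigh_usage_percent $(( entries * 100 / thresh ))"
}
```

A value close to 100 would be consistent with the "table full" messages in the kernel log; both paths can be overridden to test against saved files.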
IPv6 neighbour tables and garbage collection thresholds should be considered as well.
Also, the current arp collector does not expose the state (reachable|stale|failed|incomplete) as a label.
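To illustrate what a state label could look like, here is a rough sketch that counts neighbors per state from `ip neigh show` output. It is not how the arp collector works: the function name `neigh_states` and the metric name `neigh_entries` are hypothetical, and it assumes the state is the last field of each output line.

```shell
#!/bin/bash
# Hypothetical helper: count neighbor entries per state.
# Reads `ip neigh show`-style lines on stdin.
neigh_states() {
  # Assumption: the state (REACHABLE, STALE, FAILED, ...) is the last field.
  awk 'NF { count[$NF]++ }
       END {
         for (state in count)
           printf "neigh_entries{state=\"%s\"} %d\n", tolower(state), count[state]
       }' | sort
}
```

It would be used as `ip -4 neigh show | neigh_states` for ARP, or with `ip -6` for the IPv6 neighbour table.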
@skottler Since you added the arp collector, what do you think about adding the state as label?
In general, contributions are welcome to add these things.
@discordianfish I'm happy to add the label; it'd be a nice change to the ARP collector. In a previous pull request we discussed potentially moving the parsing code for /proc/net/arp into the procfs library, and I submitted an initial version of that in https://github.com/prometheus/procfs/pull/105. Once that lands I plan to switch node_exporter over to use it and then follow up by expanding it with better state support.
How does that sound?
@skottler That sounds perfect, thanks!
IPv6 neighbour tables and garbage collection thresholds should be considered as well.
Also, the current arp collector does not expose the state (reachable|stale|failed|incomplete) as a label.
Any update on adding these labels to arp collection? :)
May I pick this up for adding state labels?
EDIT: I made a small PR. Maybe we can also talk about an NDP collector; I'm willing to implement it if the community wants one.
Need: node_exporter doesn't provide statistics about neighbors (in the ip neighbor sense): counts, table overloads, garbage collections, limits, ...
More info: The neighbors can be listed with ip neigh show. The raw data is in /proc/net/stat/arp_cache and /proc/net/stat/ndisc_cache (for what these are based on, see "struct neigh_table").
Use case: I'm working with 5 hosts and more than 120 containers, and I discovered in a kernel log file that I had an ARP table overload for the second time. The direct impact is packet loss, which makes this a root cause that is hard to find.
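As a rough illustration of reading the raw files directly, the sketch below turns a /proc/net/stat/arp_cache or ndisc_cache file into metric lines without lnstat. The function name `neigh_stat_metrics` and the metric prefix are hypothetical. It assumes a header line of field names followed by one line of hexadecimal counters per CPU, and it assumes that `entries` is a table-wide value that must not be summed across CPUs while all other fields are per-CPU counters. It needs bash 4 or later for associative arrays.

```shell
#!/bin/bash
# Hypothetical helper: convert a /proc/net/stat neighbor file to metrics.
# $1: stat file, e.g. /proc/net/stat/arp_cache or /proc/net/stat/ndisc_cache
# $2: metric prefix, e.g. neigh_arp
neigh_stat_metrics() {
  local file="$1" prefix="$2"
  local -a names=() fields=()
  local -A sums=()
  local i line_no=0
  while read -r -a fields; do
    if (( line_no++ == 0 )); then
      names=("${fields[@]}")          # header line: field names
      continue
    fi
    for i in "${!names[@]}"; do
      if [[ "${names[i]}" == entries ]]; then
        # Assumption: "entries" is table-wide, so take it as-is.
        sums[entries]=$(( 16#${fields[i]} ))
      else
        # Assumption: other fields are per-CPU hex counters; sum them.
        sums[${names[i]}]=$(( ${sums[${names[i]}]:-0} + 16#${fields[i]} ))
      fi
    done
  done < "$file"
  for i in "${names[@]}"; do
    echo "${prefix}_${i} ${sums[$i]:-0}"
  done
}
```

Because the field names come from the header, the same function covers both files and picks up counters such as table_fulls or forced_gc_runs when the running kernel exposes them.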