prometheus / procfs

procfs provides functions to retrieve system, kernel and process metrics from the pseudo-filesystem proc.
Apache License 2.0

Unknown NFSd metric line "wdeleg_getattr" on kernel 6.6-rc1 #567

Closed chilversc closed 7 months ago

chilversc commented 9 months ago

A new metric named wdeleg_getattr has been added to /proc/net/rpc/nfsd in the 6.6-rc1 kernel. This causes nfs.ParseServerRPCStats to fail with the error: unknown NFSd metric line "wdeleg_getattr".

For reference, the kernel source defines this metric as (where wdeleg stands for "write delegation"):

NFSD_STATS_WDELEG_GETATTR, /* count of getattr conflict with wdeleg */

For context, this is the commit that introduced the new metric: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?h=v6.6-rc1&id=fd19ca36fd782b84f71b86525b91a905cda913a4.

There is a similar issue logged for the node exporter: https://github.com/prometheus/node_exporter/issues/2799.

Sample /proc/net/rpc/nfsd

rc 0 0 1330233
fh 0 0 0 0 0
io 519605149696 0
th 512 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
ra 0 0 0 0 0 0 0 0 0 0 0 0
net 1330025 0 1329850 162
rpc 1329892 60 60 0 0
proc3 22 42 42456 0 40040 559 0 1247023 0 0 0 0 0 0 0 0 0 0 29 0 42 21 0
proc4 2 0 0
proc4ops 76 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
wdeleg_getattr 0

Forward compatibility

This issue raises a concern about forward compatibility. I do not think that adding a new metric line, or new counters to an existing line, should break existing implementations.

At a minimum, I would suggest that if an unknown metric line is found, it should simply be ignored. This is similar to the principle of being lenient when parsing file formats such as JSON.

Likewise, we should consider ignoring the additional values if a metric line contains more values than expected. Normally the kernel maintains backwards compatibility for counters, so if a counter is removed, a placeholder (normally zero) is left in its place to avoid breaking existing software. New counters are only added to the end of existing lines, or as new lines.

discordianfish commented 9 months ago

I agree, we should ignore and debug-log unknown lines.

chilversc commented 7 months ago

I can confirm that this is now fixed in v0.12.0 by https://github.com/prometheus/procfs/pull/574