ztnel opened this issue 1 year ago
Hm, interesting. Which features of the exporter are you using? The exporter has three collectors: cpu, gpu and textfile. For every collector, there are individual scrape metrics: rpi_scrape_{name}_collector_duration_seconds and rpi_scrape_{name}_collector_success. You could also run rpi_exporter --help and from there use the correct flags to disable some collectors and see if this solves your problem. E.g. the gpu collector needs the correct vcgencmd path set, which might not be the case by default: https://github.com/lukasmalkmus/rpi_exporter/blob/master/collector/gpu.go#L27-L31.
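For example, the per-collector success and duration metrics can help spot a misbehaving collector. A minimal sketch of Prometheus rules built on the metric names above (the rule names and the 5m window are just illustrative):

groups:
  - name: rpi-exporter-collectors
    rules:
      # Fires when the gpu collector reports a failed scrape on any instance.
      - alert: RpiGpuCollectorFailed
        expr: rpi_scrape_gpu_collector_success == 0
        for: 5m
      # Tracks how long the gpu collector takes per scrape, to compare against
      # the CPU spikes; the other collectors follow the same
      # rpi_scrape_{name}_collector_* naming pattern.
      - record: rpi:gpu_collector_duration_seconds:avg5m
        expr: avg_over_time(rpi_scrape_gpu_collector_duration_seconds[5m])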
Unfortunately, the README isn't quite up-to-date and this repo probably deserves a proper makeover... But time is my enemy :) I refactored another exporter of mine quite recently so chances are not to bad I can get to this one, as well.
I dug up the manifest for the arm-exporter daemonset. It looks like it's just running with default flags.
containers:
  - command:
      - /bin/rpi_exporter
      - '--web.listen-address=127.0.0.1:9243'
    image: 'carlosedp/arm_exporter:latest'
    name: arm-exporter
    resources:
      limits:
        cpu: 100m
        memory: 100Mi
      requests:
        cpu: 50m
        memory: 50Mi
    securityContext:
      privileged: true
  - args:
      - '--secure-listen-address=$(IP):9243'
      - '--upstream=http://127.0.0.1:9243/'
      - >-
        --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
    env:
      - name: IP
        valueFrom:
          fieldRef:
            fieldPath: status.podIP
I'm not super familiar with Kubernetes manifests, but how does the container access vcgencmd metrics on the Pi? I assume there is some kind of volume mount in play? When I look for vcgencmd on one of my nodes, it is not at the default /opt/vc/bin/vcgencmd:
HypriotOS/armv7: node@node0 in ~
$ which vcgencmd
/usr/bin/vcgencmd
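For illustration, if vcgencmd is not baked into the image, I imagine it would need something like a hostPath mount along these lines. This is just a sketch on my part, not taken from the actual manifest; device access should already be covered by privileged: true in the daemonset above:

spec:
  containers:
    - name: arm-exporter
      # ... command/image as in the manifest above ...
      volumeMounts:
        - name: vcgencmd-bin
          mountPath: /usr/bin/vcgencmd
          readOnly: true
  volumes:
    - name: vcgencmd-bin
      hostPath:
        path: /usr/bin/vcgencmd   # location on my HypriotOS nodes
        type: File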
I think I have an idea now. The pod CPU usage metrics use the container_cpu_usage_seconds_total metric, which I think is shown relative to the resource limit set in the daemonset, in this case 100m (0.1 cores), which is relatively small. If I put my node CPU usage graph inline with the pod CPU usage graph, I can see that these spikes in the pods only correspond to roughly 25% CPU usage on the node:
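To sanity-check that interpretation, something like the recording rules below would surface both the raw per-pod core usage and the same value as a fraction of the 100m limit. This is only a sketch; the pod=~"arm-exporter.*" matcher is just how my pods happen to be named, and older cAdvisor versions expose the label as pod_name instead of pod:

groups:
  - name: arm-exporter-cpu
    rules:
      # Absolute CPU usage in cores for the arm-exporter pods.
      - record: arm_exporter:cpu_usage_cores:rate5m
        expr: sum by (pod) (rate(container_cpu_usage_seconds_total{pod=~"arm-exporter.*"}[5m]))
      # The same usage expressed as a fraction of the 100m (0.1 core) limit,
      # which is roughly what a limit-relative dashboard panel shows.
      - record: arm_exporter:cpu_usage_fraction_of_limit:rate5m
        expr: sum by (pod) (rate(container_cpu_usage_seconds_total{pod=~"arm-exporter.*"}[5m])) / 0.1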
I think it's still pretty large for an exporter service. Not sure if you have any benchmarks available to profile the container runtime.
[...] Not sure if you have any benchmarks available to profile the container runtime.
Unfortunately, no. I think you should play around with turning the different collectors on and off and setting the correct path for vcgencmd, and see how that affects the metrics to pin down the possible problem. For playing with the individual collectors, you don't even need to mess with the rpi_exporter config. Just like Node Exporter, you can tweak which collectors to enable via the Prometheus config: https://github.com/prometheus/node_exporter#filtering-enabled-collectors.
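For example, assuming rpi_exporter honors the same collect[] URL parameter described in that Node Exporter link, a scrape job limited to just the cpu collector would look roughly like this (target address is a placeholder, and I'm leaving the TLS proxy sidecar details aside for brevity):

scrape_configs:
  - job_name: 'arm-exporter'
    params:
      # Only scrape the cpu collector; gpu and textfile are skipped for this job.
      collect[]:
        - cpu
    static_configs:
      - targets: ['node0:9243']   # replace with your node addresses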
Hi, I wanted to see if there was any insight into why the arm-exporter service is causing periodic spikes in CPU usage. Below is a screenshot from my Grafana instance of a k3s deployment where I am filtering the pods by those running arm-exporter:

My Prometheus scrape interval is set to 30s and I can see some spikes are registering peak values 2 data points in a row, which means these usage spikes can be happening for over 30s each:
Pod Details:
System Details: RPi CM3B+ Compute Modules (32-bit), HypriotOS v1.12.3 (Docker 19.03.12, kernel 4.19.97)
Any insight would be appreciated.