lovoo / ipmi_exporter

IPMI Exporter for prometheus.io, written in Go.
BSD 3-Clause "New" or "Revised" License

Docker: run without privileged #9

Closed daenney closed 7 years ago

daenney commented 7 years ago

I was looking into running without privileged b/c I'm not really a fan of that and figured that as long as I mount /dev/ipmi0 into the container at the same spot that should be enough.

However, when I do that I see:

level=error msg="error while calling ipmitool: exit status 1" source="collector.go:50"
level=error msg="exit status 1" source="collector.go:141"
level=fatal msg="ipmitool didn't return any metrics (collector has no descriptors)" source="main.go:22"

Any idea what I might've missed?

thomersch commented 7 years ago

> I was looking into running without privileged b/c I'm not really a fan of that and figured that as long as I mount /dev/ipmi0 into the container at the same spot that should be enough.

Good idea!

> level=error msg="error while calling ipmitool: exit status 1" source="collector.go:50"

This looks like a problem with executing the ipmitool itself. Could you run ipmitool sensor and post the output?

daenney commented 7 years ago

So running it on the host I get

daenney@elsa:~$ sudo ipmitool sensor
Pwr Unit Status  | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
IPMI Watchdog    | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
Physical Scrty   | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
SMI Timeout      | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
System Event Log | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
System Event     | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
Button           | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
VR Watchdog      | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
SSB Therm Trip   | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
BMC FW Health    | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
System Airflow   | 0.000      | CFM        | ok    | na        | na        | na        | na        | na        | na
BB EDGE Temp     | 36.000     | degrees C  | ok    | na        | 0.000     | 5.000     | 110.000   | 115.000   | na
SSB Temp         | 54.000     | degrees C  | ok    | na        | 0.000     | 5.000     | 98.000    | 103.000   | na
BB BMC Temp      | 51.000     | degrees C  | ok    | na        | 0.000     | 5.000     | 110.000   | 115.000   | na
BB P2 VR Temp    | 36.000     | degrees C  | ok    | na        | 0.000     | 5.000     | 110.000   | 115.000   | na
BB MEM VR Temp   | 41.000     | degrees C  | ok    | na        | 0.000     | 5.000     | 110.000   | 115.000   | na
LAN NIC Temp     | 63.000     | degrees C  | ok    | na        | 0.000     | 5.000     | 115.000   | 120.000   | na
System Fan 4     | 686.000    | RPM        | ok    | na        | 294.000   | 392.000   | na        | na        | na
P1 Status        | 0x0        | discrete   | 0x8000| na        | na        | na        | na        | na        | na
P2 Status        | 0x0        | discrete   | 0x8000| na        | na        | na        | na        | na        | na
P1 Therm Margin  | -61.000    | degrees C  | ok    | na        | na        | na        | na        | na        | na
P2 Therm Margin  | -60.000    | degrees C  | ok    | na        | na        | na        | na        | na        | na
P1 Therm Ctrl %  | 0.000      | percent    | ok    | na        | na        | na        | 30.000    | 50.000    | na
P2 Therm Ctrl %  | 0.000      | percent    | ok    | na        | na        | na        | 30.000    | 50.000    | na
P1 ERR2          | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P2 ERR2          | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
CATERR           | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P1 MSID Mismatch | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
CPU Missing      | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P1 DTS Therm Mgn | -61.000    | degrees C  | ok    | na        | na        | na        | na        | na        | na
P2 DTS Therm Mgn | -60.000    | degrees C  | ok    | na        | na        | na        | na        | na        | na
P2 MSID Mismatch | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P1 VRD Hot       | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P2 VRD Hot       | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P1 MEM01 VRD Hot | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P1 MEM23 VRD Hot | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P2 MEM01 VRD Hot | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P2 MEM23 VRD Hot | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
DIMM Thrm Mrgn 1 | -43.000    | degrees C  | ok    | na        | na        | na        | 5.000     | 10.000    | na
DIMM Thrm Mrgn 2 | -43.000    | degrees C  | ok    | na        | na        | na        | 5.000     | 10.000    | na
DIMM Thrm Mrgn 3 | -43.000    | degrees C  | ok    | na        | na        | na        | 5.000     | 10.000    | na
DIMM Thrm Mrgn 4 | -53.000    | degrees C  | ok    | na        | na        | na        | 5.000     | 10.000    | na
Mem P1 Thrm Trip | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
Mem P2 Thrm Trip | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
BB +12.0V        | 12.039     | Volts      | ok    | na        | 10.635    | 10.947    | 13.027    | 13.391    | na
BB +5.0V         | 4.937      | Volts      | ok    | na        | 4.460     | 4.590     | 5.415     | 5.566     | na
BB +3.3V         | 3.268      | Volts      | ok    | na        | 2.953     | 3.039     | 3.554     | 3.654     | na
BB +5.0V STBY    | 5.046      | Volts      | ok    | na        | 4.460     | 4.590     | 5.415     | 5.566     | na
BB +3.3V AUX     | 3.296      | Volts      | ok    | na        | 2.953     | 3.039     | 3.554     | 3.654     | na
BB +1.05V P1Vccp | 0.792      | Volts      | ok    | na        | 0.546     | 0.564     | 1.464     | 1.506     | na
BB +1.05V P2Vccp | 0.822      | Volts      | ok    | na        | 0.546     | 0.564     | 1.464     | 1.506     | na
BB +1.5 P1DDR AB | 1.495      | Volts      | ok    | na        | 1.339     | 1.387     | 1.611     | 1.659     | na
BB +1.5 P1DDR CD | 1.509      | Volts      | ok    | na        | 1.339     | 1.387     | 1.611     | 1.659     | na
BB +1.5 P2DDR AB | 1.509      | Volts      | ok    | na        | 1.339     | 1.387     | 1.611     | 1.659     | na
BB +1.5 P2DDR CD | 1.509      | Volts      | ok    | na        | 1.339     | 1.387     | 1.611     | 1.659     | na
BB +1.8V AUX     | 1.794      | Volts      | ok    | na        | 1.644     | 1.702     | 1.902     | 1.960     | na
BB +1.1V STBY    | 1.076      | Volts      | ok    | na        | 0.938     | 0.964     | 1.240     | 1.276     | na
BB VBAT          | 3.018      | Volts      | ok    | na        | 2.211     | 2.544     | na        | na        | na
BB +1.35 P1LV AB | na         |            | na    | na        | 1.201     | 1.244     | 1.445     | 1.488     | na
BB +1.35 P1LV CD | na         |            | na    | na        | 1.201     | 1.244     | 1.445     | 1.488     | na
BB +1.35 P2LV AB | na         |            | na    | na        | 1.201     | 1.244     | 1.445     | 1.488     | na
BB +1.35 P2LV CD | na         |            | na    | na        | 1.201     | 1.244     | 1.445     | 1.488     | na
NM Capabilities  | 0x97       | discrete   | 0x0100| na        | na        | na        | na        | na        | na
P1 MTT           | 0.000      | percent    | ok    | 0.000     | 0.000     | 0.000     | 0.000     | 0.000     | 0.000
P2 MTT           | 0.000      | percent    | ok    | 0.000     | 0.000     | 0.000     | 0.000     | 0.000     | 0.000

daenney@elsa:~$ ls -lah /dev/ipmi0
crw------- 1 root root 245, 0 okt 22 20:08 /dev/ipmi0

I haven't been able to run it from the container just yet; I need to change the Dockerfile so I have a shell.

daenney commented 7 years ago

Ah, managed to get a shell with just sh:

/ # ipmitool sensor
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
/ # ls /dev/
console  core     fd       full     fuse     ipmi0    mqueue   null     ptmx     pts      random   shm      stderr   stdin    stdout   tty      urandom  zero
/ # ls -lah /dev/
total 4
drwxr-xr-x    5 root     root         400 Oct 24 14:22 .
drwxr-xr-x   31 root     root        4.0K Oct 24 14:22 ..
crw-------    1 root     root      136,   1 Oct 24 14:22 console
lrwxrwxrwx    1 root     root          11 Oct 24 14:22 core -> /proc/kcore
lrwxrwxrwx    1 root     root          13 Oct 24 14:22 fd -> /proc/self/fd
crw-rw-rw-    1 root     root        1,   7 Oct 24 14:22 full
crw-rw-rw-    1 root     root       10, 229 Oct 24 14:22 fuse
crw-------    1 root     root      245,   0 Oct 22 18:08 ipmi0
drwxrwxrwt    2 root     root          40 Oct 24 14:22 mqueue
crw-rw-rw-    1 root     root        1,   3 Oct 24 14:22 null
lrwxrwxrwx    1 root     root           8 Oct 24 14:22 ptmx -> pts/ptmx
drwxr-xr-x    2 root     root           0 Oct 24 14:22 pts
crw-rw-rw-    1 root     root        1,   8 Oct 24 14:22 random
drwxrwxrwt    2 root     root          40 Oct 24 14:22 shm
lrwxrwxrwx    1 root     root          15 Oct 24 14:22 stderr -> /proc/self/fd/2
lrwxrwxrwx    1 root     root          15 Oct 24 14:22 stdin -> /proc/self/fd/0
lrwxrwxrwx    1 root     root          15 Oct 24 14:22 stdout -> /proc/self/fd/1
crw-rw-rw-    1 root     root        5,   0 Oct 24 14:22 tty
crw-rw-rw-    1 root     root        1,   9 Oct 24 14:22 urandom
crw-rw-rw-    1 root     root        1,   5 Oct 24 14:22 zero

So the device is there, there's a /dev/ipmi0 but it seems ipmitool really doesn't like it.

daenney commented 7 years ago

If I run the container without -v /dev/ipmi0:/dev/ipmi0 I don't get an ipmi0 in the container at all, as expected.

thomersch commented 7 years ago

Well, that looks weird. The permissions seem alright.

I have no experience mounting hardware devices into the container, so I am not sure if I am able to help here.

daenney commented 7 years ago

Alright, I figured out the first part. You're not supposed to use volumes to mount /dev devices; you're supposed to pass --device=/dev/ipmi0 to docker run or, in docker-compose:

devices:
  - /dev/ipmi0:/dev/ipmi0

This is surprisingly obvious but hard to find in the docs 😥. Doing this allows you to run entirely without privileged mode so achievement 🔓ed.

Now running ipmitool sensor in the container works and I get:

/ # ipmitool sensor
Pwr Unit Status  | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
IPMI Watchdog    | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
Physical Scrty   | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
SMI Timeout      | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
System Event Log | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
System Event     | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
Button           | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
VR Watchdog      | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
SSB Therm Trip   | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
BMC FW Health    | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
System Airflow   | 0.000      | CFM        | ok    | na        | na        | na        | na        | na        | na
BB EDGE Temp     | 38.000     | degrees C  | ok    | na        | 0.000     | 5.000     | 110.000   | 115.000   | na
SSB Temp         | 58.000     | degrees C  | ok    | na        | 0.000     | 5.000     | 98.000    | 103.000   | na
BB BMC Temp      | 54.000     | degrees C  | ok    | na        | 0.000     | 5.000     | 110.000   | 115.000   | na
BB P2 VR Temp    | 40.000     | degrees C  | ok    | na        | 0.000     | 5.000     | 110.000   | 115.000   | na
BB MEM VR Temp   | 45.000     | degrees C  | ok    | na        | 0.000     | 5.000     | 110.000   | 115.000   | na
LAN NIC Temp     | 66.000     | degrees C  | ok    | na        | 0.000     | 5.000     | 115.000   | 120.000   | na
System Fan 4     | 686.000    | RPM        | ok    | na        | 294.000   | 392.000   | na        | na        | na
P1 Status        | 0x0        | discrete   | 0x8000| na        | na        | na        | na        | na        | na
P2 Status        | 0x0        | discrete   | 0x8000| na        | na        | na        | na        | na        | na
P1 Therm Margin  | -60.000    | degrees C  | ok    | na        | na        | na        | na        | na        | na
P2 Therm Margin  | -58.000    | degrees C  | ok    | na        | na        | na        | na        | na        | na
P1 Therm Ctrl %  | 0.000      | percent    | ok    | na        | na        | na        | 30.000    | 50.000    | na
P2 Therm Ctrl %  | 0.000      | percent    | ok    | na        | na        | na        | 30.000    | 50.000    | na
P1 ERR2          | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P2 ERR2          | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
CATERR           | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P1 MSID Mismatch | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
CPU Missing      | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P1 DTS Therm Mgn | -60.000    | degrees C  | ok    | na        | na        | na        | na        | na        | na
P2 DTS Therm Mgn | -58.000    | degrees C  | ok    | na        | na        | na        | na        | na        | na
P2 MSID Mismatch | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P1 VRD Hot       | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P2 VRD Hot       | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P1 MEM01 VRD Hot | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P1 MEM23 VRD Hot | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P2 MEM01 VRD Hot | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
P2 MEM23 VRD Hot | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
DIMM Thrm Mrgn 1 | -42.000    | degrees C  | ok    | na        | na        | na        | 5.000     | 10.000    | na
DIMM Thrm Mrgn 2 | -39.000    | degrees C  | ok    | na        | na        | na        | 5.000     | 10.000    | na
DIMM Thrm Mrgn 3 | -44.000    | degrees C  | ok    | na        | na        | na        | 5.000     | 10.000    | na
DIMM Thrm Mrgn 4 | -46.000    | degrees C  | ok    | na        | na        | na        | 5.000     | 10.000    | na
Mem P1 Thrm Trip | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
Mem P2 Thrm Trip | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
BB +12.0V        | 12.039     | Volts      | ok    | na        | 10.635    | 10.947    | 13.027    | 13.391    | na
BB +5.0V         | 4.959      | Volts      | ok    | na        | 4.460     | 4.590     | 5.415     | 5.566     | na
BB +3.3V         | 3.268      | Volts      | ok    | na        | 2.953     | 3.039     | 3.554     | 3.654     | na
BB +5.0V STBY    | 5.046      | Volts      | ok    | na        | 4.460     | 4.590     | 5.415     | 5.566     | na
BB +3.3V AUX     | 3.296      | Volts      | ok    | na        | 2.953     | 3.039     | 3.554     | 3.654     | na
BB +1.05V P1Vccp | 0.990      | Volts      | ok    | na        | 0.546     | 0.564     | 1.464     | 1.506     | na
BB +1.05V P2Vccp | 0.828      | Volts      | ok    | na        | 0.546     | 0.564     | 1.464     | 1.506     | na
BB +1.5 P1DDR AB | 1.495      | Volts      | ok    | na        | 1.339     | 1.387     | 1.611     | 1.659     | na
BB +1.5 P1DDR CD | 1.509      | Volts      | ok    | na        | 1.339     | 1.387     | 1.611     | 1.659     | na
BB +1.5 P2DDR AB | 1.509      | Volts      | ok    | na        | 1.339     | 1.387     | 1.611     | 1.659     | na
BB +1.5 P2DDR CD | 1.509      | Volts      | ok    | na        | 1.339     | 1.387     | 1.611     | 1.659     | na
BB +1.8V AUX     | 1.794      | Volts      | ok    | na        | 1.644     | 1.702     | 1.902     | 1.960     | na
BB +1.1V STBY    | 1.076      | Volts      | ok    | na        | 0.938     | 0.964     | 1.240     | 1.276     | na
BB VBAT          | 3.018      | Volts      | ok    | na        | 2.211     | 2.544     | na        | na        | na
BB +1.35 P1LV AB | na         |            | na    | na        | 1.201     | 1.244     | 1.445     | 1.488     | na
BB +1.35 P1LV CD | na         |            | na    | na        | 1.201     | 1.244     | 1.445     | 1.488     | na
BB +1.35 P2LV AB | na         |            | na    | na        | 1.201     | 1.244     | 1.445     | 1.488     | na
BB +1.35 P2LV CD | na         |            | na    | na        | 1.201     | 1.244     | 1.445     | 1.488     | na
NM Capabilities  | 0x6f       | discrete   | 0x0100| na        | na        | na        | na        | na        | na
P1 MTT           | 0.000      | percent    | ok    | 0.000     | 0.000     | 0.000     | 0.000     | 0.000     | 0.000
P2 MTT           | 0.000      | percent    | ok    | 0.000     | 0.000     | 0.000     | 0.000     | 0.000     | 0.000

However, the IPMI exporter container still crashes:

level=fatal msg="ipmitool didn't return any metrics (descriptor Desc{fqName: \"ipmi_p2_therm_ctrl_%\", help: \"p2_therm_ctrl_%\", constLabels: {}, variableLabels: [addr]} is invalid: \"ipmi_p2_therm_ctrl_%\" is not a valid metric name)" source="main.go:22"

Any idea what's going on there?

thomersch commented 7 years ago

Good to see the mount working, well done! We probably should change the documentation to your approach.

> "ipmi_p2_therm_ctrl_%" is not a valid metric name

Your ipmitool output has more metrics than the one I worked with, notably P1 Therm Ctrl %. On the label branch I've worked on the categorisation of the metrics. As far as I can see, the code on that branch should just skip the metric instead of crashing. Could you please try the code from the label branch?

If that works, we can include the "thermal ctrl" metric into the collector later.
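The "skip instead of crash" behaviour could look roughly like the sketch below. This is not the code from the label branch. It is a hypothetical illustration that checks candidate names against the classic Prometheus metric-name pattern `[a-zA-Z_:][a-zA-Z0-9_:]*` and sets aside anything that does not match. The function name `filterValid` is an assumption made for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// validName is the classic Prometheus metric-name pattern.
var validName = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)

// filterValid splits names into those that can be registered as metrics
// and those that should be skipped (and ideally logged) by the caller.
func filterValid(names []string) (kept, skipped []string) {
	for _, n := range names {
		if validName.MatchString(n) {
			kept = append(kept, n)
		} else {
			skipped = append(skipped, n)
		}
	}
	return kept, skipped
}

func main() {
	kept, skipped := filterValid([]string{
		"ipmi_bb_edge_temp",
		"ipmi_p2_therm_ctrl_%", // the name from the fatal log line above
	})
	fmt.Println("kept:", kept)
	fmt.Println("skipped:", skipped)
}
```

Skipping keeps the exporter running, at the cost of silently losing sensors whose names contain unsupported characters.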

daenney commented 7 years ago

A quick fix would be to strings.Replace(variable, "%", "pct", -1). Probably something similar for the + sign?

daenney commented 7 years ago

I've raised #10 in the meantime to update the README.

daenney commented 7 years ago

Ah, you already replace + with p: https://github.com/lovoo/ipmi_exporter/blob/master/collector.go#L83. I've updated it to deal with the % sign in #11.
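For illustration, the two replacements discussed here can be combined into one helper. This is a sketch under assumptions, not the code in collector.go or in #11. The function name `sanitizeName`, the `ipmi_` prefix handling, and the rule that every other non-alphanumeric character becomes an underscore are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeName turns an ipmitool sensor name into a candidate metric name.
// "%" becomes "pct" and "+" becomes "p", as discussed in the thread; mapping
// every other non-alphanumeric character to "_" is an assumption of this sketch.
func sanitizeName(sensor string) string {
	s := strings.ToLower(strings.TrimSpace(sensor))
	s = strings.Replace(s, "%", "pct", -1)
	s = strings.Replace(s, "+", "p", -1)

	var b strings.Builder
	for _, r := range s {
		switch {
		case r >= 'a' && r <= 'z', r >= '0' && r <= '9':
			b.WriteRune(r)
		default:
			b.WriteByte('_')
		}
	}
	return "ipmi_" + b.String()
}

func main() {
	for _, name := range []string{"P2 Therm Ctrl %", "BB +12.0V", "System Fan 4"} {
		fmt.Printf("%-16s -> %s\n", name, sanitizeName(name))
	}
}
```

With these rules, "P2 Therm Ctrl %" becomes ipmi_p2_therm_ctrl_pct and "BB +12.0V" becomes ipmi_bb_p12_0v.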

daenney commented 7 years ago

Could anyone take a look at the two PRs?

daenney commented 7 years ago

Awesome. With #10 and #11 merged, all my issues have been resolved. Thank you!