daenney closed this issue 7 years ago
I was looking into running without privileged b/c I'm not really a fan of that and figured that as long as I mount /dev/ipmi0 into the container at the same spot that should be enough.
Good idea!
level=error msg="error while calling ipmitool: exit status 1" source="collector.go:50"
This looks like a problem with executing ipmitool itself. Could you run ipmitool sensor and post the output?
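The log line above only reports "exit status 1", which hides whatever ipmitool printed to stderr. A minimal Go sketch of how a caller could surface that output follows; runTool is a hypothetical helper written for this illustration and is not the exporter's actual code.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// runTool (hypothetical helper) runs an external command and returns its
// stdout. On failure the returned error also carries whatever the command
// wrote to stderr, so the log shows more than a bare "exit status 1".
func runTool(name string, args ...string) (string, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("error while calling %s: %v: %s",
			name, err, strings.TrimSpace(stderr.String()))
	}
	return stdout.String(), nil
}

func main() {
	// Assumes ipmitool is on PATH; otherwise the error says so.
	out, err := runTool("ipmitool", "sensor")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(out)
}
```

With something like this in place, the "Could not open device" message from ipmitool would appear in the exporter log directly, without needing a shell in the container.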
So running it on the host I get
daenney@elsa:~$ sudo ipmitool sensor
Pwr Unit Status | 0x0 | discrete | 0x0000| na | na | na | na | na | na
IPMI Watchdog | 0x0 | discrete | 0x0000| na | na | na | na | na | na
Physical Scrty | 0x0 | discrete | 0x0000| na | na | na | na | na | na
SMI Timeout | 0x0 | discrete | 0x0000| na | na | na | na | na | na
System Event Log | 0x0 | discrete | 0x0000| na | na | na | na | na | na
System Event | 0x0 | discrete | 0x0000| na | na | na | na | na | na
Button | 0x0 | discrete | 0x0000| na | na | na | na | na | na
VR Watchdog | 0x0 | discrete | 0x0000| na | na | na | na | na | na
SSB Therm Trip | 0x0 | discrete | 0x0000| na | na | na | na | na | na
BMC FW Health | 0x0 | discrete | 0x0000| na | na | na | na | na | na
System Airflow | 0.000 | CFM | ok | na | na | na | na | na | na
BB EDGE Temp | 36.000 | degrees C | ok | na | 0.000 | 5.000 | 110.000 | 115.000 | na
SSB Temp | 54.000 | degrees C | ok | na | 0.000 | 5.000 | 98.000 | 103.000 | na
BB BMC Temp | 51.000 | degrees C | ok | na | 0.000 | 5.000 | 110.000 | 115.000 | na
BB P2 VR Temp | 36.000 | degrees C | ok | na | 0.000 | 5.000 | 110.000 | 115.000 | na
BB MEM VR Temp | 41.000 | degrees C | ok | na | 0.000 | 5.000 | 110.000 | 115.000 | na
LAN NIC Temp | 63.000 | degrees C | ok | na | 0.000 | 5.000 | 115.000 | 120.000 | na
System Fan 4 | 686.000 | RPM | ok | na | 294.000 | 392.000 | na | na | na
P1 Status | 0x0 | discrete | 0x8000| na | na | na | na | na | na
P2 Status | 0x0 | discrete | 0x8000| na | na | na | na | na | na
P1 Therm Margin | -61.000 | degrees C | ok | na | na | na | na | na | na
P2 Therm Margin | -60.000 | degrees C | ok | na | na | na | na | na | na
P1 Therm Ctrl % | 0.000 | percent | ok | na | na | na | 30.000 | 50.000 | na
P2 Therm Ctrl % | 0.000 | percent | ok | na | na | na | 30.000 | 50.000 | na
P1 ERR2 | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P2 ERR2 | 0x0 | discrete | 0x0000| na | na | na | na | na | na
CATERR | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P1 MSID Mismatch | 0x0 | discrete | 0x0000| na | na | na | na | na | na
CPU Missing | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P1 DTS Therm Mgn | -61.000 | degrees C | ok | na | na | na | na | na | na
P2 DTS Therm Mgn | -60.000 | degrees C | ok | na | na | na | na | na | na
P2 MSID Mismatch | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P1 VRD Hot | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P2 VRD Hot | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P1 MEM01 VRD Hot | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P1 MEM23 VRD Hot | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P2 MEM01 VRD Hot | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P2 MEM23 VRD Hot | 0x0 | discrete | 0x0000| na | na | na | na | na | na
DIMM Thrm Mrgn 1 | -43.000 | degrees C | ok | na | na | na | 5.000 | 10.000 | na
DIMM Thrm Mrgn 2 | -43.000 | degrees C | ok | na | na | na | 5.000 | 10.000 | na
DIMM Thrm Mrgn 3 | -43.000 | degrees C | ok | na | na | na | 5.000 | 10.000 | na
DIMM Thrm Mrgn 4 | -53.000 | degrees C | ok | na | na | na | 5.000 | 10.000 | na
Mem P1 Thrm Trip | 0x0 | discrete | 0x0000| na | na | na | na | na | na
Mem P2 Thrm Trip | 0x0 | discrete | 0x0000| na | na | na | na | na | na
BB +12.0V | 12.039 | Volts | ok | na | 10.635 | 10.947 | 13.027 | 13.391 | na
BB +5.0V | 4.937 | Volts | ok | na | 4.460 | 4.590 | 5.415 | 5.566 | na
BB +3.3V | 3.268 | Volts | ok | na | 2.953 | 3.039 | 3.554 | 3.654 | na
BB +5.0V STBY | 5.046 | Volts | ok | na | 4.460 | 4.590 | 5.415 | 5.566 | na
BB +3.3V AUX | 3.296 | Volts | ok | na | 2.953 | 3.039 | 3.554 | 3.654 | na
BB +1.05V P1Vccp | 0.792 | Volts | ok | na | 0.546 | 0.564 | 1.464 | 1.506 | na
BB +1.05V P2Vccp | 0.822 | Volts | ok | na | 0.546 | 0.564 | 1.464 | 1.506 | na
BB +1.5 P1DDR AB | 1.495 | Volts | ok | na | 1.339 | 1.387 | 1.611 | 1.659 | na
BB +1.5 P1DDR CD | 1.509 | Volts | ok | na | 1.339 | 1.387 | 1.611 | 1.659 | na
BB +1.5 P2DDR AB | 1.509 | Volts | ok | na | 1.339 | 1.387 | 1.611 | 1.659 | na
BB +1.5 P2DDR CD | 1.509 | Volts | ok | na | 1.339 | 1.387 | 1.611 | 1.659 | na
BB +1.8V AUX | 1.794 | Volts | ok | na | 1.644 | 1.702 | 1.902 | 1.960 | na
BB +1.1V STBY | 1.076 | Volts | ok | na | 0.938 | 0.964 | 1.240 | 1.276 | na
BB VBAT | 3.018 | Volts | ok | na | 2.211 | 2.544 | na | na | na
BB +1.35 P1LV AB | na | | na | na | 1.201 | 1.244 | 1.445 | 1.488 | na
BB +1.35 P1LV CD | na | | na | na | 1.201 | 1.244 | 1.445 | 1.488 | na
BB +1.35 P2LV AB | na | | na | na | 1.201 | 1.244 | 1.445 | 1.488 | na
BB +1.35 P2LV CD | na | | na | na | 1.201 | 1.244 | 1.445 | 1.488 | na
NM Capabilities | 0x97 | discrete | 0x0100| na | na | na | na | na | na
P1 MTT | 0.000 | percent | ok | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
P2 MTT | 0.000 | percent | ok | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
daenney@elsa:~$ ls -lah /dev/ipmi0
crw------- 1 root root 245, 0 okt 22 20:08 /dev/ipmi0
I haven't been able to run it from the container just yet, need to change the Dockerfile so I have a shell.
Ah, managed to get a shell with just sh:
/ # ipmitool sensor
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
/ # ls /dev/
console core fd full fuse ipmi0 mqueue null ptmx pts random shm stderr stdin stdout tty urandom zero
/ # ls -lah /dev/
total 4
drwxr-xr-x 5 root root 400 Oct 24 14:22 .
drwxr-xr-x 31 root root 4.0K Oct 24 14:22 ..
crw------- 1 root root 136, 1 Oct 24 14:22 console
lrwxrwxrwx 1 root root 11 Oct 24 14:22 core -> /proc/kcore
lrwxrwxrwx 1 root root 13 Oct 24 14:22 fd -> /proc/self/fd
crw-rw-rw- 1 root root 1, 7 Oct 24 14:22 full
crw-rw-rw- 1 root root 10, 229 Oct 24 14:22 fuse
crw------- 1 root root 245, 0 Oct 22 18:08 ipmi0
drwxrwxrwt 2 root root 40 Oct 24 14:22 mqueue
crw-rw-rw- 1 root root 1, 3 Oct 24 14:22 null
lrwxrwxrwx 1 root root 8 Oct 24 14:22 ptmx -> pts/ptmx
drwxr-xr-x 2 root root 0 Oct 24 14:22 pts
crw-rw-rw- 1 root root 1, 8 Oct 24 14:22 random
drwxrwxrwt 2 root root 40 Oct 24 14:22 shm
lrwxrwxrwx 1 root root 15 Oct 24 14:22 stderr -> /proc/self/fd/2
lrwxrwxrwx 1 root root 15 Oct 24 14:22 stdin -> /proc/self/fd/0
lrwxrwxrwx 1 root root 15 Oct 24 14:22 stdout -> /proc/self/fd/1
crw-rw-rw- 1 root root 5, 0 Oct 24 14:22 tty
crw-rw-rw- 1 root root 1, 9 Oct 24 14:22 urandom
crw-rw-rw- 1 root root 1, 5 Oct 24 14:22 zero
So the device is there, there's a /dev/ipmi0, but it seems ipmitool really doesn't like it.
If I run the container without -v /dev/ipmi0:/dev/ipmi0 I don't get an ipmi0 in the container at all, as expected.
Well, that looks weird. The permissions seem alright.
I have no experience mounting hardware devices into the container, so I am not sure if I am able to help here.
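ipmitool prints a single error for all three device paths, so it is hard to tell which path failed and why. One plausible explanation, not confirmed in this thread, is that the message reflects only the last path tried, which does not exist. A small Go sketch that reports the error for each path separately can help narrow this down; probeDevices and candidatePaths are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"os"
)

// candidatePaths lists the three paths named in the ipmitool error message.
var candidatePaths = []string{"/dev/ipmi0", "/dev/ipmi/0", "/dev/ipmidev/0"}

// probeDevices (hypothetical helper) tries to open each path read-write and
// records the resulting error for every path individually (nil on success).
func probeDevices(paths []string) map[string]error {
	results := make(map[string]error, len(paths))
	for _, p := range paths {
		f, err := os.OpenFile(p, os.O_RDWR, 0)
		if err == nil {
			f.Close()
		}
		results[p] = err
	}
	return results
}

func main() {
	results := probeDevices(candidatePaths)
	for _, p := range candidatePaths {
		if err := results[p]; err != nil {
			fmt.Printf("%s: %v\n", p, err)
		} else {
			fmt.Printf("%s: ok\n", p)
		}
	}
}
```

Run inside the container, this would show whether /dev/ipmi0 fails with a permission-style error while only the other two paths are genuinely missing.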
Alright, I figured out the first part. You're not supposed to use volumes to mount /dev devices, you're supposed to use --device=/dev/ipmi0 or in docker-compose:
devices:
  - /dev/ipmi0:/dev/ipmi0
This is surprisingly obvious but hard to find in the docs 😥. Doing this allows you to run entirely without privileged mode so achievement 🔓ed.
Now running ipmitool sensor in the container works and I get:
/ # ipmitool sensor
Pwr Unit Status | 0x0 | discrete | 0x0000| na | na | na | na | na | na
IPMI Watchdog | 0x0 | discrete | 0x0000| na | na | na | na | na | na
Physical Scrty | 0x0 | discrete | 0x0000| na | na | na | na | na | na
SMI Timeout | 0x0 | discrete | 0x0000| na | na | na | na | na | na
System Event Log | 0x0 | discrete | 0x0000| na | na | na | na | na | na
System Event | 0x0 | discrete | 0x0000| na | na | na | na | na | na
Button | 0x0 | discrete | 0x0000| na | na | na | na | na | na
VR Watchdog | 0x0 | discrete | 0x0000| na | na | na | na | na | na
SSB Therm Trip | 0x0 | discrete | 0x0000| na | na | na | na | na | na
BMC FW Health | 0x0 | discrete | 0x0000| na | na | na | na | na | na
System Airflow | 0.000 | CFM | ok | na | na | na | na | na | na
BB EDGE Temp | 38.000 | degrees C | ok | na | 0.000 | 5.000 | 110.000 | 115.000 | na
SSB Temp | 58.000 | degrees C | ok | na | 0.000 | 5.000 | 98.000 | 103.000 | na
BB BMC Temp | 54.000 | degrees C | ok | na | 0.000 | 5.000 | 110.000 | 115.000 | na
BB P2 VR Temp | 40.000 | degrees C | ok | na | 0.000 | 5.000 | 110.000 | 115.000 | na
BB MEM VR Temp | 45.000 | degrees C | ok | na | 0.000 | 5.000 | 110.000 | 115.000 | na
LAN NIC Temp | 66.000 | degrees C | ok | na | 0.000 | 5.000 | 115.000 | 120.000 | na
System Fan 4 | 686.000 | RPM | ok | na | 294.000 | 392.000 | na | na | na
P1 Status | 0x0 | discrete | 0x8000| na | na | na | na | na | na
P2 Status | 0x0 | discrete | 0x8000| na | na | na | na | na | na
P1 Therm Margin | -60.000 | degrees C | ok | na | na | na | na | na | na
P2 Therm Margin | -58.000 | degrees C | ok | na | na | na | na | na | na
P1 Therm Ctrl % | 0.000 | percent | ok | na | na | na | 30.000 | 50.000 | na
P2 Therm Ctrl % | 0.000 | percent | ok | na | na | na | 30.000 | 50.000 | na
P1 ERR2 | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P2 ERR2 | 0x0 | discrete | 0x0000| na | na | na | na | na | na
CATERR | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P1 MSID Mismatch | 0x0 | discrete | 0x0000| na | na | na | na | na | na
CPU Missing | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P1 DTS Therm Mgn | -60.000 | degrees C | ok | na | na | na | na | na | na
P2 DTS Therm Mgn | -58.000 | degrees C | ok | na | na | na | na | na | na
P2 MSID Mismatch | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P1 VRD Hot | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P2 VRD Hot | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P1 MEM01 VRD Hot | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P1 MEM23 VRD Hot | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P2 MEM01 VRD Hot | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P2 MEM23 VRD Hot | 0x0 | discrete | 0x0000| na | na | na | na | na | na
DIMM Thrm Mrgn 1 | -42.000 | degrees C | ok | na | na | na | 5.000 | 10.000 | na
DIMM Thrm Mrgn 2 | -39.000 | degrees C | ok | na | na | na | 5.000 | 10.000 | na
DIMM Thrm Mrgn 3 | -44.000 | degrees C | ok | na | na | na | 5.000 | 10.000 | na
DIMM Thrm Mrgn 4 | -46.000 | degrees C | ok | na | na | na | 5.000 | 10.000 | na
Mem P1 Thrm Trip | 0x0 | discrete | 0x0000| na | na | na | na | na | na
Mem P2 Thrm Trip | 0x0 | discrete | 0x0000| na | na | na | na | na | na
BB +12.0V | 12.039 | Volts | ok | na | 10.635 | 10.947 | 13.027 | 13.391 | na
BB +5.0V | 4.959 | Volts | ok | na | 4.460 | 4.590 | 5.415 | 5.566 | na
BB +3.3V | 3.268 | Volts | ok | na | 2.953 | 3.039 | 3.554 | 3.654 | na
BB +5.0V STBY | 5.046 | Volts | ok | na | 4.460 | 4.590 | 5.415 | 5.566 | na
BB +3.3V AUX | 3.296 | Volts | ok | na | 2.953 | 3.039 | 3.554 | 3.654 | na
BB +1.05V P1Vccp | 0.990 | Volts | ok | na | 0.546 | 0.564 | 1.464 | 1.506 | na
BB +1.05V P2Vccp | 0.828 | Volts | ok | na | 0.546 | 0.564 | 1.464 | 1.506 | na
BB +1.5 P1DDR AB | 1.495 | Volts | ok | na | 1.339 | 1.387 | 1.611 | 1.659 | na
BB +1.5 P1DDR CD | 1.509 | Volts | ok | na | 1.339 | 1.387 | 1.611 | 1.659 | na
BB +1.5 P2DDR AB | 1.509 | Volts | ok | na | 1.339 | 1.387 | 1.611 | 1.659 | na
BB +1.5 P2DDR CD | 1.509 | Volts | ok | na | 1.339 | 1.387 | 1.611 | 1.659 | na
BB +1.8V AUX | 1.794 | Volts | ok | na | 1.644 | 1.702 | 1.902 | 1.960 | na
BB +1.1V STBY | 1.076 | Volts | ok | na | 0.938 | 0.964 | 1.240 | 1.276 | na
BB VBAT | 3.018 | Volts | ok | na | 2.211 | 2.544 | na | na | na
BB +1.35 P1LV AB | na | | na | na | 1.201 | 1.244 | 1.445 | 1.488 | na
BB +1.35 P1LV CD | na | | na | na | 1.201 | 1.244 | 1.445 | 1.488 | na
BB +1.35 P2LV AB | na | | na | na | 1.201 | 1.244 | 1.445 | 1.488 | na
BB +1.35 P2LV CD | na | | na | na | 1.201 | 1.244 | 1.445 | 1.488 | na
NM Capabilities | 0x6f | discrete | 0x0100| na | na | na | na | na | na
P1 MTT | 0.000 | percent | ok | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
P2 MTT | 0.000 | percent | ok | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
However, the IPMI exporter container still crashes:
level=fatal msg="ipmitool didn't return any metrics (descriptor Desc{fqName: \"ipmi_p2_therm_ctrl_%\", help: \"p2_therm_ctrl_%\", constLabels: {}, variableLabels: [addr]} is invalid: \"ipmi_p2_therm_ctrl_%\" is not a valid metric name)" source="main.go:22"
Any idea what's going on there?
Good to see the mount working, well done! We probably should change the documentation to your approach.
"ipmi_p2_therm_ctrl_%" is not a valid metric name
Your ipmi output has more metrics than the one I worked with, notably P1 Therm Ctrl %. On the label branch I've worked on the categorisation of the metrics. As far as I can see, the current state should just skip the metric instead of crashing. Could you please try the code from the label branch?
If that works, we can include the "thermal ctrl" metric into the collector later.
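For illustration, skipping instead of crashing could look roughly like the Go sketch below. It checks candidate names against the classic pattern from the Prometheus data model and sets aside those that do not match. filterValid is a hypothetical helper, not the code on the label branch.

```go
package main

import (
	"fmt"
	"regexp"
)

// metricNameRE is the classic metric-name pattern documented in the
// Prometheus data model.
var metricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)

// filterValid (hypothetical helper) splits candidate metric names into
// those matching the pattern and those that should be skipped.
func filterValid(names []string) (valid, skipped []string) {
	for _, n := range names {
		if metricNameRE.MatchString(n) {
			valid = append(valid, n)
		} else {
			skipped = append(skipped, n)
		}
	}
	return valid, skipped
}

func main() {
	valid, skipped := filterValid([]string{"ipmi_p2_status", "ipmi_p2_therm_ctrl_%"})
	fmt.Println("valid:", valid)
	fmt.Println("skipped:", skipped)
}
```

The skipped names could be logged at warning level so that unsupported sensors stay visible without taking the exporter down.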
A quick fix would be to strings.Replace(variable, "%", "pct", -1). Probably something similar for the + sign?
I've raised #10 in the meantime to update the README.
Ah, you already replace + with p: https://github.com/lovoo/ipmi_exporter/blob/master/collector.go#L83. I've updated it to deal with the % sign in #11.
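A rough Go sketch of the replacement idea discussed here, for readers who want to see it in one place. sanitizeName is a hypothetical function. The "%" to "pct" and "+" to "p" substitutions come from this thread, while the "ipmi_" prefix, the lowercasing, and the handling of spaces and dots are assumptions made for the example and may differ from what #11 actually does.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeName (hypothetical) turns a sensor name such as "P2 Therm Ctrl %"
// into a candidate metric name. "%" -> "pct" and "+" -> "p" follow the
// thread; mapping spaces and dots to "_" is an assumption of this sketch.
func sanitizeName(sensor string) string {
	r := strings.NewReplacer("%", "pct", "+", "p", " ", "_", ".", "_")
	return "ipmi_" + strings.ToLower(r.Replace(strings.TrimSpace(sensor)))
}

func main() {
	for _, s := range []string{"P2 Therm Ctrl %", "BB +12.0V"} {
		fmt.Printf("%q -> %s\n", s, sanitizeName(s))
	}
}
```

strings.NewReplacer applies all substitutions in a single pass, which avoids chaining several strings.Replace calls.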
Could anyone take a look at the two PRs?
Awesome. With #10 and #11 merged, all my issues have been resolved. Thank you!
I was looking into running without privileged b/c I'm not really a fan of that and figured that as long as I mount /dev/ipmi0 into the container at the same spot that should be enough. However, when I do that I see:
level=error msg="error while calling ipmitool: exit status 1" source="collector.go:50"
Any idea what I might've missed?