⚡ Energy consumption metrology agent. Let "scaph" dive and bring back the metrics that will help you make your systems and applications more sustainable!
Apache License 2.0 · 1.63k stars · 109 forks
Random failure in Rocky Linux based custom container #380
I have built a container with an RPM-based scaphandre installation. I am starting it on bare metal with the 'prometheus --qemu' option. In the Docker logs I see:
scaphandre::sensors: Sysinfo sees 256
Scaphandre stdout exporter
Sending ⚡ metrics
Measurement step is: 2s
When I try to curl http://localhost:8080/metrics I get no output on the console, and in the logs I see:
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/sensors/utils.rs:177:18
scaphandre::exporters::prometheus: Error in show_metrics : PoisonError { .. }
scaphandre::exporters::prometheus: Error details : poisoned lock: another task failed inside
Each subsequent run of curl produces:
scaphandre::exporters::prometheus: Error in show_metrics : PoisonError { .. }
scaphandre::exporters::prometheus: Error details : poisoned lock: another task failed inside
Once in a few runs scaphandre starts properly and I am able to scrape metrics. I have run many tests to determine when this happens (without changing ownership and access rights on /sys/class/powercap (i.e. without running init.sh), after a reboot (to reset ownership of /sys), restarting the Docker container, purging Docker, running scaphandre with the stdout option, etc.) and found nothing conclusive.
Below is console output where I run scaphandre a few times before it works, after a few unsuccessful attempts to start it.
(kolla-ansible) [stack@hpc30 ~]$ docker run -v /sys/class/powercap:/sys/class/powercap -v /proc:/proc -ti --network host -e RUST_BACKTRACE=full kolla/scaphandre:17.1.0 scaphandre stdout -t 5
scaphandre::sensors: Sysinfo sees 256
Scaphandre stdout exporter
Sending ⚡ metrics
Measurement step is: 2s
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/sensors/utils.rs:177:18
stack backtrace:
0: 0x5576c0d21f41 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hf66164b97344d0a2
1: 0x5576c0d4a4af - core::fmt::write::hbb74f2248ccd4395
2: 0x5576c0d1edb1 - std::io::Write::write_fmt::hed9c5edae1eac7b4
3: 0x5576c0d21d55 - std::sys_common::backtrace::print::hc9a6bb05c1f66b1d
4: 0x5576c0d231f7 - std::panicking::default_hook::{{closure}}::h617bee45ce760ff9
5: 0x5576c0d22fe4 - std::panicking::default_hook::hfb5619c23c95dafb
6: 0x5576c0d236ac - std::panicking::rust_panic_with_hook::h07253f826b957552
7: 0x5576c0d23561 - std::panicking::begin_panic_handler::{{closure}}::hfde4141a9de96c92
8: 0x5576c0d22376 - std::sys_common::backtrace::__rust_end_short_backtrace::he15cde744ac23f89
9: 0x5576c0d232f2 - rust_begin_unwind
10: 0x5576c06be443 - core::panicking::panic_fmt::h2494779393265ba8
11: 0x5576c06be4d3 - core::panicking::panic::hfcc79b23445abeb8
12: 0x5576c0799450 - scaphandre::exporters::MetricGenerator::gen_self_metrics::h280d657f7d304306
13: 0x5576c07a208b - scaphandre::exporters::MetricGenerator::gen_all_metrics::h63813309d030eccd
14: 0x5576c07b54a3 - scaphandre::exporters::stdout::StdoutExporter::iterate::h06a8bbbbab974fa2
15: 0x5576c07b52c8 - <scaphandre::exporters::stdout::StdoutExporter as scaphandre::exporters::Exporter>::run::hd0394d843640f8d2
16: 0x5576c06d6203 - scaphandre::main::h75d3d0458ba1b902
17: 0x5576c06cdfd3 - std::sys_common::backtrace::__rust_begin_short_backtrace::hd18dc57ef0d20d7c
18: 0x5576c06c9ad9 - std::rt::lang_start::{{closure}}::he293a497447ace7d
19: 0x5576c0d18ef5 - std::rt::lang_start_internal::he62005167fe2938d
20: 0x5576c06d9c95 - main
21: 0x7f1a5f2b4eb0 - __libc_start_call_main
22: 0x7f1a5f2b4f60 - __libc_start_main_alias_1
23: 0x5576c06bebf5 - _start
24: 0x0 - <unknown>
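The frames above show gen_self_metrics reaching the unwrap() at src/sensors/utils.rs:177. Without seeing that exact code, one plausible scenario is a /proc entry disappearing (or being unreadable) between the moment a PID is listed and the moment it is read, a classic race on a busy host with many processes. A hypothetical sketch of handling such a read without panicking (read_cmdline is an illustrative helper, not scaphandre's actual API):

```rust
use std::fs;

// Hypothetical helper: /proc/<pid>/cmdline can vanish between the moment
// a PID is listed and the moment its files are read.
fn read_cmdline(pid: u32) -> Option<String> {
    fs::read_to_string(format!("/proc/{pid}/cmdline")).ok()
}

fn main() {
    // u32::MAX is above the kernel's PID range, so this read fails.
    match read_cmdline(u32::MAX) {
        Some(cmd) => println!("cmdline: {cmd}"),
        // An unwrap() here would panic exactly like the reported crash;
        // skipping the sample keeps the exporter (and its lock) healthy.
        None => println!("process gone; skipping this sample"),
    }
}
```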
(kolla-ansible) [stack@hpc30 ~]$ docker run -v /sys/class/powercap:/sys/class/powercap -v /proc:/proc -ti --network host -e RUST_BACKTRACE=full kolla/scaphandre:17.1.0 scaphandre stdout -t 5
scaphandre::sensors: Sysinfo sees 256
Scaphandre stdout exporter
Sending ⚡ metrics
Measurement step is: 2s
scaphandre::sensors: Not enough records for socket
scaphandre::sensors: Not enough records for socket
Host: 0 W from
package core
Top 5 consumers:
Power PID Exe
No processes found yet or filter returns no value.
------------------------------------------------------------
Host: 167.52704 W from
package core
Socket1 83.300095 W | 0.123677 W
Socket0 85.29291 W | 0.137677 W
Top 5 consumers:
Power PID Exe
2.625001 W 295896 "/usr/bin/scaphandre"
0.0029199123 W 10613 ""
0.0029199123 W 10718 ""
0.0029199123 W 9711 ""
0.0029199123 W 4934 ""
------------------------------------------------------------
What is strange: whenever I start scaphandre using the official image, it works just fine.
To Reproduce
Build an image based on Rocky Linux 9.3, install the scaphandre RPM in it, and run it.
Expected behavior
scaphandre prometheus --qemu will start properly each time.
Screenshots
n/a
Environment:
Rocky Linux 9.3
uname -a
Linux hpc30 5.14.0-284.30.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Sep 16 09:55:41 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Additional context
Why am I building Docker images instead of using the official one? I want to add scaphandre support to the OpenStack deployment project kolla-ansible.
This effort can be tracked here: https://review.opendev.org/c/openstack/kolla/+/914646/10