`bpfland`: scheduler failure during CPU hotplug operations

AboorvaDevarajan commented 3 months ago

System Info:

# lscpu
Architecture:                         ppc64le
Byte Order:                           Little Endian
CPU(s):                               128
On-line CPU(s) list:                  0,1,109-127
Off-line CPU(s) list:                 2-108
Thread(s) per core:                   3
Core(s) per socket:                   3
Socket(s):                            2
NUMA node(s):                         8
Model:                                2.3 (pvr 004e 1203)
Model name:                           POWER9, altivec supported
Frequency boost:                      enabled
CPU max MHz:                          3800.0000
CPU min MHz:                          2300.0000
L1d cache:                            192 KiB
L1i cache:                            192 KiB
L2 cache:                             3 MiB
L3 cache:                             60 MiB
NUMA node0 CPU(s):                    0,1
NUMA node8 CPU(s):                    109-127

Kernel Version: 6.10.0-rc2+ + struct_ops patches on ppc64le.

SCX Version: Latest upstream

Steps to recreate the issue:

1. Run the bpfland scheduler.
2. Stress the CPUs: `stress-ng --cpu=100`
3. Offline CPUs sequentially from 2 to 127: `for i in {2..127}; do echo 0 > /sys/devices/system/cpu/cpu$i/online; done`
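
To check which CPUs are still online while the offlining loop runs, the kernel's cpulist format (the same `0,1,109-127` style that `lscpu` prints above, also exposed in `/sys/devices/system/cpu/online`) can be expanded into individual CPU ids. The following is only a minimal sketch for use while reproducing the issue; `parse_cpulist` is a hypothetical helper name and is not part of scx or scx_utils.

```rust
/// Expand a kernel cpulist string such as "0,1,109-127" into CPU ids.
/// Hypothetical helper for illustration; not taken from scx_utils.
pub fn parse_cpulist(list: &str) -> Result<Vec<usize>, String> {
    let mut cpus = Vec::new();
    // Entries are comma separated; each one is a single id or "lo-hi".
    for part in list.trim().split(',').filter(|p| !p.is_empty()) {
        let (lo, hi) = match part.split_once('-') {
            Some((lo, hi)) => (lo, hi),
            None => (part, part),
        };
        let lo: usize = lo
            .trim()
            .parse()
            .map_err(|e| format!("bad CPU id {lo:?}: {e}"))?;
        let hi: usize = hi
            .trim()
            .parse()
            .map_err(|e| format!("bad CPU id {hi:?}: {e}"))?;
        if lo > hi {
            return Err(format!("descending range {part:?}"));
        }
        cpus.extend(lo..=hi);
    }
    Ok(cpus)
}
```

Reading `/sys/devices/system/cpu/online` with `std::fs::read_to_string` and passing the contents to this function gives the current set of online CPUs at that instant; the set can change again right after the read, so it is only a snapshot.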

While the CPUs are being offlined, the scheduler handles some of the hotplug events without issues (it unregisters and starts again). However, it occasionally fails with the following error:

Failed to build host topology: Failed to open or read file /sys/devices/system/cpu/cpu82/topology/core_id.

When a CPU is offlined, its topology information in sysfs is unregistered and removed. However, the scheduler still tries to read this CPU's topology file after the CPU has been offlined, which leads to this failure.
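
One way to tolerate this, sketched below under assumptions, is to treat a missing `topology/core_id` file as "this CPU is offline" rather than as a fatal error. This is not the code in scx_utils and not necessarily how the issue gets fixed; `read_core_id` and `collect_core_ids` are hypothetical names, and the sysfs root is passed in as a parameter so the logic can be exercised against a scratch directory.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Read `<cpu_root>/cpu<N>/topology/core_id`.
/// Returns Ok(None) when the file does not exist, which is what happens
/// once the CPU has been offlined and its topology directory removed.
/// Hypothetical helper for illustration; not the scx_utils implementation.
pub fn read_core_id(cpu_root: &Path, cpu: usize) -> io::Result<Option<usize>> {
    let path = cpu_root
        .join(format!("cpu{cpu}"))
        .join("topology")
        .join("core_id");
    match fs::read_to_string(&path) {
        Ok(text) => text
            .trim()
            .parse::<usize>()
            .map(Some)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e)),
        // The file vanished (or never existed): skip this CPU.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        // Any other I/O error is still reported to the caller.
        Err(e) => Err(e),
    }
}

/// Collect (cpu, core_id) pairs for CPUs 0..nr_cpus, skipping CPUs
/// whose topology files are missing.
pub fn collect_core_ids(cpu_root: &Path, nr_cpus: usize) -> io::Result<Vec<(usize, usize)>> {
    let mut out = Vec::new();
    for cpu in 0..nr_cpus {
        if let Some(core) = read_core_id(cpu_root, cpu)? {
            out.push((cpu, core));
        }
    }
    Ok(out)
}
```

On a real system the root would be `/sys/devices/system/cpu`. Skipping missing files only narrows the window: a CPU can still go offline after its `core_id` has been read, so the resulting topology may already be stale when it is used.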

Error Output

EXIT: Scheduler unregistered from the main kernel (cpu 53 going offline, exiting scheduler)
10:28:02 [INFO] Unregister scx_bpfland scheduler
10:28:03 [INFO] SMT scheduling on
10:28:04 [INFO] running=0/128 nr_kthread_dispatches=0 nr_direct_dispatches=0 nr_prio_dispatches=0 nr_shared_dispatches=0
EXIT: Scheduler unregistered from the main kernel (cpu 58 going offline, exiting scheduler)
10:28:04 [INFO] Unregister scx_bpfland scheduler
10:28:04 [INFO] SMT scheduling on
10:28:06 [INFO] running=0/128 nr_kthread_dispatches=0 nr_direct_dispatches=0 nr_prio_dispatches=0 nr_shared_dispatches=0
EXIT: Scheduler unregistered from the main kernel (cpu 63 going offline, exiting scheduler)
10:28:06 [INFO] Unregister scx_bpfland scheduler
10:28:06 [INFO] SMT scheduling on
10:28:07 [INFO] running=0/128 nr_kthread_dispatches=0 nr_direct_dispatches=0 nr_prio_dispatches=0 nr_shared_dispatches=0
EXIT: Scheduler unregistered from the main kernel (cpu 68 going offline, exiting scheduler)
10:28:07 [INFO] Unregister scx_bpfland scheduler
10:28:07 [INFO] SMT scheduling on
10:28:09 [INFO] running=0/128 nr_kthread_dispatches=0 nr_direct_dispatches=0 nr_prio_dispatches=0 nr_shared_dispatches=0
EXIT: Scheduler unregistered from the main kernel (cpu 73 going offline, exiting scheduler)
10:28:09 [INFO] Unregister scx_bpfland scheduler
10:28:09 [INFO] SMT scheduling on
10:28:11 [INFO] running=0/128 nr_kthread_dispatches=0 nr_direct_dispatches=0 nr_prio_dispatches=0 nr_shared_dispatches=0
EXIT: Scheduler unregistered from the main kernel (cpu 78 going offline, exiting scheduler)
10:28:11 [INFO] Unregister scx_bpfland scheduler
thread 'main' panicked at src/main.rs:145:36:
Failed to build host topology: Failed to open or read file "/sys/devices/system/cpu/cpu82/topology/core_id"

Stack backtrace:

   0: std::backtrace_rs::backtrace::libunwind::trace
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1: std::backtrace_rs::backtrace::trace_unsynchronized
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2: std::backtrace::Backtrace::create
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/backtrace.rs:331:13
   3: anyhow::error::<impl anyhow::Error>::msg
             at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.86/src/error.rs:83:36
   4: scx_utils::topology::read_file_usize
             at ./rust/scx_utils/src/topology.rs:331:13
   5: scx_utils::topology::create_insert_cpu
             at ./rust/scx_utils/src/topology.rs:381:19
   6: scx_utils::topology::create_numa_nodes
             at ./rust/scx_utils/src/topology.rs:502:13
   7: scx_utils::topology::Topology::new
             at ./rust/scx_utils/src/topology.rs:201:13
   8: scx_bpfland::Scheduler::init
             at ./scheds/rust/scx_bpfland/src/main.rs:145:20
   9: scx_bpfland::main
             at ./scheds/rust/scx_bpfland/src/main.rs:279:25
  10: core::ops::function::FnOnce::call_once
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/ops/function.rs:250:5
  11: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys_common/backtrace.rs:154:18
  12: std::rt::lang_start::{{closure}}
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/rt.rs:166:18
  13: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/ops/function.rs:284:13
  14: std::panicking::try::do_call
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:504:40
  15: std::panicking::try
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:468:19
  16: std::panic::catch_unwind
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panic.rs:142:14
  17: std::rt::lang_start_internal::{{closure}}
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/rt.rs:148:48
  18: std::panicking::try::do_call
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:504:40
  19: std::panicking::try
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:468:19
  20: std::panic::catch_unwind
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panic.rs:142:14
  21: std::rt::lang_start_internal
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/rt.rs:148:20
  22: std::rt::lang_start
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/rt.rs:165:17
  23: main
  24: generic_start_main
             at /build/glibc-GVyp00/glibc-2.31/csu/../csu/libc-start.c:308:16
  25: __libc_start_main
             at /build/glibc-GVyp00/glibc-2.31/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:98:10
stack backtrace:
   0: rust_begin_unwind
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:597:5
   1: core::panicking::panic_fmt
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/panicking.rs:72:14
   2: core::result::unwrap_failed
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/result.rs:1652:5
   3: core::result::Result<T,E>::expect
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/result.rs:1034:23
   4: scx_bpfland::Scheduler::init
             at ./scheds/rust/scx_bpfland/src/main.rs:145:20
   5: scx_bpfland::main
             at ./scheds/rust/scx_bpfland/src/main.rs:279:25
   6: core::ops::function::FnOnce::call_once
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

arighi commented 3 months ago

Thanks for reporting this, @AboorvaDevarajan. I'm actually working on CPU hotplugging support right now, so I should be able to push a fix soon. Also, it's interesting to see that you're using sched_ext on ppc64le!

arighi commented 3 months ago

@AboorvaDevarajan this should be fixed in #408. I've temporarily dropped the dependency on the Topology crate, because there's still an issue that can happen with CPU hotplugging (but that's a separate one, I'll look at that later). In the meantime scx_bpfland should handle CPU hotplugging properly with #408 applied.

AboorvaDevarajan commented 3 months ago

Andrea, thanks for the fix. I tested the patch and it resolves the issue: the bpfland scheduler now runs without crashing.

However, I noticed a kernel panic when CPU hotplug is performed while the custom scheduler is running, and the system becomes unresponsive. This also occurs with the simple scheduler, so it doesn't seem to be specific to bpfland. I'll investigate this a bit more and report it upstream (kernel).

arighi commented 3 months ago

@AboorvaDevarajan yeah, this looks more like a kernel bug. Feel free to open another issue, and let me know if you are able to get more details. Thanks!

sched-ext / scx

`bpfland`: scheduler failure during CPU hotplug operations #406