Closed aruhier closed 4 months ago
The only reason I can think of is lacking `CAP_PERFMON`, but I have no idea why that would be when it's run with sudo. What does `CAP=($(grep CapEff /proc/self/status)); capsh --decode=${CAP[1]}` say if you run it as root?
Thanks!
$ CAP=($(grep CapEff /proc/self/status)); capsh --decode=${CAP[1]}
0x00000000000000ca=cap_dac_override,cap_fowner,cap_setgid,cap_setuid
I don't know why, but you don't have enough CAPs to load SCX schedulers. You'd need at least `cap_bpf` and `cap_perfmon`.
# CAP=($(grep CapEff /proc/self/status)); capsh --decode=${CAP[1]}
0x000001ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore
Hmm, that is indeed weird.
I tried to manually add the capabilities to `scx_simple` and I have the same issue:
$ getcap ./scx_simple
./scx_simple cap_perfmon,cap_bpf=ep
Sorry, from zsh it only shows a few capabilities, but checking them from bash looks ok:
$ CAP=($(grep CapEff /proc/self/status)); capsh --decode=${CAP[1]}
0x000001ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore
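A possible explanation for the zsh/bash difference, offered as an assumption rather than something confirmed in this thread: zsh arrays are 1-indexed by default, so `${CAP[1]}` expands to the label `CapEff:` instead of the hex mask, and `capsh` may then be decoding only the leading `Ca` as `0xca`. The sketch below avoids shell arrays entirely so it behaves the same in both shells. The helper names `cap_eff_hex`, `has_cap` and `report_scx_caps` are hypothetical; the bit numbers (38 for `CAP_PERFMON`, 39 for `CAP_BPF`) come from `<linux/capability.h>`.

```shell
# Print the effective capability mask (hex, no 0x prefix).
# Uses awk instead of shell arrays, so bash and zsh give the same result.
# $1 (optional) = status file to read; defaults to the current process.
cap_eff_hex() {
    awk '/^CapEff:/ { print $2 }' "${1:-/proc/self/status}"
}

# Succeed if bit number $2 is set in hex mask $1.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Report the two capabilities mentioned above for hex mask $1.
# Bit numbers: CAP_PERFMON = 38, CAP_BPF = 39.
report_scx_caps() {
    mask=$1
    for entry in cap_perfmon:38 cap_bpf:39; do
        if has_cap "$mask" "${entry#*:}"; then
            echo "${entry%%:*}: present"
        else
            echo "${entry%%:*}: missing"
        fi
    done
}
```

Usage, run as the user that starts the scheduler: `report_scx_caps "$(cap_eff_hex)"`, or `capsh --decode="$(cap_eff_hex)"` for the full list.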
The schedulers still fail to load with the same error, whether run from zsh, bash, or the systemd service.
I pointed out the kernel config difference here: https://github.com/CachyOS/linux-cachyos/issues/254#issuecomment-2137389234
I do not think this is an issue with the patchset itself, since it works for several CachyOS users and there has not been any report so far of the scx schedulers not working. You should check your kernel configuration and also your system configuration.
Indeed, I was missing `CONFIG_FTRACE=y`, which is a dependency of `CONFIG_BPF_LSM=y`. My config had `CONFIG_BPF_LSM=y`, but `CONFIG_FTRACE=n` disabled it during compilation.
@htejun: in order to help people with custom configs and avoid that kind of report, can you add a section to the README (or I can do a PR for it) specifying that the kernel must be compiled with `CONFIG_BPF=y`, `CONFIG_BPF_LSM=y` and `CONFIG_BPF_SYSCALL=y`?
Thanks!
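For anyone hitting the same problem with a custom kernel, here is a minimal sketch of how the options named in this thread could be checked against a kernel config file. The helper name `check_kconfig` is hypothetical, and the option list is only what was discussed here, not a confirmed complete set of requirements. It assumes a `gzip` that passes plain-text input through unchanged with `-cdf`, as GNU gzip does.

```shell
# Report whether each option is set to "y" in a kernel config file.
# $1 = path to a plain-text or gzip-compressed config; remaining args = option names.
# Returns non-zero if at least one option is not set to "y".
check_kconfig() {
    cfg=$1
    shift
    missing=0
    for opt in "$@"; do
        # gzip -cdf decompresses .gz input and copies plain text through unchanged.
        if gzip -cdf "$cfg" | grep -q "^${opt}=y\$"; then
            echo "ok      $opt"
        else
            echo "MISSING $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

Example: `check_kconfig /proc/config.gz CONFIG_BPF CONFIG_BPF_SYSCALL CONFIG_BPF_LSM CONFIG_FTRACE`. `/proc/config.gz` only exists when the kernel was built with `CONFIG_IKCONFIG_PROC`; otherwise point the function at the `.config` used for the build.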
Yes, please submit a PR. thanks!
Hi,
Using the CachyOS kernel (6.9.1) on Gentoo with libbpf 1.4.2, I can't load any scheduler from release 1.9 or from `main` (c09bc2ac699c6bfdc7a9e3af976e509bb0326b69).
Logs of `scx_simple -v`:
```
$ sudo ./scx_simple -v
libbpf: object 'scx_simple': failed (-95) to create BPF token from '/sys/fs/bpf', skipping optional step...
libbpf: loaded kernel BTF from '/sys/kernel/btf/vmlinux'
libbpf: extern (func ksym) 'scx_bpf_consume': resolved to vmlinux [52985]
libbpf: extern (func ksym) 'scx_bpf_create_dsq': resolved to vmlinux [52992]
libbpf: extern (func ksym) 'scx_bpf_dispatch': resolved to vmlinux [52995]
libbpf: extern (func ksym) 'scx_bpf_dispatch_vtime': resolved to vmlinux [52999]
libbpf: extern (func ksym) 'scx_bpf_select_cpu_dfl': resolved to vmlinux [53024]
libbpf: extern 'scx_bpf_switch_all' (weak): not resolved, defaulting to zero
libbpf: struct_ops init_kern simple_ops: type_id:419 kern_type_id:50305 kern_vtype_id:50386
libbpf: struct_ops init_kern simple_ops: func ptr select_cpu is set to prog simple_select_cpu from data(+0) to kern_data(+0)
libbpf: struct_ops init_kern simple_ops: func ptr enqueue is set to prog simple_enqueue from data(+8) to kern_data(+8)
libbpf: struct_ops init_kern simple_ops: func ptr dispatch is set to prog simple_dispatch from data(+24) to kern_data(+24)
libbpf: struct_ops init_kern simple_ops: func ptr running is set to prog simple_running from data(+48) to kern_data(+48)
libbpf: struct_ops init_kern simple_ops: func ptr stopping is set to prog simple_stopping from data(+56) to kern_data(+56)
libbpf: struct_ops init_kern simple_ops: func ptr enable is set to prog simple_enable from data(+144) to kern_data(+144)
libbpf: struct_ops init_kern simple_ops: func ptr init is set to prog simple_init from data(+248) to kern_data(+248)
libbpf: struct_ops init_kern simple_ops: func ptr exit is set to prog simple_exit from data(+256) to kern_data(+256)
libbpf: struct_ops init_kern simple_ops: copy dispatch_max_batch 4 bytes from data(+264) to kern_data(+264)
libbpf: struct_ops init_kern simple_ops: copy flags 8 bytes from data(+272) to kern_data(+272)
libbpf: struct_ops init_kern simple_ops: copy timeout_ms 4 bytes from data(+280) to kern_data(+280)
libbpf: struct_ops init_kern simple_ops: copy exit_dump_len 4 bytes from data(+284) to kern_data(+284)
libbpf: struct_ops init_kern simple_ops: copy hotplug_seq 8 bytes from data(+288) to kern_data(+288)
libbpf: struct_ops init_kern simple_ops: copy name 128 bytes from data(+296) to kern_data(+296)
libbpf: sec 'struct_ops/simple_enqueue': found 1 CO-RE relocations
libbpf: CO-RE relocating [21] struct task_struct: found target candidate [127] struct task_struct in [vmlinux]
libbpf: prog 'simple_enqueue': relo #0:
```
I'm using BORE-sched-ext, and sched_ext seems to be working:
Do I need to enable a specific BPF feature or is it an incompatibility with scx and cachyos patches?
Kernel config