sched-ext / scx

sched_ext schedulers and tools
GNU General Public License v2.0

Schedulers fail to load: program of this type cannot use helper bpf_probe_read_str #317

Closed: aruhier closed this issue 4 months ago

aruhier commented 4 months ago

Hi, using the CachyOS kernel (6.9.1) on Gentoo with libbpf 1.4.2, I can't load any scheduler from release 1.9 or from main (c09bc2ac699c6bfdc7a9e3af976e509bb0326b69).

Logs of scx_simple -v:

```
$ sudo ./scx_simple -v
libbpf: object 'scx_simple': failed (-95) to create BPF token from '/sys/fs/bpf', skipping optional step...
libbpf: loaded kernel BTF from '/sys/kernel/btf/vmlinux'
libbpf: extern (func ksym) 'scx_bpf_consume': resolved to vmlinux [52985]
libbpf: extern (func ksym) 'scx_bpf_create_dsq': resolved to vmlinux [52992]
libbpf: extern (func ksym) 'scx_bpf_dispatch': resolved to vmlinux [52995]
libbpf: extern (func ksym) 'scx_bpf_dispatch_vtime': resolved to vmlinux [52999]
libbpf: extern (func ksym) 'scx_bpf_select_cpu_dfl': resolved to vmlinux [53024]
libbpf: extern 'scx_bpf_switch_all' (weak): not resolved, defaulting to zero
libbpf: struct_ops init_kern simple_ops: type_id:419 kern_type_id:50305 kern_vtype_id:50386
libbpf: struct_ops init_kern simple_ops: func ptr select_cpu is set to prog simple_select_cpu from data(+0) to kern_data(+0)
libbpf: struct_ops init_kern simple_ops: func ptr enqueue is set to prog simple_enqueue from data(+8) to kern_data(+8)
libbpf: struct_ops init_kern simple_ops: func ptr dispatch is set to prog simple_dispatch from data(+24) to kern_data(+24)
libbpf: struct_ops init_kern simple_ops: func ptr running is set to prog simple_running from data(+48) to kern_data(+48)
libbpf: struct_ops init_kern simple_ops: func ptr stopping is set to prog simple_stopping from data(+56) to kern_data(+56)
libbpf: struct_ops init_kern simple_ops: func ptr enable is set to prog simple_enable from data(+144) to kern_data(+144)
libbpf: struct_ops init_kern simple_ops: func ptr init is set to prog simple_init from data(+248) to kern_data(+248)
libbpf: struct_ops init_kern simple_ops: func ptr exit is set to prog simple_exit from data(+256) to kern_data(+256)
libbpf: struct_ops init_kern simple_ops: copy dispatch_max_batch 4 bytes from data(+264) to kern_data(+264)
libbpf: struct_ops init_kern simple_ops: copy flags 8 bytes from data(+272) to kern_data(+272)
libbpf: struct_ops init_kern simple_ops: copy timeout_ms 4 bytes from data(+280) to kern_data(+280)
libbpf: struct_ops init_kern simple_ops: copy exit_dump_len 4 bytes from data(+284) to kern_data(+284)
libbpf: struct_ops init_kern simple_ops: copy hotplug_seq 8 bytes from data(+288) to kern_data(+288)
libbpf: struct_ops init_kern simple_ops: copy name 128 bytes from data(+296) to kern_data(+296)
libbpf: sec 'struct_ops/simple_enqueue': found 1 CO-RE relocations
libbpf: CO-RE relocating [21] struct task_struct: found target candidate [127] struct task_struct in [vmlinux]
libbpf: prog 'simple_enqueue': relo #0: [21] struct task_struct.scx.dsq_vtime (0:23:15 @ offset 848)
libbpf: prog 'simple_enqueue': relo #0: matching candidate #0 [127] struct task_struct.scx.dsq_vtime (0:23:16 @ offset 920)
libbpf: prog 'simple_enqueue': relo #0: patched insn #23 (LDX/ST/STX) off 848 -> 920
libbpf: sec 'struct_ops/simple_running': found 2 CO-RE relocations
libbpf: prog 'simple_running': relo #0: [21] struct task_struct.scx.dsq_vtime (0:23:15 @ offset 848)
libbpf: prog 'simple_running': relo #0: matching candidate #0 [127] struct task_struct.scx.dsq_vtime (0:23:16 @ offset 920)
libbpf: prog 'simple_running': relo #0: patched insn #8 (LDX/ST/STX) off 848 -> 920
libbpf: prog 'simple_running': relo #1: [21] struct task_struct.scx.dsq_vtime (0:23:15 @ offset 848)
libbpf: prog 'simple_running': relo #1: matching candidate #0 [127] struct task_struct.scx.dsq_vtime (0:23:16 @ offset 920)
libbpf: prog 'simple_running': relo #1: patched insn #11 (LDX/ST/STX) off 848 -> 920
libbpf: sec 'struct_ops/simple_stopping': found 4 CO-RE relocations
libbpf: prog 'simple_stopping': relo #0: [21] struct task_struct.scx.slice (0:23:14 @ offset 840)
libbpf: prog 'simple_stopping': relo #0: matching candidate #0 [127] struct task_struct.scx.slice (0:23:15 @ offset 912)
libbpf: prog 'simple_stopping': relo #0: patched insn #5 (LDX/ST/STX) off 840 -> 912
libbpf: prog 'simple_stopping': relo #1: [21] struct task_struct.scx.weight (0:23:4 @ offset 756)
libbpf: prog 'simple_stopping': relo #1: matching candidate #0 [127] struct task_struct.scx.weight (0:23:4 @ offset 820)
libbpf: prog 'simple_stopping': relo #1: patched insn #8 (LDX/ST/STX) off 756 -> 820
libbpf: prog 'simple_stopping': relo #2: [21] struct task_struct.scx.dsq_vtime (0:23:15 @ offset 848)
libbpf: prog 'simple_stopping': relo #2: matching candidate #0 [127] struct task_struct.scx.dsq_vtime (0:23:16 @ offset 920)
libbpf: prog 'simple_stopping': relo #2: patched insn #11 (LDX/ST/STX) off 848 -> 920
libbpf: prog 'simple_stopping': relo #3: [21] struct task_struct.scx.dsq_vtime (0:23:15 @ offset 848)
libbpf: prog 'simple_stopping': relo #3: matching candidate #0 [127] struct task_struct.scx.dsq_vtime (0:23:16 @ offset 920)
libbpf: prog 'simple_stopping': relo #3: patched insn #13 (LDX/ST/STX) off 848 -> 920
libbpf: sec 'struct_ops/simple_enable': found 1 CO-RE relocations
libbpf: prog 'simple_enable': relo #0: [21] struct task_struct.scx.dsq_vtime (0:23:15 @ offset 848)
libbpf: prog 'simple_enable': relo #0: matching candidate #0 [127] struct task_struct.scx.dsq_vtime (0:23:16 @ offset 920)
libbpf: prog 'simple_enable': relo #0: patched insn #4 (LDX/ST/STX) off 848 -> 920
libbpf: sec 'struct_ops.s/simple_init': found 1 CO-RE relocations
libbpf: CO-RE relocating [407] enum scx_ops_flags: found target candidate [50297] enum scx_ops_flags in [vmlinux]
libbpf: prog 'simple_init': relo #0: [407] enum scx_ops_flags::SCX_OPS_SWITCH_PARTIAL = 8
libbpf: prog 'simple_init': relo #0: matching candidate #0 [50297] enum scx_ops_flags::SCX_OPS_SWITCH_PARTIAL = 8
libbpf: prog 'simple_init': relo #0: patched insn #0 (LDIMM64) imm64 1 -> 1
libbpf: sec 'struct_ops/simple_exit': found 6 CO-RE relocations
libbpf: CO-RE relocating [413] struct scx_exit_info: found target candidate [50296] struct scx_exit_info in [vmlinux]
libbpf: prog 'simple_exit': relo #0: [413] struct scx_exit_info.reason (0:2 @ offset 16)
libbpf: prog 'simple_exit': relo #0: matching candidate #0 [50296] struct scx_exit_info.reason (0:2 @ offset 16)
libbpf: prog 'simple_exit': relo #0: patched insn #1 (LDX/ST/STX) off 16 -> 16
libbpf: prog 'simple_exit': relo #1: [413] struct scx_exit_info.msg (0:5 @ offset 40)
libbpf: prog 'simple_exit': relo #1: matching candidate #0 [50296] struct scx_exit_info.msg (0:5 @ offset 40)
libbpf: prog 'simple_exit': relo #1: patched insn #12 (LDX/ST/STX) off 40 -> 40
libbpf: prog 'simple_exit': relo #2: [413] struct scx_exit_info.dump (0:6 @ offset 48)
libbpf: prog 'simple_exit': relo #2: matching candidate #0 [50296] struct scx_exit_info.dump (0:6 @ offset 48)
libbpf: prog 'simple_exit': relo #2: patched insn #18 (LDX/ST/STX) off 48 -> 48
libbpf: prog 'simple_exit': relo #3: [413] struct scx_exit_info.exit_code (0:1 @ offset 8)
libbpf: prog 'simple_exit': relo #3: matching candidate #0 [50296] struct scx_exit_info.exit_code (0:1 @ offset 8)
libbpf: prog 'simple_exit': relo #3: patched insn #22 (ALU/ALU64) imm 1 -> 1
libbpf: prog 'simple_exit': relo #4: [413] struct scx_exit_info.exit_code (0:1 @ offset 8)
libbpf: prog 'simple_exit': relo #4: matching candidate #0 [50296] struct scx_exit_info.exit_code (0:1 @ offset 8)
libbpf: prog 'simple_exit': relo #4: patched insn #24 (LDX/ST/STX) off 8 -> 8
libbpf: prog 'simple_exit': relo #5: [413] struct scx_exit_info.kind (0:0 @ offset 0)
libbpf: prog 'simple_exit': relo #5: matching candidate #0 [50296] struct scx_exit_info.kind (0:0 @ offset 0)
libbpf: prog 'simple_exit': relo #5: patched insn #26 (LDX/ST/STX) off 0 -> 0
libbpf: prog 'simple_init': relo #1: poisoning insn #3 that calls kfunc 'scx_bpf_switch_all'
libbpf: map 'stats': created successfully, fd=3
libbpf: map 'scx_simp.rodata': created successfully, fd=4
libbpf: map '.data.uei_dump': created successfully, fd=5
libbpf: map 'scx_simp.data': created successfully, fd=6
libbpf: map 'scx_simp.bss': created successfully, fd=7
libbpf: map 'simple_ops': created successfully, fd=8
libbpf: prog 'simple_exit': BPF program load failed: Invalid argument
libbpf: prog 'simple_exit': -- BEGIN PROG LOAD LOG --
Global function simple_exit() doesn't return scalar. Only those are supported.
0: R1=ctx() R10=fp0
; void BPF_STRUCT_OPS(simple_exit, struct scx_exit_info *ei) @ scx_simple.bpf.c:143
0: (79) r6 = *(u64 *)(r1 +0)
func 'exit' arg0 has btf_id 50296 type STRUCT 'scx_exit_info'
1: R1=ctx() R6_w=trusted_ptr_scx_exit_info()
; UEI_RECORD(uei, ei); @ scx_simple.bpf.c:145
1: (79) r3 = *(u64 *)(r6 +16) ; R3_w=scalar() R6_w=trusted_ptr_scx_exit_info()
2: (18) r7 = 0xffffabe20501f000 ; R7_w=map_value(map=scx_simp.data,ks=4,vs=1168)
4: (18) r1 = 0xffffabe20501f000 ; R1_w=map_value(map=scx_simp.data,ks=4,vs=1168)
6: (07) r1 += 16 ; R1_w=map_value(map=scx_simp.data,ks=4,vs=1168,off=16)
7: (b4) w2 = 128 ; R2_w=128
8: (85) call bpf_probe_read_str#45
program of this type cannot use helper bpf_probe_read_str#45
processed 7 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'simple_exit': failed to load: -22
libbpf: failed to load object 'scx_simple'
libbpf: failed to load BPF skeleton 'scx_simple': -22
../scheds/c/scx_simple.c:88 [scx panic]: Invalid argument
Failed to load skel
```
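
In a log this long, the line that matters is the verifier's rejection message inside the `PROG LOAD LOG` section. The following is a minimal Python sketch for pulling it out of captured `-v` output. The function names are illustrative and not part of scx or libbpf; it assumes the `-- BEGIN PROG LOAD LOG --` / `-- END PROG LOAD LOG --` markers and the "cannot use helper" wording appear exactly as in the log above.

```python
import re

# Markers as they appear in the libbpf output quoted in this issue.
BEGIN = "-- BEGIN PROG LOAD LOG --"
END = "-- END PROG LOAD LOG --"


def verifier_logs(output):
    """Return the text between each BEGIN/END PROG LOAD LOG pair."""
    pattern = re.escape(BEGIN) + r"(.*?)" + re.escape(END)
    return [m.strip() for m in re.findall(pattern, output, flags=re.DOTALL)]


def rejected_helpers(output):
    """Return the names of helpers the verifier refused for the program type."""
    found = []
    for log in verifier_logs(output):
        found += re.findall(
            r"program of this type cannot use helper (\w+)#\d+", log
        )
    return found
```

For example, the output of `sudo ./scx_simple -v 2>&1` can be saved to a file and its contents passed to `rejected_helpers()`. For the log in this report, the helper reported is `bpf_probe_read_str`.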

I'm using BORE-sched-ext, and sched_ext seems to be working:

$ cat /sys/kernel/sched_ext/state
disabled

Do I need to enable a specific BPF feature, or is this an incompatibility between scx and the CachyOS patches?

Kernel config

htejun commented 4 months ago

The only reason I can think of is lacking CAP_PERFMON but I have no idea why that would be when it's run with sudo. What does CAP=($(grep CapEff /proc/self/status)); capsh --decode=${CAP[1]} say if you run it as root?

aruhier commented 4 months ago

Thanks!

$ CAP=($(grep CapEff /proc/self/status)); capsh --decode=${CAP[1]}
0x00000000000000ca=cap_dac_override,cap_fowner,cap_setgid,cap_setuid

htejun commented 4 months ago

I don't know why but you don't have enough CAPs to load SCX schedulers. You'd need at least cap_bpf and cap_perfmon.

# CAP=($(grep CapEff /proc/self/status)); capsh --decode=${CAP[1]}
0x000001ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore

aruhier commented 4 months ago

Hmm, that is indeed weird.

I tried to manually add the capabilities to scx_simple and I have the same issue:

$ getcap ./scx_simple 
./scx_simple cap_perfmon,cap_bpf=ep

aruhier commented 4 months ago

> I don't know why but you don't have enough CAPs to load SCX schedulers. You'd need at least cap_bpf and cap_perfmon.
>
> # CAP=($(grep CapEff /proc/self/status)); capsh --decode=${CAP[1]}
> 0x000001ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore

Sorry, from zsh it only shows a few capabilities, but checking them from bash looks ok:

$ CAP=($(grep CapEff /proc/self/status)); capsh --decode=${CAP[1]}
0x000001ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore

The schedulers still fail to load with the same error, whether run from zsh, bash, or the systemd service.
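
A shell-independent way to run the same capability check is to read the `CapEff` mask directly and test the two bits mentioned above. This is a minimal Python sketch; the function names are illustrative, and the bit numbers (38 for `cap_perfmon`, 39 for `cap_bpf`) come from `<linux/capability.h>`. One plausible explanation for the zsh result, not confirmed in this thread, is that zsh arrays are 1-indexed by default, so `${CAP[1]}` is the label `CapEff:` rather than the hex mask. If `capsh` parses that string leniently as hexadecimal, the leading `Ca` yields `0xca`, which matches the four capabilities it printed.

```python
# Capability bit numbers from <linux/capability.h>.
CAP_PERFMON = 38
CAP_BPF = 39


def parse_cap_eff(status_text):
    """Return the CapEff bitmask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def missing_caps(mask):
    """List which of cap_perfmon and cap_bpf are absent from the mask."""
    needed = {"cap_perfmon": CAP_PERFMON, "cap_bpf": CAP_BPF}
    return [name for name, bit in needed.items() if not (mask >> bit) & 1]


def read_own_cap_eff():
    """Read the effective capability mask of the current process (Linux only)."""
    with open("/proc/self/status") as f:
        return parse_cap_eff(f.read())
```

With the masks quoted in this thread, `missing_caps(0xca)` reports both capabilities as missing, while `missing_caps(0x1ffffffffff)` returns an empty list. Run as root, `missing_caps(read_own_cap_eff())` gives the answer for the current process regardless of which shell launched it.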

ptr1337 commented 4 months ago

I pointed out the kernel config difference here: https://github.com/CachyOS/linux-cachyos/issues/254#issuecomment-2137389234

I don't think this is an issue with the patchset itself: it works for several CachyOS users, and so far there have been no reports of scx schedulers failing to load. You should check your kernel configuration and your system configuration.

aruhier commented 4 months ago

Indeed, I was missing CONFIG_FTRACE=y, which is a dependency of CONFIG_BPF_LSM=y. My config had CONFIG_BPF_LSM=y, but CONFIG_FTRACE=n caused it to be disabled at build time.

@htejun: in order to help people with custom config and avoid that kind of report, can you add a section in the README (or I can do a PR for it) specifying that the kernel must be compiled with CONFIG_BPF=y, CONFIG_BPF_LSM=y and CONFIG_BPF_SYSCALL=y?

Thanks!
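
For anyone hitting the same problem with a custom config, here is a minimal Python sketch that checks a kernel config for the options named in this thread. The option list is taken from the comments above and is not an authoritative list of scx requirements. The function names are illustrative, and `/proc/config.gz` only exists when the kernel was built with `CONFIG_IKCONFIG_PROC`.

```python
import gzip
import re

# Options mentioned in this issue; not an official requirements list.
REQUIRED = ("CONFIG_BPF", "CONFIG_BPF_SYSCALL", "CONFIG_BPF_LSM", "CONFIG_FTRACE")


def parse_kconfig(text):
    """Map option name -> value for every 'CONFIG_X=value' line."""
    opts = {}
    for line in text.splitlines():
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if m:
            opts[m.group(1)] = m.group(2)
    return opts


def missing_options(text, required=REQUIRED):
    """Return the required options that are not set to 'y'."""
    opts = parse_kconfig(text)
    return [name for name in required if opts.get(name) != "y"]


def load_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (needs CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as f:
        return f.read()
```

The check should be run against the config the kernel was actually built with, for example `missing_options(load_running_config())`, rather than a hand-edited fragment. As described above, an option written as `=y` can still end up disabled once its dependencies are resolved.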

htejun commented 4 months ago

> @htejun: in order to help people with custom config and avoid that kind of report, can you add a section in the README (or I can do a PR for it) specifying that the kernel must be compiled with CONFIG_BPF=y, CONFIG_BPF_LSM=y and CONFIG_BPF_SYSCALL=y?

Yes, please submit a PR. Thanks!