Closed abrehman94 closed 3 weeks ago
I think I've seen these types of verifier errors before. The verifier doesn't always track bpf_cpumasks that well, especially with goto. You might try adding a check like this before the verifier error:
if (!a_cpumask || !o_cpumask || !t_cpumask || !t2_cpumask) {
    cpu_id = -ENOENT;
    goto unlock_out;
}
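For readers who want to see the guard in isolation, here is a minimal userspace sketch of the same pattern. The `struct mask` type, the `pick_cpu` function, and its parameters are hypothetical stand-ins, not the scx_lavd code. In the real BPF program the pointers come from kptr lookups, and it is the verifier rather than the C compiler that needs to see the check.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_cpumask; illustration only. */
struct mask { unsigned long bits[4]; };

/*
 * Bail out through a single exit label when any mask pointer is NULL,
 * so every later dereference happens on a path where all four pointers
 * are known to be non-NULL.
 */
static int pick_cpu(struct mask *a, struct mask *o,
                    struct mask *t, struct mask *t2)
{
	int cpu_id;

	if (!a || !o || !t || !t2) {
		cpu_id = -ENOENT;
		goto unlock_out;
	}

	/* Safe to dereference: all four pointers were checked above. */
	t->bits[0] = a->bits[0] & o->bits[0];
	t2->bits[0] = t->bits[0];
	cpu_id = t2->bits[0] ? 0 : -ENOENT;

unlock_out:
	/* Shared cleanup (for example, releasing a lock) would go here. */
	return cpu_id;
}
```

Placing the check immediately before the first use of the masks keeps the proof local, which may help when the verifier loses track of pointer state across a goto.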
@abrehman94 -- Thank you for your patience. I fixed one verifier error with this. Could you check if the problem still exists?
Thanks @multics69 for the commit. It fixed the previous problem, but there is a new problem now. See the verifier log below. I will try to debug it.
; scx_bpf_error("cpu_ctx lookup failed for current cpu"); @ util.bpf.c:178
17: (7b) *(u64 *)(r10 -16) = r7 ; R7=0 R10=fp0 fp-16_w=0
18: (bf) r2 = r10 ; R2_w=fp0 R10=fp0
19: (07) r2 += -16 ; R2_w=fp-16
20: (18) r1 = 0xffffb2557652ecda ; R1_w=map_value(map=bpf_bpf.rodata,ks=4,vs=4585,off=3290)
22: (b4) w3 = 8 ; R3_w=8
23: (85) call scx_bpf_error_bstr#112230
write into map forbidden, value_size=4585 off=3290 size=1
I saw this error before. Could you try with the latest version? There was a problem a couple of days ago.
I got a similar error with the latest version too. Could you check this please?
441: (05) goto pc+38
; bpf_cpumask_and(t_cpumask, cast_mask(a_cpumask), cast_mask(little)); @ main.bpf.c:810
480: (79) r6 = *(u64 *)(r10 -128) ; frame1: R6_w=rcu_ptr_bpf_cpumask() R10=fp0 fp-128=rcu_ptr_bpf_cpumask()
; bpf_cpumask_and(t2_cpumask, cast_mask(t_cpumask), cast_mask(cpdom_mask_prev)); @ main.bpf.c:836
481: (bf) r1 = r6 ; frame1: R1_w=rcu_ptr_bpf_cpumask() R6_w=rcu_ptr_bpf_cpumask()
482: (79) r2 = *(u64 *)(r10 -120) ; frame1: R2_w=rcu_ptr_bpf_cpumask() R10=fp0 fp-120=rcu_ptr_bpf_cpumask()
483: (79) r3 = *(u64 *)(r10 -88) ; frame1: R3_w=map_value(map=.data.LAVD,ks=4,vs=1360,off=40,smin=smin32=0,smax=umax=smax32=umax32=1280,var_off=(0x0; 0x7f8)) R10=fp0 fp-88=map_value(map=.data.LAVD,ks=4,vs=1360,off=40,smin=smin32=0,smax=umax=smax32=umax32=1280,var_off=(0x0; 0x7f8))
484: (85) call bpf_cpumask_and#65550
invalid access to map value, value_size=1360 off=1320 size=1024
R3 max value is outside of the allowed memory range
processed 478 insns (limit 1000000) max_states_per_insn 1 total_states 42 peak_states 42 mark_read 17
-- END PROG LOAD LOG --
libbpf: prog 'lavd_select_cpu': failed to load: -13
libbpf: failed to load object 'bpf_bpf'
libbpf: failed to load BPF skeleton 'bpf_bpf': -13
Error: Failed to load BPF program
Caused by:
Permission denied (os error 13)
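The numbers in the log above can be checked by hand: R3 points into the .data.LAVD map value at fixed offset 40 plus a variable part of at most 1280, and the call is reported as accessing 1024 bytes through it. The sketch below is a simplified model of that range check, not the kernel's verifier code. The function name and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of a map-value bounds check: the access is accepted
 * only if the largest possible offset plus the access size still fits
 * inside the map value.
 */
static bool map_access_ok(uint32_t value_size, uint32_t fixed_off,
                          uint32_t max_var_off, uint32_t access_size)
{
	uint64_t max_end = (uint64_t)fixed_off + max_var_off + access_size;

	return max_end <= value_size;
}
```

With the logged values, 40 + 1280 + 1024 = 2344, which exceeds value_size=1360, so the access is rejected. For comparison, a 32-byte access at the same offsets would end at 1352 and fit.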
Sched Config
Opts {
autopilot: true,
autopower: false,
performance: false,
powersave: false,
balanced: false,
no_core_compaction: false,
prefer_smt_core: false,
prefer_little_core: false,
no_prefer_turbo_core: false,
no_freq_scaling: false,
stats: None,
monitor: None,
monitor_sched_samples: None,
verbose: 1,
version: false,
help_stats: false,
}
Setup:
Linux mars 6.12.0-rc3-sched-ext #1 SMP PREEMPT_DYNAMIC Fri Oct 25 06:55:15 KST 2024 x86_64 x86_64 x86_64 GNU/Linux
@ChangHoon-Sung Hmm... I cannot reproduce the problem. What distro did you use? Did you try the latest version too?
@multics69 Yes, I re-cloned the whole scx repo (83b5f4e) and built it with the CC=clang-18 environment variable, but got the same error. I also got a bunch of warnings when I ran CC=clang-18 ./meson/meson.py compile -C build, but the build completed without any critical error.
Here's more information about the environment:
Distro: Ubuntu 22.04.5
Rustc: 1.82.0
Clang: 18
Meson: 1.6.99 (cloned the repo)
❯ CC=clang-18 ./meson/meson.py setup build --prefix `pwd` --wipe
The Meson build system
Version: 1.6.99
Source dir: /home/hoon/workspace/ros-sched/scx
Build dir: /home/hoon/workspace/ros-sched/scx/build
Build type: native build
Project name: sched_ext schedulers
Project version: 1.0.5
C compiler for the host machine: clang-18 (clang 18.1.8 "Ubuntu clang version 18.1.8 (++20240731024944+3b5b5c1ec4a3-1~exp1~20240731145000.144)")
C linker for the host machine: clang-18 ld.bfd 2.38
Host machine cpu family: x86_64
Host machine cpu: x86_64
Program clang-18 found: YES (/usr/bin/clang-18)
Program /home/hoon/workspace/ros-sched/scx/meson-scripts/veristat found: YES (/home/hoon/workspace/ros-sched/scx/meson-scripts/veristat)
Program /home/hoon/workspace/ros-sched/scx/meson-scripts/veristat_diff found: YES (/home/hoon/workspace/ros-sched/scx/meson-scripts/veristat_diff)
Program /home/hoon/workspace/ros-sched/scx/meson-scripts/run_stress_tests found: YES (/home/hoon/workspace/ros-sched/scx/meson-scripts/run_stress_tests)
Program /home/hoon/workspace/ros-sched/scx/meson-scripts/get_clang_ver found: YES (/home/hoon/workspace/ros-sched/scx/meson-scripts/get_clang_ver)
Program /home/hoon/workspace/ros-sched/scx/meson-scripts/get_bpftool_ver found: YES (/home/hoon/workspace/ros-sched/scx/meson-scripts/get_bpftool_ver)
Program /home/hoon/workspace/ros-sched/scx/meson-scripts/bpftool_build_skel found: YES (/home/hoon/workspace/ros-sched/scx/meson-scripts/bpftool_build_skel)
Program /home/hoon/workspace/ros-sched/scx/meson-scripts/get_sys_incls found: YES (/home/hoon/workspace/ros-sched/scx/meson-scripts/get_sys_incls)
Program /home/hoon/workspace/ros-sched/scx/meson-scripts/test_sched found: YES (/home/hoon/workspace/ros-sched/scx/meson-scripts/test_sched)
Program /home/hoon/workspace/ros-sched/scx/meson-scripts/fetch_libbpf found: YES (/bin/bash /home/hoon/workspace/ros-sched/scx/meson-scripts/fetch_libbpf)
Program /home/hoon/workspace/ros-sched/scx/meson-scripts/build_libbpf found: YES (/home/hoon/workspace/ros-sched/scx/meson-scripts/build_libbpf)
Program /home/hoon/workspace/ros-sched/scx/meson-scripts/fetch_bpftool found: YES (/bin/bash /home/hoon/workspace/ros-sched/scx/meson-scripts/fetch_bpftool)
Program /home/hoon/workspace/ros-sched/scx/meson-scripts/build_bpftool found: YES (/bin/bash /home/hoon/workspace/ros-sched/scx/meson-scripts/build_bpftool)
Program jq found: YES (/usr/bin/jq)
Program make found: YES (/usr/bin/make)
Program nproc found: YES (/usr/bin/nproc)
Message: Fetching libbpf repo
Library elf found: YES
Library z found: YES
Library zstd found: YES
Message: Fetching bpftool repo
Message: cpu=x86_64 bpf_base_cflags=['-g', '-O2', '-Wall', '-Wno-compare-distinct-pointer-types', '-D__TARGET_ARCH_x86', '-mcpu=v3', '-mlittle-endian', '-idirafter /usr/lib/llvm-18/lib/clang/18/include', '-idirafter /usr/local/include', '-idirafter /usr/include/x86_64-linux-gnu', '-idirafter /usr/include']
Program cargo found: YES (/home/hoon/.cargo/bin/cargo)
Program /home/hoon/workspace/ros-sched/scx/meson-scripts/cargo_fetch found: YES (/home/hoon/workspace/ros-sched/scx/meson-scripts/cargo_fetch)
Run-time dependency threads found: YES
Dependency threads found: YES unknown (cached)
Dependency threads found: YES unknown (cached)
Dependency threads found: YES unknown (cached)
Dependency threads found: YES unknown (cached)
Dependency threads found: YES unknown (cached)
Dependency threads found: YES unknown (cached)
Found pkg-config: YES (/usr/bin/pkg-config) 0.29.2
Run-time dependency systemd found: YES 249
Found CMake: /usr/bin/cmake (3.22.1)
Run-time dependency openrc found: NO (tried pkgconfig and cmake)
Run-time dependency libalpm found: NO (tried pkgconfig and cmake)
Build targets in project: 52
sched_ext schedulers 1.0.5
User defined options
prefix: /home/hoon/workspace/ros-sched/scx
Found ninja-1.10.1 at /usr/bin/ninja
What's weird is that scx_lavd installed with cargo install scx_lavd works fine. Do you have any idea where the problem is?
The problem is this commit (https://github.com/sched-ext/scx/commit/1b5359ef4aa6cf7d642749850128ab901d76510a). When I reverted it, it worked for me without any error. From a quick check, it seems this function (https://github.com/sched-ext/scx/blame/4c3f1fd61c46b6dcda3ef29b792ccbb50674f998/scheds/include/scx/common.bpf.h#L207) no longer works well on some architectures (x86 in my case). The maintainer's architecture is probably not x86.
Hmm... that is weird. Why am I not able to reproduce the problem? I will take a further look at the collected logs and post an update here. I tested on x86 and ARM64 with Debian and CachyOS.
@ChangHoon-Sung -- What kernel version did you use?
@multics69 I tried 6.12-rc3 and the Oct 31, 2024 version of bpf-next. The error looks similar to the CI failure in GitHub Actions. Only lavd has the problem.
The cpumask definition in the vmlinux file for x86 has only 4 bits, which was causing this issue. Updating it to 128 bits solves it.
Pull request: https://github.com/sched-ext/scx/pull/889
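One possible reading of the fix, assuming the "4" and "128" above refer to the length of the unsigned long bits[] array in struct cpumask rather than to individual bits: on a 64-bit machine a 128-element array is 1024 bytes, which matches the size=1024 in the verifier log, while a 4-element array is only 32 bytes. The struct names below are hypothetical and exist only to compare the two sizes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts; uint64_t stands in for unsigned long on x86_64. */
struct cpumask_small { uint64_t bits[4]; };   /* 4 * 8 = 32 bytes */
struct cpumask_large { uint64_t bits[128]; }; /* 128 * 8 = 1024 bytes */

/* Number of CPUs a mask of the given byte size can describe (one bit per CPU). */
static size_t cpus_covered(size_t mask_bytes)
{
	return mask_bytes * 8;
}
```

Under this assumption, the small layout covers 256 CPUs and the large one covers 8192, so a program compiled against the small definition would reserve far less space per mask than a 1024-byte access needs.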
Verifier Log
Sched Config
Setup:
Linux v-021 6.11.0-rc1-scx1+ #6 SMP PREEMPT_DYNAMIC Mon Oct 14 12:34:19 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux