sched-ext / scx

sched_ext schedulers and tools
https://bit.ly/scx_slack
GNU General Public License v2.0

scx_layered: Fix exit_task task_ctx lookup failed #398

Open eternalok opened 1 week ago

eternalok commented 1 week ago

When a task is forked, init_task saves a task_ctx in BPF task storage so that bpf_task_storage_get can retrieve it later; see [1].

However, if task creation fails before sched_cgroup_fork, no task_ctx is created. The error path then runs cgroup_cancel_fork, exit_task finds no task_ctx and calls scx_bpf_error("task_ctx lookup failed"), and scx_layered exits; see [2].

[1] Successful fork of a task:

  copy_process
    sched_fork
    sched_cgroup_fork -> init_task
    sched_post_fork
    return p

[2] Failed task creation:

  copy_process
    sched_fork
    xxx   // fails, so sched_cgroup_fork is skipped
    goto bad_fork_cancel_cgroup:
      cgroup_cancel_fork -> exit_task
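The two paths in [1] and [2] can be sketched as a small user-space model. This is only an illustration of the flow as the report describes it, not kernel or scx_layered code: every `model_*` name is hypothetical, and the assumption that the cancel path reaches the exit callback unconditionally is the reporter's claim, not confirmed behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task and its per-task storage slot. */
struct model_task {
    bool has_task_ctx;   /* set once the init callback has run */
    bool lookup_failed;  /* set if the exit callback finds no task_ctx */
};

/* Models ops.init_task(): creates the task_ctx. */
static void model_init_task(struct model_task *p)
{
    assert(p != NULL);
    p->has_task_ctx = true;
}

/* Models an ops.exit_task() that treats a missing task_ctx as an error. */
static void model_exit_task(struct model_task *p)
{
    assert(p != NULL);
    if (!p->has_task_ctx)
        p->lookup_failed = true;  /* where "task_ctx lookup failed" would fire */
    p->has_task_ctx = false;
}

/*
 * Models the fork path from the report. fail_early simulates an error
 * after sched_fork but before sched_cgroup_fork.
 * Returns true if the exit callback ran without a task_ctx.
 */
static bool model_copy_process(bool fail_early)
{
    struct model_task p = { false, false };

    /* sched_fork would run here */
    if (fail_early) {
        /* cancel path: init callback never ran (assumed unconditional exit) */
        model_exit_task(&p);
        return p.lookup_failed;
    }

    model_init_task(&p);  /* sched_cgroup_fork -> init_task */
    model_exit_task(&p);  /* the task exits later, after a successful fork */
    return p.lookup_failed;
}
```

In this model, `model_copy_process(true)` reports a failed lookup while `model_copy_process(false)` does not, which is the asymmetry the report is about.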

htejun commented 4 days ago

@eternalok Did you actually see this happening? Unless ops.init_task() returns successfully, scx core shouldn't call ops.exit_task(). If you're seeing look-up failure in the exit path, it could be a different bug too.

eternalok commented 4 days ago

> @eternalok Did you actually see this happening? Unless ops.init_task() returns successfully, scx core shouldn't call ops.exit_task(). If you're seeing look-up failure in the exit path, it could be a different bug too.

Yes, I see this happening and can reproduce it. The detailed log follows:

CPU 35 : nr_run=1 flags=0x0 cpu_rel=0 ops_qseq=6162441 pnt_seq=31851534 curr=pouch[307503] class=ext_sched_class

*R pouch[307503] +0ms scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0 sticky/holding_cpu=-1/-1 dsq_id=(n/a) cpus=ffffffff,ffffffff,ffffffff

scx_ops_error_irq_workfn+0x48a/0x4f0
irq_work_single+0x20/0x60
irq_work_run_list+0x26/0x40
irq_work_run+0x26/0x40
__sysvec_irq_work+0x18/0xc0
sysvec_irq_work+0x9d/0xd0
asm_sysvec_irq_work+0x16/0x20
scx_cancel_fork+0xbb/0x100
copy_process+0xb3f/0x2720
kernel_clone+0x9a/0x3b0
__do_sys_clone+0x66/0x90
do_syscall_64+0x5e/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e

with Error: EXIT: scx_bpf_error (task_ctx lookup failed).

Clearly, ops.init_task was not executed. For example, if perf_event_init_task fails before ops.init_task runs, sched_cancel_fork is called. Perhaps ops.init_task should run in sched_fork instead of sched_cgroup_fork?

htejun commented 3 days ago

Hmm... can you share the repro? I tried with induced errors on perf_event_init_task() and ops.init_task() but neither case calls ops.exit_task() as the task_state remains SCX_TASK_NONE and scx_ops_exit_task() skips calling ops.exit_task() if so. Can you see whether the problem is reproducible with the following kernel?

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git for-next
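The guard described in the comment above (skip ops.exit_task() while the task state is still SCX_TASK_NONE) can be sketched as a user-space model. This is a simplified illustration under stated assumptions, not the actual sched_ext implementation: the `guard_*` names and the two-state enum are hypothetical, and the real task state machine has more states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-state model; NONE means the init callback never succeeded. */
enum guard_task_state { GUARD_TASK_NONE, GUARD_TASK_INIT };

struct guard_task {
    enum guard_task_state state;
    int exit_task_calls;  /* how many times the exit callback was invoked */
};

/* Models the core calling ops.init_task(); state advances only on success. */
static void guard_init_task(struct guard_task *p, bool init_ok)
{
    assert(p != NULL);
    if (init_ok)
        p->state = GUARD_TASK_INIT;
}

/* Models the core's exit helper: the callback is skipped in state NONE. */
static void guard_exit_task(struct guard_task *p)
{
    assert(p != NULL);
    if (p->state == GUARD_TASK_NONE)
        return;            /* never initialized, so nothing to tear down */
    p->exit_task_calls++;  /* stands in for invoking ops.exit_task() */
    p->state = GUARD_TASK_NONE;
}

/* Runs init (which may fail) followed by the exit helper; returns the call count. */
static int guard_fork_then_cancel(bool init_ok)
{
    struct guard_task p = { GUARD_TASK_NONE, 0 };

    guard_init_task(&p, init_ok);
    guard_exit_task(&p);
    return p.exit_task_calls;
}
```

With such a guard in place, `guard_fork_then_cancel(false)` never invokes the exit callback. If a kernel behaves this way, a lookup failure in the exit path would point to a different cause than a skipped init.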