@0x006E thanks for reporting this. It'd be really helpful if you could grab the stdout of scx_rustland and report the trace when the error happens; unfortunately dmesg doesn't say much.
Also keep in mind that scx_rustland at the moment should be considered mostly a proof-of-concept / experimental scheduler and you should use scx_bpfland for better performance.
But it's still interesting to investigate and understand the reason for these stalls, of course :)
I'll add the trace too. Is there any other env variable needed for the backtrace, or anything like that, to add any more details?
The trace shown to stdout should be enough for now to better understand what's happening, thanks!
I have not been able to get the trace output. I waited for about 30 minutes while doing my work, but it never crashed; the system hung like crazy though, and even sysrq didn't work. The stdout does not look interesting, but I'll add it here.
I am adding the logs from when the system fully froze: rustland.log
I'll try after a reboot to check if it crashes.
Edit: I didn't know Ctrl + \ is used to trigger a core dump
I'm a bit confused here. When you say "the system hung", do you mean that it's completely unresponsive? IIUC you can stop scx_rustland, and when you do so the system immediately recovers and is responsive again?
With scx_rustland running, if a task isn't scheduled for 5 sec the sched_ext watchdog should automatically kick out the scheduler and restore the default Linux scheduler (and in this case you should see scx_rustland spit out a trace). But apparently you don't see this; that's why I'm confused.
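To make the watchdog idea concrete, here's a minimal sketch in plain Rust (not the kernel's actual implementation): if any runnable task hasn't run within the timeout, the BPF scheduler gets evicted and a trace is dumped.

```rust
// A minimal sketch of the sched_ext watchdog idea: scan runnable tasks and
// flag a stall if any of them hasn't run within the timeout.
use std::time::{Duration, Instant};

struct TaskState {
    comm: &'static str,
    last_ran: Instant,
}

fn stalled_task<'a>(tasks: &'a [TaskState], timeout: Duration) -> Option<&'a TaskState> {
    // The in-kernel watchdog would kick out the BPF scheduler here.
    tasks.iter().find(|t| t.last_ran.elapsed() > timeout)
}

fn main() {
    let tasks = vec![TaskState { comm: "glean.dispatche", last_ran: Instant::now() }];
    // 5s matches the default watchdog timeout mentioned above; illustrative only.
    match stalled_task(&tasks, Duration::from_secs(5)) {
        Some(t) => println!("runnable task stall ({})", t.comm),
        None => println!("no stall detected"),
    }
}
```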
Sorry about that: the system was unresponsive, and I couldn't close anything or do anything at all for about 3 seconds. Anyway, I finally got a crash. It happened while I was just browsing the web. This is the trace.
03:33:10 [INFO] cpus=16 min_vruntime=10580391987985 slice=5000us
03:33:10 [INFO] tasks=438
03:33:10 [INFO] nr_user_dispatches=9535661 nr_kernel_dispatches=9788542
03:33:10 [INFO] nr_cancel_dispatches=0 nr_bounce_dispatches=0
03:33:10 [INFO] nr_running=10 nr_waiting=13 [nr_queued=0 + nr_scheduled=13]
03:33:10 [INFO] nr_failed_dispatches=0 nr_sched_congested=0 nr_page_faults=0 [OK]
03:33:10 [INFO] [pid] [nvcsw] [comm] <-- interactive tasks
03:33:10 [INFO] 18 199 rcu_preempt
03:33:10 [INFO] 5257 19 Isolated
03:33:10 [INFO] 396 18 kworker/u64:7-events_unbound
03:33:10 [INFO] 120 10 kworker/u64:2-events_unbound
03:33:10 [INFO] 660 10 irq/169-ELAN0518:01
03:33:10 [INFO] cpus=16 min_vruntime=10580435393350 slice=5000us
03:33:10 [INFO] tasks=0
03:33:10 [INFO] nr_user_dispatches=9535661 nr_kernel_dispatches=9788964
03:33:10 [INFO] nr_cancel_dispatches=0 nr_bounce_dispatches=0
03:33:10 [INFO] nr_running=0 nr_waiting=0 [nr_queued=0 + nr_scheduled=0]
03:33:10 [INFO] nr_failed_dispatches=0 nr_sched_congested=0 nr_page_faults=0 [OK]
DEBUG DUMP
================================================================================
kworker/u64:2[120] triggered exit kind 1026:
runnable task stall (glean.dispatche[4738] failed to run for 5.076s)
Backtrace:
scx_watchdog_workfn+0x11b/0x1a0
process_scheduled_works+0x206/0x480
worker_thread+0x349/0x5e0
kthread+0x14a/0x170
ret_from_fork+0x40/0x50
ret_from_fork_asm+0x1a/0x30
CPU states
----------
CPU 0 : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=1782324 pnt_seq=1476772
curr=swapper/0[0] class=idle_sched_class
R StreamTrans #13[23074] -1192ms
scx_state/flags=3/0x1 dsq_flags=0x0 ops_state/qseq=2/1782320
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
cpus=ffff
smaps_pte_range+0x171/0x570
walk_pgd_range+0x5da/0xa00
__walk_page_range+0x5b/0x1d0
walk_page_vma+0xc9/0x110
show_smap+0x9d/0x280
seq_read_iter+0x319/0x430
seq_read+0x13d/0x180
vfs_read+0xb7/0x320
__x64_sys_read+0x79/0xf0
do_syscall_64+0x91/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
CPU 1 : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=995151 pnt_seq=841045
curr=swapper/1[0] class=idle_sched_class
cpus_to_kick : 0002
R threaded-ml[7744] -4810ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/995135
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
cpus=ffff
futex_wait+0xc3/0x160
do_futex+0x158/0x1d0
__se_sys_futex+0x120/0x1e0
do_syscall_64+0x91/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
CPU 2 : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=1382977 pnt_seq=1143372
curr=swapper/2[0] class=idle_sched_class
cpus_to_kick : 0004
R Isolated Web Co[6846] -3932ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1382963
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
cpus=ffff
get_mem_cgroup_from_mm+0x4/0x100
__mem_cgroup_charge+0x1a/0xb0
do_pte_missing+0x9c5/0xe90
handle_mm_fault+0x997/0x1500
do_user_addr_fault+0x25e/0x750
exc_page_fault+0x6b/0x110
asm_exc_page_fault+0x26/0x30
R TaskCon~ller #1[11548] -2036ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1382971
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
cpus=ffff
futex_wait+0xc3/0x160
do_futex+0x158/0x1d0
__se_sys_futex+0x120/0x1e0
do_syscall_64+0x91/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
CPU 3 : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=697702 pnt_seq=580514
curr=swapper/3[0] class=idle_sched_class
cpus_to_kick : 0008
R StreamTrans #39[23076] -962ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/697700
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
cpus=ffff
seq_putc+0x28/0x40
show_vma_header_prefix+0x8f/0x140
show_map_vma+0x6b/0x1f0
show_smap+0xa8/0x280
seq_read_iter+0x319/0x430
seq_read+0x13d/0x180
vfs_read+0xb7/0x320
__x64_sys_read+0x79/0xf0
do_syscall_64+0x91/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
CPU 4 : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=1685508 pnt_seq=1385938
curr=swapper/4[0] class=idle_sched_class
cpus_to_kick : 0010
R glean.dispatche[4738] -5076ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1685484
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=256924731832
cpus=ffff
do_syscall_64+0x91/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
CPU 6 : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=1113809 pnt_seq=918714
curr=swapper/6[0] class=idle_sched_class
cpus_to_kick : 0040
R WRScene~derLP#1[4856] -4760ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1113792
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=256536597618
cpus=ffff
do_syscall_64+0x91/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
R niri[4017] -4388ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1113797
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=257059099749
cpus=ffff
__x64_sys_epoll_wait+0xc8/0x100
do_syscall_64+0x91/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
CPU 8 : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=1634656 pnt_seq=1365933
curr=swapper/8[0] class=idle_sched_class
cpus_to_kick : 0100
R .kgx-wrapped[11619] -5172ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1634640
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
cpus=ffff
wq_worker_comm+0x82/0x1b0
proc_task_name+0x9d/0x150
proc_pid_status+0x60/0xe10
proc_single_show+0x50/0xb0
seq_read_iter+0x14e/0x430
seq_read+0x13d/0x180
vfs_read+0xb7/0x320
__x64_sys_read+0x79/0xf0
do_syscall_64+0x91/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
R IPC I/O Child[4961] -5108ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1634641
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=251539433448
cpus=ffff
__x64_sys_epoll_wait+0xc8/0x100
do_syscall_64+0x91/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
CPU 11 : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=1802923 pnt_seq=1490758
curr=kworker/u64:2[120] class=ext_sched_class
*R kworker/u64:2[120] +0ms
scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=257059208040
cpus=ffff
scx_dump_state+0x63f/0x840
scx_ops_error_irq_workfn+0x23/0x50
irq_work_run+0xb8/0x150
__sysvec_irq_work+0x1c/0xd0
sysvec_irq_work+0x6e/0x80
asm_sysvec_irq_work+0x1a/0x20
scx_watchdog_workfn+0x137/0x1a0
process_scheduled_works+0x206/0x480
worker_thread+0x349/0x5e0
kthread+0x14a/0x170
ret_from_fork+0x40/0x50
ret_from_fork_asm+0x1a/0x30
R IPC I/O Parent[4740] -3684ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1802910
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=257059212384
cpus=ffff
__x64_sys_epoll_wait+0xc8/0x100
do_syscall_64+0x91/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
CPU 14 : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=2289871 pnt_seq=1831828
curr=swapper/14[0] class=idle_sched_class
R WRScene~ilder#1[4855] -5010ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/2289849
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=256583729520
cpus=ffff
do_syscall_64+0x91/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
CPU 15 : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=1975342 pnt_seq=1644291
curr=swapper/15[0] class=idle_sched_class
cpus_to_kick : 8000
idle_to_kick : 0800
R btrfs-transacti[341] -4092ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1975324
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=256826913698
cpus=ffff
btrfs_set_token_32+0x37/0x210 [btrfs]
setup_items_for_insert+0x275/0x460 [btrfs]
btrfs_insert_empty_items+0x5e/0x90 [btrfs]
__btrfs_run_delayed_refs+0x7ef/0x13f0 [btrfs]
btrfs_run_delayed_refs+0x39/0x120 [btrfs]
btrfs_start_dirty_block_groups+0x12f/0x590 [btrfs]
btrfs_commit_transaction+0x23c/0x1040 [btrfs]
transaction_kthread+0x12d/0x1c0 [btrfs]
kthread+0x14a/0x170
ret_from_fork+0x40/0x50
ret_from_fork_asm+0x1a/0x30
R WebExtensions[4824] -3756ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1975330
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=256943569144
cpus=ffff
__x64_sys_poll+0xa9/0x140
do_syscall_64+0x91/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
================================================================================
03:33:10 [INFO] Unregister RustLand scheduler
Error: EXIT: runnable task stall (glean.dispatche[4738] failed to run for 5.076s)
Edit 1: Got another crash
04:25:56 [INFO] cpus=16 min_vruntime=40706736907195 slice=5000us
04:25:56 [INFO] tasks=0
04:25:56 [INFO] nr_user_dispatches=14742368 nr_kernel_dispatches=29939067
04:25:56 [INFO] nr_cancel_dispatches=1 nr_bounce_dispatches=0
04:25:56 [INFO] nr_running=0 nr_waiting=2 [nr_queued=0 + nr_scheduled=2]
04:25:56 [INFO] nr_failed_dispatches=0 nr_sched_congested=0 nr_page_faults=0 [OK]
04:25:56 [INFO] [pid] [nvcsw] [comm] <-- interactive tasks
04:25:56 [INFO] 67841 162 kworker/u64:5+btrfs-endio-write
04:25:56 [INFO] 44019 12 kworker/u64:18-events_unbound
04:25:56 [INFO] cpus=16 min_vruntime=40706736907195 slice=5000us
04:25:56 [INFO] tasks=0
04:25:56 [INFO] nr_user_dispatches=14742368 nr_kernel_dispatches=29939067
04:25:56 [INFO] nr_cancel_dispatches=1 nr_bounce_dispatches=0
04:25:56 [INFO] nr_running=0 nr_waiting=2 [nr_queued=0 + nr_scheduled=2]
04:25:56 [INFO] nr_failed_dispatches=0 nr_sched_congested=0 nr_page_faults=0 [OK]
DEBUG DUMP
================================================================================
kworker/u64:18[44019] triggered exit kind 1026:
runnable task stall (kworker/u65:2[42574] failed to run for 6.836s)
Backtrace:
scx_watchdog_workfn+0x11b/0x1a0
process_scheduled_works+0x206/0x480
worker_thread+0x349/0x5e0
kthread+0x14a/0x170
ret_from_fork+0x40/0x50
ret_from_fork_asm+0x1a/0x30
CPU states
----------
CPU 4 : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=6398653 pnt_seq=5610460
curr=swapper/4[0] class=idle_sched_class
R kworker/u65:2[42574] -6836ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/6398633
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
cpus=ffff
zram_submit_bio+0x336/0x900 [zram]
__submit_bio+0x159/0x250
submit_bio_noacct_nocheck+0x1f5/0x390
submit_bio_wait+0x5b/0xd0
__swap_writepage+0x2e2/0x750
shrink_memcg_cb+0x24a/0x2d0
__list_lru_walk_one+0x113/0x230
list_lru_walk_one+0x5e/0x90
zswap_shrinker_scan+0xbf/0x150
do_shrink_slab+0x170/0x340
shrink_slab+0x334/0x4b0
shrink_one+0xd1/0x3c0
shrink_node+0xdf6/0x1240
do_try_to_free_pages+0x19b/0x5a0
try_to_free_pages+0x486/0x5d0
__alloc_pages_slowpath+0x6b1/0x1010
__alloc_pages_noprof+0x283/0x370
alloc_pages_mpol_noprof+0x11d/0x1f0
iwl_pcie_rx_alloc_page+0xf0/0x240 [iwlwifi]
iwl_pcie_rx_allocator_work+0x15c/0x370 [iwlwifi]
process_scheduled_works+0x206/0x480
worker_thread+0x349/0x5e0
kthread+0x14a/0x170
ret_from_fork+0x40/0x50
ret_from_fork_asm+0x1a/0x30
CPU 9 : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=4715423 pnt_seq=3884049
curr=kworker/u64:18[44019] class=ext_sched_class
*R kworker/u64:18[44019] +0ms
scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
cpus=ffff
scx_dump_state+0x63f/0x840
scx_ops_error_irq_workfn+0x23/0x50
irq_work_run+0xb8/0x150
__sysvec_irq_work+0x1c/0xd0
sysvec_irq_work+0x6e/0x80
asm_sysvec_irq_work+0x1a/0x20
scx_watchdog_workfn+0x137/0x1a0
process_scheduled_works+0x206/0x480
worker_thread+0x349/0x5e0
kthread+0x14a/0x170
ret_from_fork+0x40/0x50
ret_from_fork_asm+0x1a/0x30
CPU 11 : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=4673484 pnt_seq=3760085
curr=swapper/11[0] class=idle_sched_class
cpus_to_kick : 0800
idle_to_kick : 0040
R kswapd0[138] -6856ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/4673444
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
cpus=ffff
try_to_wake_up+0x334/0x610
btrfs_free_extent_maps+0x4e7/0x6e0 [btrfs]
super_cache_scan+0x1d5/0x210
do_shrink_slab+0x170/0x340
shrink_slab+0x334/0x4b0
shrink_one+0xd1/0x3c0
shrink_node+0xdf6/0x1240
kswapd+0xd80/0x1390
kthread+0x14a/0x170
ret_from_fork+0x40/0x50
ret_from_fork_asm+0x1a/0x30
CPU 14 : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=6019245 pnt_seq=4862403
curr=swapper/14[0] class=idle_sched_class
cpus_to_kick : 4000
R kworker/u64:7[69892] -2186ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/6019242
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
cpus=ffff
get_page_from_freelist+0x13c9/0x14b0
__alloc_pages_slowpath+0x627/0x1010
__alloc_pages_noprof+0x283/0x370
allocate_slab+0xa7/0x740
___slab_alloc+0x47b/0xa20
kmem_cache_alloc_noprof+0x1ca/0x2d0
zs_malloc+0x42/0x2d0
zs_zpool_malloc+0x12/0x30
zswap_compress+0x1a1/0x260
zswap_store+0x43a/0x760
swap_writepage+0x41/0xe0
shrink_folio_list+0xacd/0x1060
evict_folios+0x1a6a/0x2080
try_to_shrink_lruvec+0x29d/0x470
shrink_one+0xb6/0x3c0
shrink_node+0xdf6/0x1240
do_try_to_free_pages+0x19b/0x5a0
try_to_free_pages+0x486/0x5d0
__alloc_pages_slowpath+0x6b1/0x1010
__alloc_pages_noprof+0x283/0x370
alloc_pages_bulk_noprof+0x337/0x7b0
alloc_eb_folio_array+0x10a/0x1e0 [btrfs]
alloc_extent_buffer+0x14b/0x7f0 [btrfs]
btrfs_alloc_tree_block+0x10a/0x520 [btrfs]
btrfs_force_cow_block+0x15c/0x7d0 [btrfs]
btrfs_cow_block+0xe1/0x260 [btrfs]
btrfs_search_slot+0x36d/0xa00 [btrfs]
btrfs_mark_extent_written+0x11a/0xf00 [btrfs]
btrfs_finish_one_ordered+0x358/0x8e0 [btrfs]
btrfs_work_helper+0xd0/0x420 [btrfs]
process_scheduled_works+0x206/0x480
worker_thread+0x349/0x5e0
kthread+0x14a/0x170
ret_from_fork+0x40/0x50
ret_from_fork_asm+0x1a/0x30
================================================================================
04:25:56 [INFO] Unregister RustLand scheduler
Error: EXIT: runnable task stall (kworker/u65:2[42574] failed to run for 6.836s)
Thanks, that's really useful.
I see lots of tasks in the traces that are not dispatched (meaning they are probably sitting in the user-space scheduler's queue), so I'm wondering if the user-space scheduler is blocked by something and unable to dispatch them. In the second trace, for example, it looks like the system is trying to reclaim memory; if the user-space scheduler hits a page fault while that is happening, we may have a deadlock. This should be prevented by setting /proc/sys/vm/compact_unevictable_allowed to 0 while the scheduler runs (scx_rustland should do that automatically); can you check if that's the case?
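For reference, here's a minimal sketch (assuming the `libc` crate; not scx_rustland's actual code) of the usual way a user-space scheduler avoids page faults in its own path, by locking all of its memory:

```rust
// A minimal sketch: pin all current and future memory so the scheduler
// itself never takes a page fault while it's responsible for scheduling.
fn lock_all_memory() -> std::io::Result<()> {
    // MCL_CURRENT pins pages already mapped, MCL_FUTURE pins future mappings.
    let ret = unsafe { libc::mlockall(libc::MCL_CURRENT | libc::MCL_FUTURE) };
    if ret != 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(())
}

fn main() {
    match lock_all_memory() {
        Ok(()) => println!("memory locked"),
        Err(e) => eprintln!("mlockall failed: {e}"),
    }
}
```

Note that even mlocked (unevictable) pages can still be migrated by memory compaction, which is why compact_unevictable_allowed = 0 matters on top of the mlockall.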
When this problem happens, is your system under memory pressure (almost out of free memory, as reported by free -m)?
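If it helps, a small sketch that reports roughly what free -m shows as available, by parsing MemAvailable from /proc/meminfo:

```rust
// Report available memory in MiB by parsing MemAvailable from /proc/meminfo;
// roughly the "available" column of `free -m`.
use std::fs;

fn mem_available_mb() -> Option<u64> {
    let meminfo = fs::read_to_string("/proc/meminfo").ok()?;
    for line in meminfo.lines() {
        // Lines look like: "MemAvailable:    1234567 kB"
        if let Some(rest) = line.strip_prefix("MemAvailable:") {
            let kb: u64 = rest.trim().trim_end_matches(" kB").trim().parse().ok()?;
            return Some(kb / 1024);
        }
    }
    None
}

fn main() {
    match mem_available_mb() {
        Some(mb) => println!("MemAvailable: {mb} MiB"),
        None => eprintln!("could not read MemAvailable from /proc/meminfo"),
    }
}
```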
I switched to bpfland, but it too freezes the system sometimes; it has not crashed yet though.
~ ❯ cat /proc/sys/vm/compact_unevictable_allowed
0
shows this. Most of the time I do have many apps open, and available RAM is probably under 1 GB. But there is general system lag, like random freezing when switching apps or spawning new processes. I'll try to check if it occurs during a memory stress test.
If you can give me something to test, I can do that too.
Interesting that you're experiencing system freezes also with bpfland. Does it happen with other sched-ext schedulers as well, like scx_lavd or scx_rusty?
I haven't tried others; I can try and report whether these issues exist in those schedulers too. A memory stress test does not make the scheduler crash, it only makes the system unresponsive. I'm also using zram, if that's relevant.
https://github.com/user-attachments/assets/628a80c1-4fcd-44f3-8bb0-152a697acde0
I think the memory pressure condition is slowing down the user-space scheduler (somehow). That's probably why the system becomes unresponsive. But this shouldn't happen with other schedulers that don't have a relevant user-space counterpart, such as bpfland. I'll try to reproduce this on my side. Thanks!
I was wrong about the bpfland scheduler. It doesn't make the system unresponsive even under low-memory conditions. But bpfland does crash sometimes (dmesg still shows a scheduler stall). I cannot capture a trace because the crash happens randomly and is very infrequent, unlike rustland.
Ok thanks for the update, makes sense, let me know if you can catch a trace with bpfland and I'll take a look.
On the rustland side I'll do some tests locally under memory pressure conditions.
I have the same problems: under high load scx_rusty on musl fails. This happened during compilation, as can be seen from llvm-objcopy, though I'm not sure that's one of the most demanding stages; in theory, before the single-threaded lld stage there was a multi-threaded one that clogs all processor cores and often takes a very long time. I can assume that in this case the scheduler simply queues a request to receive data about the process, but because of the long freezes the task completes before rusty can find out information about it.
DEBUG DUMP
================================================================================
pahole[11409] triggered exit kind 1025:
scx_bpf_error (task_ctx lookup failed for pid 11415)
Backtrace:
scx_bpf_error_bstr+0x65/0x80
bpf_prog_33340da880198fec_rusty_select_cpu+0x359/0xcd5
CPU states
----------
CPU 0 : nr_run=3 flags=0x1 cpu_rel=0 ops_qseq=38568750 pnt_seq=10733736
curr=pahole[10859] class=ext_sched_class
*R pahole[10859] -17ms
scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=831272254783601
cpus=ffffff
R pahole[11396] -15ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=831272272285269
cpus=ffffff
do_exit+0x208/0xa20
__x64_sys_exit+0x17/0x20
x64_sys_call+0x216b/0x2190
do_syscall_64+0x8e/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
R kworker/0:0[14322] -7ms
scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=831272252642544
cpus=000001
kthread+0x165/0x170
ret_from_fork+0x3a/0x50
ret_from_fork_asm+0x1a/0x30
CPU 2 : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=28800786 pnt_seq=9488776
curr=pahole[10858] class=ext_sched_class
*R pahole[10858] -14ms
scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=831272078888049
cpus=ffffff
CPU 6 : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=38443503 pnt_seq=10900631
curr=llvm-objcopy[11439] class=ext_sched_class
*R llvm-objcopy[11439] -6ms
scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=690011920160776
cpus=ffffff
R kworker/6:2[4218] -2ms
scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=0x2 dsq_vtime=690011900656067
cpus=000040
kthread+0x165/0x170
ret_from_fork+0x3a/0x50
ret_from_fork_asm+0x1a/0x30
CPU 7 : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=31282510 pnt_seq=10433864
curr=pahole[10869] class=ext_sched_class
*R pahole[10869] +0ms
scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=831272289461845
cpus=ffffff
CPU 9 : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=38009225 pnt_seq=10832640
curr=pahole[11284] class=ext_sched_class
*R pahole[11284] -16ms
scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=694478940670545
cpus=ffffff
R pahole[11424] +0ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=694478960037022
cpus=ffffff
do_exit+0x208/0xa20
__x64_sys_exit+0x17/0x20
x64_sys_call+0x216b/0x2190
do_syscall_64+0x8e/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
CPU 10 : nr_run=6 flags=0x1 cpu_rel=0 ops_qseq=31336007 pnt_seq=10425509
curr=llvm-objcopy[11428] class=ext_sched_class
*R llvm-objcopy[11428] -12ms
scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=694478960028236
cpus=ffffff
R kworker/10:2[15834] -9ms
scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=0x3 dsq_vtime=694478940227902
cpus=000400
kthread+0x165/0x170
ret_from_fork+0x3a/0x50
ret_from_fork_asm+0x1a/0x30
R pahole[11438] -7ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=694478960055368
cpus=ffffff
do_exit+0x208/0xa20
__x64_sys_exit+0x17/0x20
x64_sys_call+0x216b/0x2190
do_syscall_64+0x8e/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
R pahole[11346] +0ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=694478957571866
cpus=ffffff
do_exit+0x208/0xa20
__x64_sys_exit+0x17/0x20
x64_sys_call+0x216b/0x2190
do_syscall_64+0x8e/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
R pahole[11403] +0ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=694478959580175
cpus=ffffff
do_exit+0x208/0xa20
__x64_sys_exit+0x17/0x20
x64_sys_call+0x216b/0x2190
do_syscall_64+0x8e/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
R pahole[11415] +0ms
scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=694478959808035
cpus=ffffff
do_exit+0x208/0xa20
__x64_sys_exit+0x17/0x20
x64_sys_call+0x216b/0x2190
do_syscall_64+0x8e/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
CPU 11 : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=29224391 pnt_seq=10063185
curr=pahole[11300] class=ext_sched_class
*R pahole[11300] -15ms
scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=694478942555277
cpus=ffffff
R kworker/11:0[1561] -13ms
scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=0x3 dsq_vtime=694478939681337
cpus=000800
kthread+0x165/0x170
ret_from_fork+0x3a/0x50
ret_from_fork_asm+0x1a/0x30
CPU 14 : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=28689863 pnt_seq=9430221
curr=pahole[10871] class=ext_sched_class
*R pahole[10871] -16ms
scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=831272254067677
cpus=ffffff
CPU 15 : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=39940361 pnt_seq=10579499
curr=pahole[10860] class=ext_sched_class
*R pahole[10860] -19ms
scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=674247303750124
cpus=ffffff
CPU 18 : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=38536674 pnt_seq=10987690
curr=ld.lld[11442] class=ext_sched_class
*R ld.lld[11442] -1ms
scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=690011920586053
cpus=ffffff
CPU 19 : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=31239404 pnt_seq=10432656
curr=pahole[11315] class=ext_sched_class
*R pahole[11315] -11ms
scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=690011915451870
cpus=ffffff
R kworker/19:0[12400] -9ms
scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=0x2 dsq_vtime=690011900326430
cpus=080000
kthread+0x165/0x170
ret_from_fork+0x3a/0x50
ret_from_fork_asm+0x1a/0x30
CPU 21 : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=37913262 pnt_seq=10715091
curr=pahole[10863] class=ext_sched_class
*R pahole[10863] -18ms
scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=831272255238321
cpus=ffffff
R kworker/21:0[3061] -17ms
scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=0x3 dsq_vtime=694478938370490
cpus=200000
kthread+0x165/0x170
ret_from_fork+0x3a/0x50
ret_from_fork_asm+0x1a/0x30
CPU 22 : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=31210320 pnt_seq=10360926
curr=pahole[11409] class=ext_sched_class
*R pahole[11409] +0ms
scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=694478960296092
cpus=ffffff
scx_dump_state+0x617/0x800
scx_ops_error_irq_workfn+0x23/0x50
irq_work_run+0xa3/0x120
__sysvec_irq_work+0x1c/0xc0
sysvec_irq_work+0x69/0x80
asm_sysvec_irq_work+0x1a/0x20
try_to_wake_up+0x394/0x730
wake_up_q+0x44/0x80
rwsem_wake+0x6b/0xa0
up_write+0x3c/0x50
vm_mmap_pgoff+0x100/0x1b0
do_syscall_64+0x8e/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
CPU 23 : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=29084174 pnt_seq=9908140
curr=pahole[10862] class=ext_sched_class
*R pahole[10862] -7ms
scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=674247256733037
cpus=ffffff
================================================================================
Error: EXIT: scx_bpf_error (task_ctx lookup failed for pid 11415)
scx_rusty failing to look up task_ctx is likely due to using pid_t to key these searches. The pid can change when execing from a multithreaded process, which can then lead to these crashes. Planning to fix it soon.
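A rough illustration of the failure mode (an ordinary HashMap standing in for scx_rusty's BPF task-storage map; the pids are made up): when a non-leader thread execs, the calling thread takes over the thread-group leader's pid, so state inserted under the old pid is no longer found.

```rust
// Illustrative only: per-task state keyed by pid dangles once the pid
// changes across an exec from a multithreaded process -- the analogue of
// "task_ctx lookup failed for pid ...".
use std::collections::HashMap;

#[derive(Debug)]
struct TaskCtx {
    dsq_vtime: u64,
}

fn main() {
    let mut task_ctxs: HashMap<i32, TaskCtx> = HashMap::new();

    // Per-task state created while the thread still had its original pid.
    let old_pid = 11415; // hypothetical
    task_ctxs.insert(old_pid, TaskCtx { dsq_vtime: 0 });

    // After the exec, the same task is now visible under a different pid,
    // so the lookup keyed by the current pid finds nothing.
    let pid_after_exec = 11409; // hypothetical
    match task_ctxs.get(&pid_after_exec) {
        Some(ctx) => println!("found ctx: {ctx:?}"),
        None => println!("task_ctx lookup failed for pid {pid_after_exec}"),
    }
}
```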
Hi,
I'm on a laptop with an i5-1240P, and I'm using scx_rustland. I'm frequently having an issue where scx_rustland crashes? At least dmesg suggests that. When opening apps, sometimes the whole system just freezes for a second and the CPU gets super hot. This may be due to nvidia? That's the conclusion I arrived at after posting this in the chaotic nix support group.
This is my dmesg: https://bpa.st/2XDQ
This is my corresponding nix config https://github.com/0x006E/dotfiles