sched-ext / scx

sched_ext schedulers and tools
https://bit.ly/scx_slack
GNU General Public License v2.0

sched_ext: Rustland crashing and restarting on a laptop. #490

Closed: 0x006E closed this issue 20 hours ago

0x006E commented 2 months ago

Hi,

I'm on a laptop with an i5-1240P, using scx_rustland. I'm frequently hitting an issue where scx_rustland crashes, or at least that's what dmesg suggests. When opening apps, sometimes the whole system just freezes for a second and the CPU gets super hot. This may be due to nvidia? That's the conclusion I arrived at after posting this in the chaotic nix support group.

This is my dmesg: https://bpa.st/2XDQ

This is my corresponding nix config https://github.com/0x006E/dotfiles

arighi commented 2 months ago

@0x006E thanks for reporting this. It'd be really helpful if you could grab the stdout of scx_rustland and report the trace when the error happens; unfortunately, dmesg doesn't say much.

Also keep in mind that scx_rustland at the moment should be considered mostly a proof-of-concept / experimental scheduler and you should use scx_bpfland for better performance.

But it's still interesting to investigate and understand the reason for these stalls, of course :)

0x006E commented 2 months ago

I'll add the trace too. Is there any other env needed for a backtrace, or anything like that, to add any more details?

arighi commented 2 months ago

The trace printed to stdout should be enough for now to better understand what's happening, thanks!

0x006E commented 2 months ago

I have not been able to get the trace output. I waited for about 30 minutes or so while doing my work, but it never crashed; the system hung like crazy though, and even sysrq didn't work. The stdout does not look interesting, but I'll add it here.

I am adding the logs from when the system fully froze: rustland.log

I'll try after a reboot to check if it crashes.

Edit: I didn't know Ctrl + \ is used to trigger a core dump

arighi commented 2 months ago

I'm a bit confused here. When you say "the system hang", do you mean that it's completely unresponsive? IIUC you can stop scx_rustland, and when you do so the system immediately recovers and is responsive again?

With scx_rustland running, if a task isn't scheduled for 5 sec the sched_ext watchdog should automatically kick out the scheduler and restore the default Linux scheduler (and in this case you should see scx_rustland spitting out a trace). But apparently you don't see this, which is why I'm confused.
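
For context, here is a rough conceptual sketch in Rust of what that watchdog check amounts to (an illustration only, not the kernel's actual scx_watchdog_workfn): track when each runnable task last ran, and if any task exceeds the timeout, eject the scheduler and emit a "runnable task stall" dump like the ones shown later in this thread.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical bookkeeping: when each runnable task last got to run.
struct Watchdog {
    timeout: Duration,               // ~5s for scx_rustland, per the discussion above
    last_ran: HashMap<u32, Instant>, // pid -> last time the task actually ran
}

impl Watchdog {
    // Return the pid of a stalled task, if any. The real watchdog would then
    // kick out the BPF scheduler and fall back to the default Linux scheduler.
    fn check(&self) -> Option<u32> {
        let now = Instant::now();
        for (&pid, &last) in &self.last_ran {
            if now.duration_since(last) > self.timeout {
                return Some(pid);
            }
        }
        None
    }
}

fn main() {
    let wd = Watchdog {
        timeout: Duration::from_secs(5),
        last_ran: HashMap::new(),
    };
    assert_eq!(wd.check(), None); // nothing tracked yet, so nothing is stalled
}
```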

0x006E commented 2 months ago

Sorry about that; the system was unresponsive and I couldn't close anything or do anything for about 3 seconds. Anyway, I finally got a crash. It happened when I was just browsing the web. This is the trace.

03:33:10 [INFO] cpus=16 min_vruntime=10580391987985 slice=5000us
03:33:10 [INFO]   tasks=438
03:33:10 [INFO]   nr_user_dispatches=9535661 nr_kernel_dispatches=9788542
03:33:10 [INFO]   nr_cancel_dispatches=0 nr_bounce_dispatches=0
03:33:10 [INFO]   nr_running=10 nr_waiting=13 [nr_queued=0 + nr_scheduled=13]
03:33:10 [INFO]   nr_failed_dispatches=0 nr_sched_congested=0 nr_page_faults=0 [OK]
03:33:10 [INFO] [pid]       [nvcsw] [comm] <-- interactive tasks
03:33:10 [INFO] 18              199 rcu_preempt
03:33:10 [INFO] 5257             19 Isolated
03:33:10 [INFO] 396              18 kworker/u64:7-events_unbound
03:33:10 [INFO] 120              10 kworker/u64:2-events_unbound
03:33:10 [INFO] 660              10 irq/169-ELAN0518:01
03:33:10 [INFO] cpus=16 min_vruntime=10580435393350 slice=5000us
03:33:10 [INFO]   tasks=0
03:33:10 [INFO]   nr_user_dispatches=9535661 nr_kernel_dispatches=9788964
03:33:10 [INFO]   nr_cancel_dispatches=0 nr_bounce_dispatches=0
03:33:10 [INFO]   nr_running=0 nr_waiting=0 [nr_queued=0 + nr_scheduled=0]
03:33:10 [INFO]   nr_failed_dispatches=0 nr_sched_congested=0 nr_page_faults=0 [OK]

DEBUG DUMP
================================================================================

kworker/u64:2[120] triggered exit kind 1026:
  runnable task stall (glean.dispatche[4738] failed to run for 5.076s)

Backtrace:
  scx_watchdog_workfn+0x11b/0x1a0
  process_scheduled_works+0x206/0x480
  worker_thread+0x349/0x5e0
  kthread+0x14a/0x170
  ret_from_fork+0x40/0x50
  ret_from_fork_asm+0x1a/0x30

CPU states
----------

CPU 0   : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=1782324 pnt_seq=1476772
          curr=swapper/0[0] class=idle_sched_class

  R StreamTrans #13[23074] -1192ms
      scx_state/flags=3/0x1 dsq_flags=0x0 ops_state/qseq=2/1782320
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
      cpus=ffff

    smaps_pte_range+0x171/0x570
    walk_pgd_range+0x5da/0xa00
    __walk_page_range+0x5b/0x1d0
    walk_page_vma+0xc9/0x110
    show_smap+0x9d/0x280
    seq_read_iter+0x319/0x430
    seq_read+0x13d/0x180
    vfs_read+0xb7/0x320
    __x64_sys_read+0x79/0xf0
    do_syscall_64+0x91/0x180
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

CPU 1   : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=995151 pnt_seq=841045
          curr=swapper/1[0] class=idle_sched_class
  cpus_to_kick   : 0002

  R threaded-ml[7744] -4810ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/995135
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
      cpus=ffff

    futex_wait+0xc3/0x160
    do_futex+0x158/0x1d0
    __se_sys_futex+0x120/0x1e0
    do_syscall_64+0x91/0x180
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

CPU 2   : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=1382977 pnt_seq=1143372
          curr=swapper/2[0] class=idle_sched_class
  cpus_to_kick   : 0004

  R Isolated Web Co[6846] -3932ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1382963
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
      cpus=ffff

    get_mem_cgroup_from_mm+0x4/0x100
    __mem_cgroup_charge+0x1a/0xb0
    do_pte_missing+0x9c5/0xe90
    handle_mm_fault+0x997/0x1500
    do_user_addr_fault+0x25e/0x750
    exc_page_fault+0x6b/0x110
    asm_exc_page_fault+0x26/0x30

  R TaskCon~ller #1[11548] -2036ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1382971
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
      cpus=ffff

    futex_wait+0xc3/0x160
    do_futex+0x158/0x1d0
    __se_sys_futex+0x120/0x1e0
    do_syscall_64+0x91/0x180
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

CPU 3   : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=697702 pnt_seq=580514
          curr=swapper/3[0] class=idle_sched_class
  cpus_to_kick   : 0008

  R StreamTrans #39[23076] -962ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/697700
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
      cpus=ffff

    seq_putc+0x28/0x40
    show_vma_header_prefix+0x8f/0x140
    show_map_vma+0x6b/0x1f0
    show_smap+0xa8/0x280
    seq_read_iter+0x319/0x430
    seq_read+0x13d/0x180
    vfs_read+0xb7/0x320
    __x64_sys_read+0x79/0xf0
    do_syscall_64+0x91/0x180
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

CPU 4   : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=1685508 pnt_seq=1385938
          curr=swapper/4[0] class=idle_sched_class
  cpus_to_kick   : 0010

  R glean.dispatche[4738] -5076ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1685484
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=256924731832
      cpus=ffff

    do_syscall_64+0x91/0x180
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

CPU 6   : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=1113809 pnt_seq=918714
          curr=swapper/6[0] class=idle_sched_class
  cpus_to_kick   : 0040

  R WRScene~derLP#1[4856] -4760ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1113792
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=256536597618
      cpus=ffff

    do_syscall_64+0x91/0x180
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

  R niri[4017] -4388ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1113797
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=257059099749
      cpus=ffff

    __x64_sys_epoll_wait+0xc8/0x100
    do_syscall_64+0x91/0x180
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

CPU 8   : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=1634656 pnt_seq=1365933
          curr=swapper/8[0] class=idle_sched_class
  cpus_to_kick   : 0100

  R .kgx-wrapped[11619] -5172ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1634640
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
      cpus=ffff

    wq_worker_comm+0x82/0x1b0
    proc_task_name+0x9d/0x150
    proc_pid_status+0x60/0xe10
    proc_single_show+0x50/0xb0
    seq_read_iter+0x14e/0x430
    seq_read+0x13d/0x180
    vfs_read+0xb7/0x320
    __x64_sys_read+0x79/0xf0
    do_syscall_64+0x91/0x180
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

  R IPC I/O Child[4961] -5108ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1634641
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=251539433448
      cpus=ffff

    __x64_sys_epoll_wait+0xc8/0x100
    do_syscall_64+0x91/0x180
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

CPU 11  : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=1802923 pnt_seq=1490758
          curr=kworker/u64:2[120] class=ext_sched_class

 *R kworker/u64:2[120] +0ms
      scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=257059208040
      cpus=ffff

    scx_dump_state+0x63f/0x840
    scx_ops_error_irq_workfn+0x23/0x50
    irq_work_run+0xb8/0x150
    __sysvec_irq_work+0x1c/0xd0
    sysvec_irq_work+0x6e/0x80
    asm_sysvec_irq_work+0x1a/0x20
    scx_watchdog_workfn+0x137/0x1a0
    process_scheduled_works+0x206/0x480
    worker_thread+0x349/0x5e0
    kthread+0x14a/0x170
    ret_from_fork+0x40/0x50
    ret_from_fork_asm+0x1a/0x30

  R IPC I/O Parent[4740] -3684ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1802910
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=257059212384
      cpus=ffff

    __x64_sys_epoll_wait+0xc8/0x100
    do_syscall_64+0x91/0x180
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

CPU 14  : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=2289871 pnt_seq=1831828
          curr=swapper/14[0] class=idle_sched_class

  R WRScene~ilder#1[4855] -5010ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/2289849
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=256583729520
      cpus=ffff

    do_syscall_64+0x91/0x180
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

CPU 15  : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=1975342 pnt_seq=1644291
          curr=swapper/15[0] class=idle_sched_class
  cpus_to_kick   : 8000
  idle_to_kick   : 0800

  R btrfs-transacti[341] -4092ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1975324
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=256826913698
      cpus=ffff

    btrfs_set_token_32+0x37/0x210 [btrfs]
    setup_items_for_insert+0x275/0x460 [btrfs]
    btrfs_insert_empty_items+0x5e/0x90 [btrfs]
    __btrfs_run_delayed_refs+0x7ef/0x13f0 [btrfs]
    btrfs_run_delayed_refs+0x39/0x120 [btrfs]
    btrfs_start_dirty_block_groups+0x12f/0x590 [btrfs]
    btrfs_commit_transaction+0x23c/0x1040 [btrfs]
    transaction_kthread+0x12d/0x1c0 [btrfs]
    kthread+0x14a/0x170
    ret_from_fork+0x40/0x50
    ret_from_fork_asm+0x1a/0x30

  R WebExtensions[4824] -3756ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1975330
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=256943569144
      cpus=ffff

    __x64_sys_poll+0xa9/0x140
    do_syscall_64+0x91/0x180
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

================================================================================

03:33:10 [INFO] Unregister RustLand scheduler
Error: EXIT: runnable task stall (glean.dispatche[4738] failed to run for 5.076s)

Edit 1: Got another crash

04:25:56 [INFO] cpus=16 min_vruntime=40706736907195 slice=5000us
04:25:56 [INFO]   tasks=0
04:25:56 [INFO]   nr_user_dispatches=14742368 nr_kernel_dispatches=29939067
04:25:56 [INFO]   nr_cancel_dispatches=1 nr_bounce_dispatches=0
04:25:56 [INFO]   nr_running=0 nr_waiting=2 [nr_queued=0 + nr_scheduled=2]
04:25:56 [INFO]   nr_failed_dispatches=0 nr_sched_congested=0 nr_page_faults=0 [OK]
04:25:56 [INFO] [pid]       [nvcsw] [comm] <-- interactive tasks
04:25:56 [INFO] 67841           162 kworker/u64:5+btrfs-endio-write
04:25:56 [INFO] 44019            12 kworker/u64:18-events_unbound
04:25:56 [INFO] cpus=16 min_vruntime=40706736907195 slice=5000us
04:25:56 [INFO]   tasks=0
04:25:56 [INFO]   nr_user_dispatches=14742368 nr_kernel_dispatches=29939067
04:25:56 [INFO]   nr_cancel_dispatches=1 nr_bounce_dispatches=0
04:25:56 [INFO]   nr_running=0 nr_waiting=2 [nr_queued=0 + nr_scheduled=2]
04:25:56 [INFO]   nr_failed_dispatches=0 nr_sched_congested=0 nr_page_faults=0 [OK]

DEBUG DUMP
================================================================================

kworker/u64:18[44019] triggered exit kind 1026:
  runnable task stall (kworker/u65:2[42574] failed to run for 6.836s)

Backtrace:
  scx_watchdog_workfn+0x11b/0x1a0
  process_scheduled_works+0x206/0x480
  worker_thread+0x349/0x5e0
  kthread+0x14a/0x170
  ret_from_fork+0x40/0x50
  ret_from_fork_asm+0x1a/0x30

CPU states
----------

CPU 4   : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=6398653 pnt_seq=5610460
          curr=swapper/4[0] class=idle_sched_class

  R kworker/u65:2[42574] -6836ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/6398633
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
      cpus=ffff

    zram_submit_bio+0x336/0x900 [zram]
    __submit_bio+0x159/0x250
    submit_bio_noacct_nocheck+0x1f5/0x390
    submit_bio_wait+0x5b/0xd0
    __swap_writepage+0x2e2/0x750
    shrink_memcg_cb+0x24a/0x2d0
    __list_lru_walk_one+0x113/0x230
    list_lru_walk_one+0x5e/0x90
    zswap_shrinker_scan+0xbf/0x150
    do_shrink_slab+0x170/0x340
    shrink_slab+0x334/0x4b0
    shrink_one+0xd1/0x3c0
    shrink_node+0xdf6/0x1240
    do_try_to_free_pages+0x19b/0x5a0
    try_to_free_pages+0x486/0x5d0
    __alloc_pages_slowpath+0x6b1/0x1010
    __alloc_pages_noprof+0x283/0x370
    alloc_pages_mpol_noprof+0x11d/0x1f0
    iwl_pcie_rx_alloc_page+0xf0/0x240 [iwlwifi]
    iwl_pcie_rx_allocator_work+0x15c/0x370 [iwlwifi]
    process_scheduled_works+0x206/0x480
    worker_thread+0x349/0x5e0
    kthread+0x14a/0x170
    ret_from_fork+0x40/0x50
    ret_from_fork_asm+0x1a/0x30

CPU 9   : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=4715423 pnt_seq=3884049
          curr=kworker/u64:18[44019] class=ext_sched_class

 *R kworker/u64:18[44019] +0ms
      scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
      cpus=ffff

    scx_dump_state+0x63f/0x840
    scx_ops_error_irq_workfn+0x23/0x50
    irq_work_run+0xb8/0x150
    __sysvec_irq_work+0x1c/0xd0
    sysvec_irq_work+0x6e/0x80
    asm_sysvec_irq_work+0x1a/0x20
    scx_watchdog_workfn+0x137/0x1a0
    process_scheduled_works+0x206/0x480
    worker_thread+0x349/0x5e0
    kthread+0x14a/0x170
    ret_from_fork+0x40/0x50
    ret_from_fork_asm+0x1a/0x30

CPU 11  : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=4673484 pnt_seq=3760085
          curr=swapper/11[0] class=idle_sched_class
  cpus_to_kick   : 0800
  idle_to_kick   : 0040

  R kswapd0[138] -6856ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/4673444
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
      cpus=ffff

    try_to_wake_up+0x334/0x610
    btrfs_free_extent_maps+0x4e7/0x6e0 [btrfs]
    super_cache_scan+0x1d5/0x210
    do_shrink_slab+0x170/0x340
    shrink_slab+0x334/0x4b0
    shrink_one+0xd1/0x3c0
    shrink_node+0xdf6/0x1240
    kswapd+0xd80/0x1390
    kthread+0x14a/0x170
    ret_from_fork+0x40/0x50
    ret_from_fork_asm+0x1a/0x30

CPU 14  : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=6019245 pnt_seq=4862403
          curr=swapper/14[0] class=idle_sched_class
  cpus_to_kick   : 4000

  R kworker/u64:7[69892] -2186ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/6019242
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
      cpus=ffff

    get_page_from_freelist+0x13c9/0x14b0
    __alloc_pages_slowpath+0x627/0x1010
    __alloc_pages_noprof+0x283/0x370
    allocate_slab+0xa7/0x740
    ___slab_alloc+0x47b/0xa20
    kmem_cache_alloc_noprof+0x1ca/0x2d0
    zs_malloc+0x42/0x2d0
    zs_zpool_malloc+0x12/0x30
    zswap_compress+0x1a1/0x260
    zswap_store+0x43a/0x760
    swap_writepage+0x41/0xe0
    shrink_folio_list+0xacd/0x1060
    evict_folios+0x1a6a/0x2080
    try_to_shrink_lruvec+0x29d/0x470
    shrink_one+0xb6/0x3c0
    shrink_node+0xdf6/0x1240
    do_try_to_free_pages+0x19b/0x5a0
    try_to_free_pages+0x486/0x5d0
    __alloc_pages_slowpath+0x6b1/0x1010
    __alloc_pages_noprof+0x283/0x370
    alloc_pages_bulk_noprof+0x337/0x7b0
    alloc_eb_folio_array+0x10a/0x1e0 [btrfs]
    alloc_extent_buffer+0x14b/0x7f0 [btrfs]
    btrfs_alloc_tree_block+0x10a/0x520 [btrfs]
    btrfs_force_cow_block+0x15c/0x7d0 [btrfs]
    btrfs_cow_block+0xe1/0x260 [btrfs]
    btrfs_search_slot+0x36d/0xa00 [btrfs]
    btrfs_mark_extent_written+0x11a/0xf00 [btrfs]
    btrfs_finish_one_ordered+0x358/0x8e0 [btrfs]
    btrfs_work_helper+0xd0/0x420 [btrfs]
    process_scheduled_works+0x206/0x480
    worker_thread+0x349/0x5e0
    kthread+0x14a/0x170
    ret_from_fork+0x40/0x50
    ret_from_fork_asm+0x1a/0x30

================================================================================

04:25:56 [INFO] Unregister RustLand scheduler
Error: EXIT: runnable task stall (kworker/u65:2[42574] failed to run for 6.836s)

arighi commented 2 months ago

Thanks, that's really useful.

I see lots of tasks in the traces that are not dispatched (meaning that they are probably sitting in the user-space scheduler's queue), so I'm wondering if the user-space scheduler is blocked by something and is unable to dispatch them. In the second trace, for example, it looks like the system is trying to reclaim memory. When this happens, if the user-space scheduler hits a page fault, we may have a deadlock. This should be prevented by setting /proc/sys/vm/compact_unevictable_allowed when the scheduler runs (that should be done automatically by scx_rustland); can you check if that's the case?
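
For reference, a minimal sketch of the kind of protection being described (my own illustration, assuming the libc crate; not necessarily what scx_rustland actually does): lock the process's memory so the user-space scheduler can't be paged out, and keep compaction away from unevictable pages while it runs.

```rust
use std::fs;

fn main() -> std::io::Result<()> {
    // Lock all current and future pages into RAM so the user-space scheduler
    // never takes a page fault on its hot path.
    if unsafe { libc::mlockall(libc::MCL_CURRENT | libc::MCL_FUTURE) } != 0 {
        return Err(std::io::Error::last_os_error());
    }

    // Prevent compaction from migrating unevictable (mlocked) pages while we run.
    let knob = "/proc/sys/vm/compact_unevictable_allowed";
    let saved = fs::read_to_string(knob)?;
    fs::write(knob, "0")?;

    // ... run the scheduler here ...

    // Restore the previous value on exit.
    fs::write(knob, saved.trim())?;
    Ok(())
}
```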

When this problem happens, is your system under memory pressure (almost out of free memory, as shown by free -m)?

0x006E commented 2 months ago

I switched to bpfland, and it too freezes the system sometimes, but it has not crashed yet.

~ ❯ cat /proc/sys/vm/compact_unevictable_allowed
0

shows this. Most of the time I do have many apps open, and available RAM is probably under 1 GB. But there is a general system lag, like random freezing when switching apps or spawning new processes. I'll try to check if it occurs during a memory stress test.

If you can give me something to test, I can do that too.
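
(For reference, a crude memory-pressure generator in Rust, sketched here purely as an illustration and not taken from the thread; a dedicated stress tool would work just as well. It keeps allocating and touching 64 MiB chunks until the system is under reclaim pressure, so run it with care.)

```rust
use std::{thread, time::Duration};

fn main() {
    // Crude memory-pressure generator: allocate and touch 64 MiB chunks forever.
    // This will eventually push the system into reclaim/swap (zram in this setup)
    // and ultimately get OOM-killed.
    let mut hog: Vec<Vec<u8>> = Vec::new();
    loop {
        let mut chunk = vec![0u8; 64 * 1024 * 1024];
        // Touch one byte per page so the allocation is actually backed by RAM.
        for page in chunk.chunks_mut(4096) {
            page[0] = 1;
        }
        hog.push(chunk);
        thread::sleep(Duration::from_millis(100));
    }
}
```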

arighi commented 2 months ago

Interesting that you're experiencing system freezes also with bpfland. Does it happen with other sched-ext schedulers as well, like scx_lavd or scx_rusty?

0x006E commented 2 months ago

I haven't tried the others; I can try them and report whether these issues exist in those schedulers too. A memory stress test does not make the scheduler crash, it only makes the system unresponsive. I'm also using zram, if that's relevant.

https://github.com/user-attachments/assets/628a80c1-4fcd-44f3-8bb0-152a697acde0

arighi commented 2 months ago

I think the memory pressure condition is slowing down the user-space scheduler (somehow). That's probably why the system becomes unresponsive. But this shouldn't happen with other schedulers that don't have a relevant user-space counterpart, such as bpfland. I'll try to reproduce this on my side. Thanks!

0x006E commented 2 months ago

I was wrong about the bpfland scheduler. It doesn't make the system unresponsive even under low memory conditions. But bpfland crashes sometimes (dmesg still shows a scheduler stall). I cannot capture a trace because the crash happens randomly and is very infrequent, unlike rustland.

arighi commented 2 months ago

Ok, thanks for the update, that makes sense. Let me know if you can catch a trace with bpfland and I'll take a look.

On the rustland side I'll do some tests locally under memory pressure conditions.

Kawanaao commented 2 months ago

I have the same problems: under high load, scx_rusty on musl fails. This happened during compilation, as can be seen from llvm-objcopy, though I'm not sure this is one of the most demanding stages; in theory, before the single-threaded lld stage there was a multi-threaded one that clogs all processor cores and often takes a very long time. I can assume that the scheduler in this case simply queues a request to receive data about the process, but because of the long freezes the task completes before rusty can find out information about it.

DEBUG DUMP
================================================================================

pahole[11409] triggered exit kind 1025:
  scx_bpf_error (task_ctx lookup failed for pid 11415)

Backtrace:
  scx_bpf_error_bstr+0x65/0x80
  bpf_prog_33340da880198fec_rusty_select_cpu+0x359/0xcd5

CPU states
----------

CPU 0   : nr_run=3 flags=0x1 cpu_rel=0 ops_qseq=38568750 pnt_seq=10733736
          curr=pahole[10859] class=ext_sched_class

 *R pahole[10859] -17ms
      scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=831272254783601
      cpus=ffffff

  R pahole[11396] -15ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=831272272285269
      cpus=ffffff

    do_exit+0x208/0xa20
    __x64_sys_exit+0x17/0x20
    x64_sys_call+0x216b/0x2190
    do_syscall_64+0x8e/0x170
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

  R kworker/0:0[14322] -7ms
      scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=831272252642544
      cpus=000001

    kthread+0x165/0x170
    ret_from_fork+0x3a/0x50
    ret_from_fork_asm+0x1a/0x30

CPU 2   : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=28800786 pnt_seq=9488776
          curr=pahole[10858] class=ext_sched_class

 *R pahole[10858] -14ms
      scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=831272078888049
      cpus=ffffff

CPU 6   : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=38443503 pnt_seq=10900631
          curr=llvm-objcopy[11439] class=ext_sched_class

 *R llvm-objcopy[11439] -6ms
      scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=690011920160776
      cpus=ffffff

  R kworker/6:2[4218] -2ms
      scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=0x2 dsq_vtime=690011900656067
      cpus=000040

    kthread+0x165/0x170
    ret_from_fork+0x3a/0x50
    ret_from_fork_asm+0x1a/0x30

CPU 7   : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=31282510 pnt_seq=10433864
          curr=pahole[10869] class=ext_sched_class

 *R pahole[10869] +0ms
      scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=831272289461845
      cpus=ffffff

CPU 9   : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=38009225 pnt_seq=10832640
          curr=pahole[11284] class=ext_sched_class

 *R pahole[11284] -16ms
      scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=694478940670545
      cpus=ffffff

  R pahole[11424] +0ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=694478960037022
      cpus=ffffff

    do_exit+0x208/0xa20
    __x64_sys_exit+0x17/0x20
    x64_sys_call+0x216b/0x2190
    do_syscall_64+0x8e/0x170
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

CPU 10  : nr_run=6 flags=0x1 cpu_rel=0 ops_qseq=31336007 pnt_seq=10425509
          curr=llvm-objcopy[11428] class=ext_sched_class

 *R llvm-objcopy[11428] -12ms
      scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=694478960028236
      cpus=ffffff

  R kworker/10:2[15834] -9ms
      scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=0x3 dsq_vtime=694478940227902
      cpus=000400

    kthread+0x165/0x170
    ret_from_fork+0x3a/0x50
    ret_from_fork_asm+0x1a/0x30

  R pahole[11438] -7ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=694478960055368
      cpus=ffffff

    do_exit+0x208/0xa20
    __x64_sys_exit+0x17/0x20
    x64_sys_call+0x216b/0x2190
    do_syscall_64+0x8e/0x170
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

  R pahole[11346] +0ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=694478957571866
      cpus=ffffff

    do_exit+0x208/0xa20
    __x64_sys_exit+0x17/0x20
    x64_sys_call+0x216b/0x2190
    do_syscall_64+0x8e/0x170
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

  R pahole[11403] +0ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=694478959580175
      cpus=ffffff

    do_exit+0x208/0xa20
    __x64_sys_exit+0x17/0x20
    x64_sys_call+0x216b/0x2190
    do_syscall_64+0x8e/0x170
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

  R pahole[11415] +0ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=694478959808035
      cpus=ffffff

    do_exit+0x208/0xa20
    __x64_sys_exit+0x17/0x20
    x64_sys_call+0x216b/0x2190
    do_syscall_64+0x8e/0x170
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

CPU 11  : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=29224391 pnt_seq=10063185
          curr=pahole[11300] class=ext_sched_class

 *R pahole[11300] -15ms
      scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=694478942555277
      cpus=ffffff

  R kworker/11:0[1561] -13ms
      scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=0x3 dsq_vtime=694478939681337
      cpus=000800

    kthread+0x165/0x170
    ret_from_fork+0x3a/0x50
    ret_from_fork_asm+0x1a/0x30

CPU 14  : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=28689863 pnt_seq=9430221
          curr=pahole[10871] class=ext_sched_class

 *R pahole[10871] -16ms
      scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=831272254067677
      cpus=ffffff

CPU 15  : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=39940361 pnt_seq=10579499
          curr=pahole[10860] class=ext_sched_class

 *R pahole[10860] -19ms
      scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=674247303750124
      cpus=ffffff

CPU 18  : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=38536674 pnt_seq=10987690
          curr=ld.lld[11442] class=ext_sched_class

 *R ld.lld[11442] -1ms
      scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=690011920586053
      cpus=ffffff

CPU 19  : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=31239404 pnt_seq=10432656
          curr=pahole[11315] class=ext_sched_class

 *R pahole[11315] -11ms
      scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=690011915451870
      cpus=ffffff

  R kworker/19:0[12400] -9ms
      scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=0x2 dsq_vtime=690011900326430
      cpus=080000

    kthread+0x165/0x170
    ret_from_fork+0x3a/0x50
    ret_from_fork_asm+0x1a/0x30

CPU 21  : nr_run=2 flags=0x1 cpu_rel=0 ops_qseq=37913262 pnt_seq=10715091
          curr=pahole[10863] class=ext_sched_class

 *R pahole[10863] -18ms
      scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=831272255238321
      cpus=ffffff

  R kworker/21:0[3061] -17ms
      scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=0x3 dsq_vtime=694478938370490
      cpus=200000

    kthread+0x165/0x170
    ret_from_fork+0x3a/0x50
    ret_from_fork_asm+0x1a/0x30

CPU 22  : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=31210320 pnt_seq=10360926
          curr=pahole[11409] class=ext_sched_class

 *R pahole[11409] +0ms
      scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=694478960296092
      cpus=ffffff

    scx_dump_state+0x617/0x800
    scx_ops_error_irq_workfn+0x23/0x50
    irq_work_run+0xa3/0x120
    __sysvec_irq_work+0x1c/0xc0
    sysvec_irq_work+0x69/0x80
    asm_sysvec_irq_work+0x1a/0x20
    try_to_wake_up+0x394/0x730
    wake_up_q+0x44/0x80
    rwsem_wake+0x6b/0xa0
    up_write+0x3c/0x50
    vm_mmap_pgoff+0x100/0x1b0
    do_syscall_64+0x8e/0x170
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

CPU 23  : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=29084174 pnt_seq=9908140
          curr=pahole[10862] class=ext_sched_class

 *R pahole[10862] -7ms
      scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=674247256733037
      cpus=ffffff

================================================================================

Error: EXIT: scx_bpf_error (task_ctx lookup failed for pid 11415)

htejun commented 2 months ago

scx_rusty failing to look up the task_ctx is likely due to using pid_t to key these searches. The pid can change when exec'ing from a multithreaded process, which can then lead to these crashes. Planning to fix it soon.
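
To illustrate the failure mode (a hypothetical Rust sketch with made-up pids, not scx_rusty's actual BPF code): per-task state keyed by pid goes stale when a non-leader thread of a multithreaded process execs, because that task ends up carrying the thread-group leader's pid, so later lookups no longer match the key the context was stored under.

```rust
use std::collections::HashMap;

// Hypothetical per-task context, similar in spirit to what a scheduler keeps
// for every task it manages.
#[derive(Default)]
struct TaskCtx {
    vruntime: u64,
}

fn main() {
    // Fragile pattern: key the per-task context by pid.
    let mut task_ctxs: HashMap<i32, TaskCtx> = HashMap::new();

    // Context created when the task first becomes known, under its pid at the time.
    let pid_at_creation = 4321;
    task_ctxs.insert(pid_at_creation, TaskCtx::default());
    task_ctxs.get_mut(&pid_at_creation).unwrap().vruntime += 1;

    // If that task was a non-leader thread that then exec'd, it now carries the
    // thread-group leader's pid. A later scheduler callback looks the context up
    // under the new pid and finds nothing, which is what the
    // "task_ctx lookup failed for pid ..." error above reports.
    let pid_after_exec = 4300;
    assert!(task_ctxs.get(&pid_after_exec).is_none());
}
```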