oxidecomputer / propolis

VMM userspace for illumos bhyve
Mozilla Public License 2.0

`rcu_sched detected stalls` after local live migration #376

Open gjcolombo opened 1 year ago

gjcolombo commented 1 year ago

Propolis commit: c455784

Host OS:

$ cat /etc/versions/build
heads/master-0-g717646f711

Guest OS: Debian 11 nocloud, Linux debian 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux

Repro steps:

Expected: guest is generally happy.
Observed: guest gets dyspepsia after running the timer stress test:

root@debian:~# [  501.250937] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  501.254890] rcu:     1-...!: (0 ticks this GP) idle=7f0/0/0x0 softirq=2364/2364 fqs=1  (false positive?)
[  501.254890]  (detected by 3, t=21009 jiffies, g=1749, q=1077)
[  501.254890] Sending NMI from CPU 3 to CPUs 1:
[  501.266638] NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0xe/0x20
[  501.254890] rcu: rcu_sched kthread starved for 15759 jiffies! g1749 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
[  501.254890] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[  501.254890] rcu: RCU grace-period kthread stack dump:
[  501.254890] task:rcu_sched       state:I stack:    0 pid:   12 ppid:     2 flags:0x00004000
[  501.254890] Call Trace:
[  501.254890]  __schedule+0x282/0x870
[  501.254890]  schedule+0x46/0xb0
[  501.254890]  schedule_timeout+0x8b/0x150
[  501.254890]  ? __next_timer_interrupt+0x110/0x110
[  501.254890]  rcu_gp_kthread+0x51b/0xbc0
[  501.254890]  ? rcu_cpu_kthread+0x190/0x190
[  501.254890]  kthread+0x11b/0x140
[  501.254890]  ? __kthread_bind_mask+0x60/0x60
[  501.254890]  ret_from_fork+0x22/0x30

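For a rough sense of scale, the jiffies counts in the stall report above convert to wall time once the kernel's tick rate is known. A minimal sketch, assuming Debian's usual `CONFIG_HZ_250` (an assumption; the report doesn't state HZ, though the round results are consistent with 250):

```python
HZ = 250  # assumed: Debian amd64 kernels are typically built with CONFIG_HZ_250


def jiffies_to_seconds(jiffies, hz=HZ):
    """Convert a jiffies count from an RCU stall report to seconds."""
    return jiffies / hz


# Values from the stall report above
print(jiffies_to_seconds(21009))  # t=21009 in the detector line -> ~84 s
print(jiffies_to_seconds(15759))  # rcu_sched kthread starvation -> ~63 s
```

So by the time the second report fires, CPU 1 believes the grace period has been stuck for well over a minute.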
Other observations:

@jmpesp saw a similar issue in local testing earlier this week, but that was without the bits needed to enable the interrupt state transfer implemented in #367. Unless I've missed something, that should have been enabled here (both the Propolis bits and the necessary bhyve bits were present).

This VM no longer seems to be producing any RCU complaints, but I'll hold it in its current state for now.

jmpesp commented 1 year ago

Note that I also see this on a host at

james@dinnerbone:~$ cat /etc/versions/build
heads/stlouis-0-gce8e8e601e

which includes the illumos#15143 bits:

[    6.552238] EXT4-fs (vda3): resizing filesystem from 472571 to 18298363 blocks
[    7.864090] EXT4-fs (vda3): resized filesystem to 18298363
[    9.598778] device-mapper: uevent: version 1.0.3
[    9.598892] device-mapper: ioctl: 4.43.0-ioctl (2020-10-01) initialised: dm-devel@redhat.com
[   58.242224] audit: type=1400 audit(1682651165.345:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/chronyd" pid=2681 comm="apparmor_parser"

 <<<< MIGRATION >>>>

[  183.116103] clocksource: timekeeping watchdog on CPU1: Marking clocksource 'tsc' as unstable because the skew is too large:
[  183.116131] clocksource:                       'acpi_pm' wd_now: da93f0 wd_last: bea500 mask: ffffff
[  183.116151] clocksource:                       'tsc' cs_now: 9f8e67f8bf cs_last: 9f23cc4b79 mask: ffffffffffffffff
[  183.116173] tsc: Marking TSC unstable due to clocksource watchdog
[  183.116486] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[  183.116488] sched_clock: Marking unstable (183118347082, -1856502)<-(183136894483, -20409854)
[  183.116700] clocksource: Checking clocksource tsc synchronization from CPU 1.
[  183.116756] clocksource: Switched to clocksource acpi_pm
[  385.392113] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  385.393282] rcu:     0-...!: (0 ticks this GP) idle=fdc/0/0x0 softirq=11204/11204 fqs=0  (false positive?)
[  385.394237]  (detected by 1, t=5252 jiffies, g=12417, q=82)
[  385.394239] Sending NMI from CPU 1 to CPUs 0:
[  385.394373] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[  385.395243] rcu: rcu_sched kthread starved for 5252 jiffies! g12417 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[  385.396074] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[  385.396074] rcu: RCU grace-period kthread stack dump:
[  385.396074] task:rcu_sched       state:I stack:    0 pid:   13 ppid:     2 flags:0x00004000
[  385.396074] Call Trace:
[  385.396074]  __schedule+0x282/0x880
[  385.396074]  ? __switch_to_asm+0x3a/0x60
[  385.396074]  schedule+0x46/0xb0
[  385.396074]  schedule_timeout+0x8b/0x150
[  385.396074]  ? __next_timer_interrupt+0x110/0x110
[  385.396074]  rcu_gp_kthread+0x51b/0xbb0
[  385.396074]  ? rcu_cpu_kthread+0x190/0x190
[  385.396074]  kthread+0x11b/0x140
[  385.396074]  ? __kthread_bind_mask+0x60/0x60
[  385.396074]  ret_from_fork+0x22/0x30
[  448.412113] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  448.413389] rcu:     0-...!: (0 ticks this GP) idle=230/0/0x0 softirq=11204/11204 fqs=1  (false positive?)
[  448.415278]  (detected by 1, t=21007 jiffies, g=12417, q=124)
[  448.415280] Sending NMI from CPU 1 to CPUs 0:
[  448.415360] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[  448.416074] rcu: rcu_sched kthread starved for 15754 jiffies! g12417 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[  448.416074] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[  448.416074] rcu: RCU grace-period kthread stack dump:
[  448.416074] task:rcu_sched       state:I stack:    0 pid:   13 ppid:     2 flags:0x00004000
[  448.416074] Call Trace:
[  448.416074]  __schedule+0x282/0x880
[  448.416074]  schedule+0x46/0xb0
[  448.416074]  schedule_timeout+0x8b/0x150
[  448.416074]  ? __next_timer_interrupt+0x110/0x110
[  448.416074]  rcu_gp_kthread+0x51b/0xbb0
[  448.416074]  ? rcu_cpu_kthread+0x190/0x190
[  448.416074]  kthread+0x11b/0x140
[  448.416074]  ? __kthread_bind_mask+0x60/0x60
[  448.416074]  ret_from_fork+0x22/0x30
[  469.424115] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  469.425395] rcu:     0-...!: (0 ticks this GP) idle=2f4/0/0x0 softirq=11204/11204 fqs=0  (false positive?)
[  469.427291]  (detected by 1, t=5252 jiffies, g=12421, q=160)
[  469.427295] Sending NMI from CPU 1 to CPUs 0:
[  469.427394] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[  469.428075] rcu: rcu_sched kthread starved for 5252 jiffies! g12421 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[  469.428075] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[  469.428075] rcu: RCU grace-period kthread stack dump:
[  469.428075] task:rcu_sched       state:I stack:    0 pid:   13 ppid:     2 flags:0x00004000
[  469.428075] Call Trace:
[  469.428075]  __schedule+0x282/0x880
[  469.428075]  schedule+0x46/0xb0
[  469.428075]  schedule_timeout+0x8b/0x150
[  469.428075]  ? __next_timer_interrupt+0x110/0x110
[  469.428075]  rcu_gp_kthread+0x51b/0xbb0
[  469.428075]  ? rcu_cpu_kthread+0x190/0x190
[  469.428075]  kthread+0x11b/0x140
[  469.428075]  ? __kthread_bind_mask+0x60/0x60
[  469.428075]  ret_from_fork+0x22/0x30
[  532.444115] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  532.445401] rcu:     0-...!: (0 ticks this GP) idle=530/0/0x0 softirq=11204/11204 fqs=1  (false positive?)
[  532.446358]  (detected by 1, t=21007 jiffies, g=12421, q=435)
[  532.446360] Sending NMI from CPU 1 to CPUs 0:
[  532.446440] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[  532.447365] rcu: rcu_sched kthread starved for 15753 jiffies! g12421 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[  532.448075] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[  532.448075] rcu: RCU grace-period kthread stack dump:
[  532.448075] task:rcu_sched       state:I stack:    0 pid:   13 ppid:     2 flags:0x00004000
[  532.448075] Call Trace:
[  532.448075]  __schedule+0x282/0x880
[  532.448075]  schedule+0x46/0xb0
[  532.448075]  schedule_timeout+0x8b/0x150
[  532.448075]  ? __next_timer_interrupt+0x110/0x110
[  532.448075]  rcu_gp_kthread+0x51b/0xbb0
[  532.448075]  ? rcu_cpu_kthread+0x190/0x190
[  532.448075]  kthread+0x11b/0x140
[  532.448075]  ? __kthread_bind_mask+0x60/0x60
[  532.448075]  ret_from_fork+0x22/0x30
[  553.456118] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  553.457413] rcu:     0-...!: (0 ticks this GP) idle=5fc/0/0x0 softirq=11204/11204 fqs=0  (false positive?)
[  553.460077]  (detected by 1, t=5252 jiffies, g=12425, q=473)
[  553.460077] Sending NMI from CPU 1 to CPUs 0:
[  553.460394] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[  553.460663] rcu: rcu_sched kthread starved for 5252 jiffies! g12425 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[  553.461483] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[  553.461483] rcu: RCU grace-period kthread stack dump:
[  553.461483] task:rcu_sched       state:I stack:    0 pid:   13 ppid:     2 flags:0x00004000
[  553.461483] Call Trace:
[  553.461483]  __schedule+0x282/0x880
[  553.461483]  schedule+0x46/0xb0
[  553.461483]  schedule_timeout+0x8b/0x150
[  553.461483]  ? __next_timer_interrupt+0x110/0x110
[  553.461483]  rcu_gp_kthread+0x51b/0xbb0
[  553.461483]  ? rcu_cpu_kthread+0x190/0x190
[  553.461483]  kthread+0x11b/0x140
[  553.461483]  ? __kthread_bind_mask+0x60/0x60
[  553.461483]  ret_from_fork+0x22/0x30
[  616.476113] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  616.478385] rcu:     0-...!: (0 ticks this GP) idle=820/0/0x0 softirq=11204/11204 fqs=1  (false positive?)
[  616.480073]  (detected by 1, t=21007 jiffies, g=12425, q=540)
[  616.480073] Sending NMI from CPU 1 to CPUs 0:
[  616.480404] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[  616.480604] rcu: rcu_sched kthread starved for 15753 jiffies! g12425 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[  616.480604] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[  616.480604] rcu: RCU grace-period kthread stack dump:
[  616.480604] task:rcu_sched       state:I stack:    0 pid:   13 ppid:     2 flags:0x00004000
[  616.480604] Call Trace:
[  616.480604]  __schedule+0x282/0x880
[  616.480604]  schedule+0x46/0xb0
[  616.480604]  schedule_timeout+0x8b/0x150
[  616.480604]  ? __next_timer_interrupt+0x110/0x110
[  616.480604]  rcu_gp_kthread+0x51b/0xbb0
[  616.480604]  ? rcu_cpu_kthread+0x190/0x190
[  616.480604]  kthread+0x11b/0x140
[  616.480604]  ? __kthread_bind_mask+0x60/0x60
[  616.480604]  ret_from_fork+0x22/0x30
[  637.492114] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  637.494354] rcu:     0-...!: (0 ticks this GP) idle=8e4/0/0x0 softirq=11204/11204 fqs=0  (false positive?)
[  637.496074]  (detected by 1, t=5252 jiffies, g=12429, q=528)
[  637.496074] Sending NMI from CPU 1 to CPUs 0:
[  637.498269] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[  637.498646] rcu: rcu_sched kthread starved for 5252 jiffies! g12429 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[  637.499229] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[  637.499229] rcu: RCU grace-period kthread stack dump:
[  637.499229] task:rcu_sched       state:I stack:    0 pid:   13 ppid:     2 flags:0x00004000
[  637.499229] Call Trace:
[  637.499229]  __schedule+0x282/0x880
[  637.499229]  schedule+0x46/0xb0
[  637.499229]  schedule_timeout+0x8b/0x150
[  637.499229]  ? __next_timer_interrupt+0x110/0x110
[  637.499229]  rcu_gp_kthread+0x51b/0xbb0
[  637.499229]  ? rcu_cpu_kthread+0x190/0x190
[  637.499229]  kthread+0x11b/0x140
[  637.499229]  ? __kthread_bind_mask+0x60/0x60
[  637.499229]  ret_from_fork+0x22/0x30
[  700.512114] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  700.514386] rcu:     0-...!: (0 ticks this GP) idle=b00/0/0x0 softirq=11204/11204 fqs=1  (false positive?)
[  700.516074]  (detected by 1, t=21007 jiffies, g=12429, q=607)
[  700.516074] Sending NMI from CPU 1 to CPUs 0:
[  700.517048] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[  700.517598] rcu: rcu_sched kthread starved for 15752 jiffies! g12429 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[  700.517598] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[  700.517598] rcu: RCU grace-period kthread stack dump:
[  700.517598] task:rcu_sched       state:I stack:    0 pid:   13 ppid:     2 flags:0x00004000
[  700.517598] Call Trace:
[  700.517598]  __schedule+0x282/0x880
[  700.517598]  schedule+0x46/0xb0
[  700.517598]  schedule_timeout+0x8b/0x150
[  700.517598]  ? __next_timer_interrupt+0x110/0x110
[  700.517598]  rcu_gp_kthread+0x51b/0xbb0
[  700.517598]  ? rcu_cpu_kthread+0x190/0x190
[  700.517598]  kthread+0x11b/0x140
[  700.517598]  ? __kthread_bind_mask+0x60/0x60
[  700.517598]  ret_from_fork+0x22/0x30
[  721.528113] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  721.530352] rcu:     0-...!: (0 ticks this GP) idle=bd4/0/0x0 softirq=11204/11204 fqs=0  (false positive?)
[  721.532075]  (detected by 1, t=5252 jiffies, g=12433, q=459)
[  721.532075] Sending NMI from CPU 1 to CPUs 0:
[  721.533057] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[  721.533300] rcu: rcu_sched kthread starved for 5252 jiffies! g12433 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[  721.533300] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[  721.533300] rcu: RCU grace-period kthread stack dump:
[  721.533300] task:rcu_sched       state:I stack:    0 pid:   13 ppid:     2 flags:0x00004000
[  721.533300] Call Trace:
[  721.533300]  __schedule+0x282/0x880
[  721.533300]  schedule+0x46/0xb0
[  721.533300]  schedule_timeout+0x8b/0x150
[  721.533300]  ? __next_timer_interrupt+0x110/0x110
[  721.533300]  rcu_gp_kthread+0x51b/0xbb0
[  721.533300]  ? rcu_cpu_kthread+0x190/0x190
[  721.533300]  kthread+0x11b/0x140
[  721.533300]  ? __kthread_bind_mask+0x60/0x60
[  721.533300]  ret_from_fork+0x22/0x30
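For reference, the skew that made the clocksource watchdog demote the TSC can be decoded from the raw counter values in the log above. The guest's TSC frequency isn't printed, so only the acpi_pm side can be converted to wall time; a minimal sketch (the 3.579545 MHz PM-timer frequency is fixed by the ACPI spec, everything else is copied from the log):

```python
# Decode the clocksource watchdog values from the log above.
ACPI_PM_HZ = 3_579_545  # ACPI PM timer frequency, fixed by the ACPI spec

wd_now, wd_last, wd_mask = 0xda93f0, 0xbea500, 0xffffff
cs_now, cs_last = 0x9f8e67f8bf, 0x9f23cc4b79

# The PM timer is a 24-bit counter, so mask the difference to handle wraparound.
wd_delta = (wd_now - wd_last) & wd_mask
wd_seconds = wd_delta / ACPI_PM_HZ  # elapsed time per the watchdog: ~0.51 s

# The TSC delta nominally covers the same interval. Without the guest's TSC
# frequency we can't convert it to seconds, but if the effective TSC offset
# jumped across the migration, this read appears to span a very different
# amount of time than ~0.51 s, which is exactly what the watchdog flags.
tsc_delta = cs_now - cs_last

print(f"acpi_pm interval: {wd_seconds:.3f} s")
print(f"tsc ticks over the same interval: {tsc_delta:#x} ({tsc_delta})")
```

Once the watchdog declares the TSC unstable, the guest falls back to acpi_pm (as the log shows), and timekeeping on the stalled vCPU degrades from there.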