gjcolombo opened this issue 1 year ago
Note: I also see this at
james@dinnerbone:~$ cat /etc/versions/build
heads/stlouis-0-gce8e8e601e
which includes the illumos#15143 bits:
[ 6.552238] EXT4-fs (vda3): resizing filesystem from 472571 to 18298363 blocks
[ 7.864090] EXT4-fs (vda3): resized filesystem to 18298363
[ 9.598778] device-mapper: uevent: version 1.0.3
[ 9.598892] device-mapper: ioctl: 4.43.0-ioctl (2020-10-01) initialised: dm-devel@redhat.com
[ 58.242224] audit: type=1400 audit(1682651165.345:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/chronyd" pid=2681 comm="apparmor_parser"
<<<< MIGRATION >>>>
[ 183.116103] clocksource: timekeeping watchdog on CPU1: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 183.116131] clocksource: 'acpi_pm' wd_now: da93f0 wd_last: bea500 mask: ffffff
[ 183.116151] clocksource: 'tsc' cs_now: 9f8e67f8bf cs_last: 9f23cc4b79 mask: ffffffffffffffff
[ 183.116173] tsc: Marking TSC unstable due to clocksource watchdog
[ 183.116486] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[ 183.116488] sched_clock: Marking unstable (183118347082, -1856502)<-(183136894483, -20409854)
[ 183.116700] clocksource: Checking clocksource tsc synchronization from CPU 1.
[ 183.116756] clocksource: Switched to clocksource acpi_pm
[ 385.392113] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 385.393282] rcu: 0-...!: (0 ticks this GP) idle=fdc/0/0x0 softirq=11204/11204 fqs=0 (false positive?)
[ 385.394237] (detected by 1, t=5252 jiffies, g=12417, q=82)
[ 385.394239] Sending NMI from CPU 1 to CPUs 0:
[ 385.394373] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[ 385.395243] rcu: rcu_sched kthread starved for 5252 jiffies! g12417 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[ 385.396074] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 385.396074] rcu: RCU grace-period kthread stack dump:
[ 385.396074] task:rcu_sched state:I stack: 0 pid: 13 ppid: 2 flags:0x00004000
[ 385.396074] Call Trace:
[ 385.396074] __schedule+0x282/0x880
[ 385.396074] ? __switch_to_asm+0x3a/0x60
[ 385.396074] schedule+0x46/0xb0
[ 385.396074] schedule_timeout+0x8b/0x150
[ 385.396074] ? __next_timer_interrupt+0x110/0x110
[ 385.396074] rcu_gp_kthread+0x51b/0xbb0
[ 385.396074] ? rcu_cpu_kthread+0x190/0x190
[ 385.396074] kthread+0x11b/0x140
[ 385.396074] ? __kthread_bind_mask+0x60/0x60
[ 385.396074] ret_from_fork+0x22/0x30
[ 448.412113] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 448.413389] rcu: 0-...!: (0 ticks this GP) idle=230/0/0x0 softirq=11204/11204 fqs=1 (false positive?)
[ 448.415278] (detected by 1, t=21007 jiffies, g=12417, q=124)
[ 448.415280] Sending NMI from CPU 1 to CPUs 0:
[ 448.415360] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[ 448.416074] rcu: rcu_sched kthread starved for 15754 jiffies! g12417 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[ 448.416074] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 448.416074] rcu: RCU grace-period kthread stack dump:
[ 448.416074] task:rcu_sched state:I stack: 0 pid: 13 ppid: 2 flags:0x00004000
[ 448.416074] Call Trace:
[ 448.416074] __schedule+0x282/0x880
[ 448.416074] schedule+0x46/0xb0
[ 448.416074] schedule_timeout+0x8b/0x150
[ 448.416074] ? __next_timer_interrupt+0x110/0x110
[ 448.416074] rcu_gp_kthread+0x51b/0xbb0
[ 448.416074] ? rcu_cpu_kthread+0x190/0x190
[ 448.416074] kthread+0x11b/0x140
[ 448.416074] ? __kthread_bind_mask+0x60/0x60
[ 448.416074] ret_from_fork+0x22/0x30
[ 469.424115] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 469.425395] rcu: 0-...!: (0 ticks this GP) idle=2f4/0/0x0 softirq=11204/11204 fqs=0 (false positive?)
[ 469.427291] (detected by 1, t=5252 jiffies, g=12421, q=160)
[ 469.427295] Sending NMI from CPU 1 to CPUs 0:
[ 469.427394] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[ 469.428075] rcu: rcu_sched kthread starved for 5252 jiffies! g12421 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[ 469.428075] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 469.428075] rcu: RCU grace-period kthread stack dump:
[ 469.428075] task:rcu_sched state:I stack: 0 pid: 13 ppid: 2 flags:0x00004000
[ 469.428075] Call Trace:
[ 469.428075] __schedule+0x282/0x880
[ 469.428075] schedule+0x46/0xb0
[ 469.428075] schedule_timeout+0x8b/0x150
[ 469.428075] ? __next_timer_interrupt+0x110/0x110
[ 469.428075] rcu_gp_kthread+0x51b/0xbb0
[ 469.428075] ? rcu_cpu_kthread+0x190/0x190
[ 469.428075] kthread+0x11b/0x140
[ 469.428075] ? __kthread_bind_mask+0x60/0x60
[ 469.428075] ret_from_fork+0x22/0x30
[ 532.444115] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 532.445401] rcu: 0-...!: (0 ticks this GP) idle=530/0/0x0 softirq=11204/11204 fqs=1 (false positive?)
[ 532.446358] (detected by 1, t=21007 jiffies, g=12421, q=435)
[ 532.446360] Sending NMI from CPU 1 to CPUs 0:
[ 532.446440] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[ 532.447365] rcu: rcu_sched kthread starved for 15753 jiffies! g12421 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[ 532.448075] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 532.448075] rcu: RCU grace-period kthread stack dump:
[ 532.448075] task:rcu_sched state:I stack: 0 pid: 13 ppid: 2 flags:0x00004000
[ 532.448075] Call Trace:
[ 532.448075] __schedule+0x282/0x880
[ 532.448075] schedule+0x46/0xb0
[ 532.448075] schedule_timeout+0x8b/0x150
[ 532.448075] ? __next_timer_interrupt+0x110/0x110
[ 532.448075] rcu_gp_kthread+0x51b/0xbb0
[ 532.448075] ? rcu_cpu_kthread+0x190/0x190
[ 532.448075] kthread+0x11b/0x140
[ 532.448075] ? __kthread_bind_mask+0x60/0x60
[ 532.448075] ret_from_fork+0x22/0x30
[ 553.456118] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 553.457413] rcu: 0-...!: (0 ticks this GP) idle=5fc/0/0x0 softirq=11204/11204 fqs=0 (false positive?)
[ 553.460077] (detected by 1, t=5252 jiffies, g=12425, q=473)
[ 553.460077] Sending NMI from CPU 1 to CPUs 0:
[ 553.460394] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[ 553.460663] rcu: rcu_sched kthread starved for 5252 jiffies! g12425 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[ 553.461483] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 553.461483] rcu: RCU grace-period kthread stack dump:
[ 553.461483] task:rcu_sched state:I stack: 0 pid: 13 ppid: 2 flags:0x00004000
[ 553.461483] Call Trace:
[ 553.461483] __schedule+0x282/0x880
[ 553.461483] schedule+0x46/0xb0
[ 553.461483] schedule_timeout+0x8b/0x150
[ 553.461483] ? __next_timer_interrupt+0x110/0x110
[ 553.461483] rcu_gp_kthread+0x51b/0xbb0
[ 553.461483] ? rcu_cpu_kthread+0x190/0x190
[ 553.461483] kthread+0x11b/0x140
[ 553.461483] ? __kthread_bind_mask+0x60/0x60
[ 553.461483] ret_from_fork+0x22/0x30
[ 616.476113] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 616.478385] rcu: 0-...!: (0 ticks this GP) idle=820/0/0x0 softirq=11204/11204 fqs=1 (false positive?)
[ 616.480073] (detected by 1, t=21007 jiffies, g=12425, q=540)
[ 616.480073] Sending NMI from CPU 1 to CPUs 0:
[ 616.480404] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[ 616.480604] rcu: rcu_sched kthread starved for 15753 jiffies! g12425 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[ 616.480604] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 616.480604] rcu: RCU grace-period kthread stack dump:
[ 616.480604] task:rcu_sched state:I stack: 0 pid: 13 ppid: 2 flags:0x00004000
[ 616.480604] Call Trace:
[ 616.480604] __schedule+0x282/0x880
[ 616.480604] schedule+0x46/0xb0
[ 616.480604] schedule_timeout+0x8b/0x150
[ 616.480604] ? __next_timer_interrupt+0x110/0x110
[ 616.480604] rcu_gp_kthread+0x51b/0xbb0
[ 616.480604] ? rcu_cpu_kthread+0x190/0x190
[ 616.480604] kthread+0x11b/0x140
[ 616.480604] ? __kthread_bind_mask+0x60/0x60
[ 616.480604] ret_from_fork+0x22/0x30
[ 637.492114] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 637.494354] rcu: 0-...!: (0 ticks this GP) idle=8e4/0/0x0 softirq=11204/11204 fqs=0 (false positive?)
[ 637.496074] (detected by 1, t=5252 jiffies, g=12429, q=528)
[ 637.496074] Sending NMI from CPU 1 to CPUs 0:
[ 637.498269] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[ 637.498646] rcu: rcu_sched kthread starved for 5252 jiffies! g12429 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[ 637.499229] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 637.499229] rcu: RCU grace-period kthread stack dump:
[ 637.499229] task:rcu_sched state:I stack: 0 pid: 13 ppid: 2 flags:0x00004000
[ 637.499229] Call Trace:
[ 637.499229] __schedule+0x282/0x880
[ 637.499229] schedule+0x46/0xb0
[ 637.499229] schedule_timeout+0x8b/0x150
[ 637.499229] ? __next_timer_interrupt+0x110/0x110
[ 637.499229] rcu_gp_kthread+0x51b/0xbb0
[ 637.499229] ? rcu_cpu_kthread+0x190/0x190
[ 637.499229] kthread+0x11b/0x140
[ 637.499229] ? __kthread_bind_mask+0x60/0x60
[ 637.499229] ret_from_fork+0x22/0x30
[ 700.512114] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 700.514386] rcu: 0-...!: (0 ticks this GP) idle=b00/0/0x0 softirq=11204/11204 fqs=1 (false positive?)
[ 700.516074] (detected by 1, t=21007 jiffies, g=12429, q=607)
[ 700.516074] Sending NMI from CPU 1 to CPUs 0:
[ 700.517048] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[ 700.517598] rcu: rcu_sched kthread starved for 15752 jiffies! g12429 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[ 700.517598] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 700.517598] rcu: RCU grace-period kthread stack dump:
[ 700.517598] task:rcu_sched state:I stack: 0 pid: 13 ppid: 2 flags:0x00004000
[ 700.517598] Call Trace:
[ 700.517598] __schedule+0x282/0x880
[ 700.517598] schedule+0x46/0xb0
[ 700.517598] schedule_timeout+0x8b/0x150
[ 700.517598] ? __next_timer_interrupt+0x110/0x110
[ 700.517598] rcu_gp_kthread+0x51b/0xbb0
[ 700.517598] ? rcu_cpu_kthread+0x190/0x190
[ 700.517598] kthread+0x11b/0x140
[ 700.517598] ? __kthread_bind_mask+0x60/0x60
[ 700.517598] ret_from_fork+0x22/0x30
[ 721.528113] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 721.530352] rcu: 0-...!: (0 ticks this GP) idle=bd4/0/0x0 softirq=11204/11204 fqs=0 (false positive?)
[ 721.532075] (detected by 1, t=5252 jiffies, g=12433, q=459)
[ 721.532075] Sending NMI from CPU 1 to CPUs 0:
[ 721.533057] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x20
[ 721.533300] rcu: rcu_sched kthread starved for 5252 jiffies! g12433 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[ 721.533300] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 721.533300] rcu: RCU grace-period kthread stack dump:
[ 721.533300] task:rcu_sched state:I stack: 0 pid: 13 ppid: 2 flags:0x00004000
[ 721.533300] Call Trace:
[ 721.533300] __schedule+0x282/0x880
[ 721.533300] schedule+0x46/0xb0
[ 721.533300] schedule_timeout+0x8b/0x150
[ 721.533300] ? __next_timer_interrupt+0x110/0x110
[ 721.533300] rcu_gp_kthread+0x51b/0xbb0
[ 721.533300] ? rcu_cpu_kthread+0x190/0x190
[ 721.533300] kthread+0x11b/0x140
[ 721.533300] ? __kthread_bind_mask+0x60/0x60
[ 721.533300] ret_from_fork+0x22/0x30
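For what it's worth, the watchdog numbers in the log above can be sanity-checked by hand. The ACPI PM timer runs at a fixed 3.579545 MHz behind a 24-bit counter (hence mask: ffffff), so the two wd_* readings pin down the interval between watchdog checks; whether the TSC delta over that same window counts as "skew" depends on the guest's calibrated TSC frequency, which isn't in the log. A rough check with plain bash arithmetic (the 3579545 Hz rate is the PM timer's architectural frequency, not something taken from this log):

    # acpi_pm ticks between the two watchdog reads (24-bit counter; no wrap here)
    echo $(( (0xda93f0 - 0xbea500) & 0xffffff ))   # 1830640 ticks
    echo "scale=3; 1830640 / 3579545" | bc         # ~0.511 s between checks
    # TSC ticks over the same window
    echo $(( 0x9f8e67f8bf - 0x9f23cc4b79 ))        # 1788587334 ticks

1788587334 TSC ticks in ~0.511 s works out to an effective rate of roughly 3.5 GHz over that interval; if that disagrees with the frequency the guest calibrated at boot (or if the guest's TSC jumped across the migration), the watchdog would read the difference as skew and demote the TSC, which matches the messages above.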
Propolis commit: c455784
Host OS:
Guest OS: Debian 11 nocloud, Linux debian 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
Repro steps: in the guest, run

    stress-ng --vm 1 --vm-bytes 2G --verify -v &
    fg

migrate the VM, then run

    stress-ng --timer 32 --timer-freq 1000000 &
    fg

and kill the stress-ng job.

Expected: guest is generally happy
Observed: guest gets dyspepsia after running the timer stress test (see the dmesg output above).
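Not part of the original report, but a convenience if anyone wants to watch for the failure live: the watchdog and RCU stall messages can be followed from inside the guest with

    # follow the guest kernel log for the relevant messages (run as root)
    dmesg -w | grep -E 'clocksource|tsc|rcu'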
Other observations: stress-ng --vm tanked migration performance by dirtying a bunch of pages (another case to investigate in #324). @jmpesp saw a similar issue in local testing earlier this week, but that was without the bits needed to enable the interrupt state transfer implemented in #367. Unless I've missed something, that should have been enabled here (both the Propolis bits and the necessary bhyve bits were present).
This VM no longer seems to be producing any RCU complaints, but I'll hold it in its current state for now.