Closed krizhanovsky closed 5 years ago
I'm not sure if it's related or an other issue, bu also got RCU warning:
[11329934117.353112] INFO: rcu_sched self-detected stall on CPU
[11329934117.353112] 1-...: (5246 ticks this GP) idle=95e/140000000000001/0 softirq=71204/71204 fqs=0
[11329934117.353112] (t=5251 jiffies g=24109 c=24108 q=6)
[11329934117.353112] rcu_sched kthread starved for 5251 jiffies! g24109 c24108 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
[11329934117.361118] INFO: rcu_sched detected stalls on CPUs/tasks:
[11329934117.361121] 1-...: (5246 ticks this GP) idle=95e/140000000000001/0 softirq=71204/71204 fqs=0
[11329934117.361121] (detected by 0, t=5252 jiffies, g=24109, c=24108, q=6)
[11329934117.361121] Sending NMI from CPU 0 to CPUs 1:
[11329934117.361121] NMI backtrace for cpu 1
[11329934117.361121] CPU: 1 PID: 12328 Comm: wrk Tainted: G W O 4.14.32-kdump+ #142
[11329934117.361121] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[11329934117.361121] task: ffff9f797bb821c0 task.stack: ffff9f797bca8000
[11329934117.361121] RIP: 0010:cfb_imageblit+0x4c6/0x520
[11329934117.361121] RSP: 0018:ffff9f797fd039f0 EFLAGS: 00000046
[11329934117.361121] RAX: 0000000000000000 RBX: ffff9f797a862d97 RCX: 0000000000000005
[11329934117.361121] RDX: ffffc3bac0cf1c88 RSI: 0000000000000000 RDI: 0000000000000001
[11329934117.361121] RBP: ffffc3bac0cf1c8c R08: ffffffffbde885c0 R09: 0000000000aaaaaa
[11329934117.361121] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000001000
[11329934117.361121] R13: ffffc3bac0cf1d60 R14: 000000000000000e R15: ffffc3bac0cf1c80
[11329934117.361121] FS: 00007f7e07930700(0000) GS:ffff9f797fd00000(0000) knlGS:0000000000000000
[11329934117.361121] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11329934117.361121] CR2: 00007f7e0006b2c8 CR3: 000000002ef06002 CR4: 00000000003606e0
[11329934117.361121] Call Trace:
[11329934117.361121] <IRQ>
[11329934117.361121] drm_fb_helper_cfb_imageblit+0xd/0x30 [drm_kms_helper]
[11329934117.361121] bit_putcs+0x2bc/0x4b0
[11329934117.361121] ? drm_fb_helper_cfb_copyarea+0x11/0x30 [drm_kms_helper]
[11329934117.361121] ? soft_cursor+0x19f/0x220
[11329934117.361121] ? bit_clear+0x110/0x110
[11329934117.361121] fbcon_putcs+0xf5/0x130
[11329934117.361121] fbcon_redraw.isra.20+0xdb/0x1c0
[11329934117.361121] fbcon_scroll+0x342/0xc60
[11329934117.361121] con_scroll+0xce/0xe0
[11329934117.361121] lf+0x99/0xa0
[11329934117.361121] vt_console_print+0x2a2/0x3f0
[11329934117.361121] console_unlock+0x3c3/0x4b0
[11329934117.361121] vprintk_emit+0x259/0x2c0
Just saw the RCU warning again after relatively long time of inactivity after TLS functional tests execution (see timestamps in the log):
[23643.858312] tsc: Marking TSC unstable due to clocksource watchdog
[23643.868078] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[23643.872516] sched_clock: Marking unstable (23643878824366, -10745512)<-(23643918108916, -50032260)
[23643.913878] clocksource: Switched to clocksource hpet
[41201.262146] INFO: rcu_sched self-detected stall on CPU
[41208.689243] INFO: rcu_sched detected stalls on CPUs/tasks:
[41208.689243] 1-...: (2 ticks this GP) idle=90e/140000000000001/0 softirq=282036/282036 fqs=0
[41208.689243] (detected by 0, t=9357 jiffies, g=48207, c=48206, q=8)
[41208.689243] Sending NMI from CPU 0 to CPUs 1:
[41208.689243] NMI backtrace for cpu 1
[41208.689243] CPU: 1 PID: 29397 Comm: kworker/1:0 Tainted: G W O 4.14.32-kdump+ #142
[41208.689243] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[41208.689243] Workqueue: events e1000_watchdog [e1000]
[41208.689243] task: ffff89358f770140 task.stack: ffff8935ca1c8000
[41208.689243] RIP: 0010:io_serial_out+0x11/0x20
[41208.689243] RSP: 0018:ffff8935ffd03d18 EFLAGS: 00000006
[41208.689243] RAX: 0000000000000000 RBX: ffffffffa8af7000 RCX: 0000000000000000
[41208.689243] RDX: 00000000000003f9 RSI: 0000000000000001 RDI: ffffffffa8af7000
[41208.689243] RBP: 0000000000000001 R08: 0000000000040002 R09: 0000000000000005
[41208.689243] R10: 0000000000000008 R11: ffffffffa8a9a8ad R12: 0000000000000005
[41208.689243] R13: ffffffffa8a9a8a0 R14: 000000000000003a R15: 0000000000000046
[41208.689243] FS: 0000000000000000(0000) GS:ffff8935ffd00000(0000) knlGS:0000000000000000
[41208.689243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41208.689243] CR2: 00007ffccbc06038 CR3: 000000000f00a001 CR4: 00000000003606e0
[41208.689243] Call Trace:
[41208.689243] <IRQ>
[41208.689243] serial8250_console_write+0x13e/0x280
[41208.689243] ? print_prefix+0xba/0x1b0
[41208.689243] ? msg_print_text+0x9c/0x100
[41208.689243] console_unlock+0x3c3/0x4b0
[41208.689243] vprintk_emit+0x259/0x2c0
[41208.689243] printk+0x4d/0x69
[41208.689243] rcu_check_callbacks+0x4c5/0x900
[41208.689243] ? update_wall_time+0x45b/0x720
[41208.689243] ? tick_sched_do_timer+0x40/0x40
[41208.689243] update_process_times+0x23/0x50
[41208.689243] tick_sched_handle+0x30/0x50
[41208.689243] tick_sched_timer+0x2f/0x70
[41208.689243] __hrtimer_run_queues+0xd3/0x220
[41208.689243] hrtimer_interrupt+0xa1/0x1e0
[41208.689243] smp_apic_timer_interrupt+0x61/0x120
[41208.689243] apic_timer_interrupt+0x7a/0x80
[41208.689243] </IRQ>
[41208.689243] RIP: 0010:_raw_spin_unlock_irqrestore+0x5/0x10
[41208.689243] RSP: 0018:ffff8935ca1cbe00 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
The TSC warning is frequently seen in Tempesta workloads. The problem is in Switched to clocksource hpet
- HPET has much higher overhead than TSC (e.g. see https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe/+bug/1821441). Also, since https://lore.kernel.org/patchwork/patch/782295/ :
Since the clocksource watchdog will only detect broken TSC after the fact, all TSC based clocks will likely have observed non-continuous values before/when switching away from TSC.
, it seems the RCU and dev_watchdog issues might be due to the TSC issue.
There are several reasons for possible RCU stale: connected serial console plus warnings (we have a lot of them in perfomance tests), RCU kernel threads starvation (rcu_sched kthread starved for 5251 jiffies!
in second log) due to high volume networking (same as for TSC). All the 2 RCU warnings are about serial console on the VM. However, since the RCU warnings appear only 1000-18000sec after the workload, the problems isn't in softirqs overloaded by Tempesta, but actually by the system timer.
Related discussion https://bugzilla.redhat.com/show_bug.cgi?id=1352992 and patch not accepted in mainline https://lore.kernel.org/patchwork/patch/595803/. The timer problem is easy to reproduce on native Debian kernel in VM: just suspend and wakeup the host system.
Kernel parameter tsc=unstable
fixes the problem, moving to HPET clocksource:
root@debian:~# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hpet acpi_pm
root@debian:~# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
However, as was mentioned above, HPET introduced significant overhead:
6.80% [kernel] [k] read_hpet
2.55% [kernel] [k] syscall_return_via_sysret
2.50% libc-2.24.so [.] _int_malloc
2.38% wrk [.] http_parser_execute
With -no-hpet
command line argument for Qemu, tsc falls to acpi_pm
which doesn't impose so high overhead.
From time to time, usually after TLS stress test I see following warning in dmesg: