TLS: transmit queue time out

krizhanovsky commented 5 years ago

From time to time, usually after TLS stress test I see following warning in dmesg:

[11329928966.660652] [tempesta fw] modules are started
[11329928983.494541] [tempesta fw] Warning: request dropped: processing error, status 502: 127.0.0.1:443
[11329928983.834895] [tempesta fw] Warning: request dropped: processing error, status 502: 127.0.0.1:443
[11329929010.739896] [tempesta fw] Warning: request dropped: processing error, status 502: 127.0.0.1:443
[11329929011.079357] [tempesta fw] Warning: request dropped: processing error, status 502: 127.0.0.1:443
[11329929241.839923] [tempesta fw] Warning: request dropped: processing error, status 502: 127.0.0.1:443
[11329930304.887712] clocksource: timekeeping watchdog on CPU1: Marking clocksource 'tsc' as unstable because the skew is too large:
[11329930304.892405] clocksource:                       'hpet' wd_now: f1dc51f3 wd_last: 317f6df4 mask: ffffffff
[11329930304.896107] clocksource:                       'tsc' cs_now: 1577cbedd2dc cs_last: 12f63dc662ac mask: ffffffffffffffff
[11329930304.896109] tsc: Marking TSC unstable due to clocksource watchdog
[11329930304.896140] NETDEV WATCHDOG: ens3 (e1000): transmit queue 0 timed out
[11329930304.896179] ------------[ cut here ]------------
[11329930304.896186] WARNING: CPU: 1 PID: 14 at net/sched/sch_generic.c:320 dev_watchdog+0x1f5/0x200
[11329930304.896186] Modules linked in: tempesta_fw(O) tempesta_db(O) tempesta_tls(O) tempesta_lib(O) binfmt_misc bochs_drm ttm drm_kms_helper drm crct10dif_pclmul ppdev sg button serio_raw fb_sys_fops syscopyarea sysfillrect sysimgblt parport_pc parport pcspkr ip_tables x_tables ext4 crc16 mbcache jbd2 fscrypto sr_mod cdrom sd_mod ata_generic ata_piix libata e1000 psmouse scsi_mod i2c_piix4
[11329930304.896215] CPU: 1 PID: 14 Comm: ksoftirqd/1 Tainted: G           O    4.14.32-kdump+ #142
[11329930304.896216] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[11329930304.896217] task: ffff9f797b3fa1c0 task.stack: ffff9f797b864000
[11329930304.896220] RIP: 0010:dev_watchdog+0x1f5/0x200
[11329930304.896222] RSP: 0018:ffff9f797b867da0 EFLAGS: 00010286
[11329930304.896223] RAX: 0000000000000039 RBX: 0000000000000000 RCX: 0000000000000002
[11329930304.896225] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffffffbe240570
[11329930304.896226] RBP: ffff9f793f9e4000 R08: 0000000000000000 R09: 0000000000000039
[11329930304.896227] R10: ffff9f797fd196b0 R11: 00000000000f7f14 R12: 0000000000000001
[11329930304.896228] R13: 0000000000000001 R14: ffff9f793f9e4000 R15: 0000000000000001
[11329930304.896231] FS:  0000000000000000(0000) GS:ffff9f797fd00000(0000) knlGS:0000000000000000
[11329930304.896233] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11329930304.896237] CR2: 00007f11c1d09db0 CR3: 000000007a20a003 CR4: 00000000003606e0
[11329930304.896237] Call Trace:
[11329930304.896244]  ? qdisc_rcu_free+0x40/0x40
[11329930304.896247]  call_timer_fn+0x2b/0x120
[11329930304.896255]  run_timer_softirq+0x1c5/0x3f0
[11329930304.896259]  __do_softirq+0x104/0x297
[11329930304.896263]  run_ksoftirqd+0x14/0x30
[11329930304.896270]  smpboot_thread_fn+0x109/0x160
[11329930304.896273]  kthread+0xfa/0x130
[11329930304.896276]  ? sort_range+0x20/0x20
[11329930304.896279]  ? kthread_create_on_node+0x60/0x60
[11329930304.896281]  ret_from_fork+0x35/0x40
[11329930304.896283] Code: 63 8d 60 04 00 00 eb 90 48 89 ef c6 05 ef 71 b7 00 01 e8 8f 36 fd ff 89 d9 48 89 c2 48 89 ee 48 c7 c7 a8 f1 03 be e8 e1 af ba ff <0f> 0b eb bd 0f 1f 80 00 00 00 00 48 c7 47 08 00 00 00 00 48 c7 
[11329930304.896320] ---[ end trace d7d61677992194ac ]---
[11329930304.923033] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[11329930304.923036] sched_clock: Marking unstable (10348471035636, -7116824117260473173)<-(-7116813768744506908, -42012430)

krizhanovsky commented 5 years ago

I'm not sure if it's related or an other issue, bu also got RCU warning:

[11329934117.353112] INFO: rcu_sched self-detected stall on CPU
[11329934117.353112]    1-...: (5246 ticks this GP) idle=95e/140000000000001/0 softirq=71204/71204 fqs=0
[11329934117.353112]     (t=5251 jiffies g=24109 c=24108 q=6)
[11329934117.353112] rcu_sched kthread starved for 5251 jiffies! g24109 c24108 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
[11329934117.361118] INFO: rcu_sched detected stalls on CPUs/tasks:
[11329934117.361121]    1-...: (5246 ticks this GP) idle=95e/140000000000001/0 softirq=71204/71204 fqs=0
[11329934117.361121]    (detected by 0, t=5252 jiffies, g=24109, c=24108, q=6)
[11329934117.361121] Sending NMI from CPU 0 to CPUs 1:
[11329934117.361121] NMI backtrace for cpu 1
[11329934117.361121] CPU: 1 PID: 12328 Comm: wrk Tainted: G        W  O    4.14.32-kdump+ #142
[11329934117.361121] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[11329934117.361121] task: ffff9f797bb821c0 task.stack: ffff9f797bca8000
[11329934117.361121] RIP: 0010:cfb_imageblit+0x4c6/0x520
[11329934117.361121] RSP: 0018:ffff9f797fd039f0 EFLAGS: 00000046
[11329934117.361121] RAX: 0000000000000000 RBX: ffff9f797a862d97 RCX: 0000000000000005
[11329934117.361121] RDX: ffffc3bac0cf1c88 RSI: 0000000000000000 RDI: 0000000000000001
[11329934117.361121] RBP: ffffc3bac0cf1c8c R08: ffffffffbde885c0 R09: 0000000000aaaaaa
[11329934117.361121] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000001000
[11329934117.361121] R13: ffffc3bac0cf1d60 R14: 000000000000000e R15: ffffc3bac0cf1c80
[11329934117.361121] FS:  00007f7e07930700(0000) GS:ffff9f797fd00000(0000) knlGS:0000000000000000
[11329934117.361121] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11329934117.361121] CR2: 00007f7e0006b2c8 CR3: 000000002ef06002 CR4: 00000000003606e0
[11329934117.361121] Call Trace:
[11329934117.361121]  <IRQ>
[11329934117.361121]  drm_fb_helper_cfb_imageblit+0xd/0x30 [drm_kms_helper]
[11329934117.361121]  bit_putcs+0x2bc/0x4b0
[11329934117.361121]  ? drm_fb_helper_cfb_copyarea+0x11/0x30 [drm_kms_helper]
[11329934117.361121]  ? soft_cursor+0x19f/0x220
[11329934117.361121]  ? bit_clear+0x110/0x110
[11329934117.361121]  fbcon_putcs+0xf5/0x130
[11329934117.361121]  fbcon_redraw.isra.20+0xdb/0x1c0
[11329934117.361121]  fbcon_scroll+0x342/0xc60
[11329934117.361121]  con_scroll+0xce/0xe0
[11329934117.361121]  lf+0x99/0xa0
[11329934117.361121]  vt_console_print+0x2a2/0x3f0
[11329934117.361121]  console_unlock+0x3c3/0x4b0
[11329934117.361121]  vprintk_emit+0x259/0x2c0

krizhanovsky commented 5 years ago

Just saw the RCU warning again after relatively long time of inactivity after TLS functional tests execution (see timestamps in the log):

[23643.858312] tsc: Marking TSC unstable due to clocksource watchdog
[23643.868078] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[23643.872516] sched_clock: Marking unstable (23643878824366, -10745512)<-(23643918108916, -50032260)
[23643.913878] clocksource: Switched to clocksource hpet
[41201.262146] INFO: rcu_sched self-detected stall on CPU
[41208.689243] INFO: rcu_sched detected stalls on CPUs/tasks:
[41208.689243]  1-...: (2 ticks this GP) idle=90e/140000000000001/0 softirq=282036/282036 fqs=0
[41208.689243]  (detected by 0, t=9357 jiffies, g=48207, c=48206, q=8)
[41208.689243] Sending NMI from CPU 0 to CPUs 1:
[41208.689243] NMI backtrace for cpu 1
[41208.689243] CPU: 1 PID: 29397 Comm: kworker/1:0 Tainted: G        W  O    4.14.32-kdump+ #142
[41208.689243] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[41208.689243] Workqueue: events e1000_watchdog [e1000]
[41208.689243] task: ffff89358f770140 task.stack: ffff8935ca1c8000
[41208.689243] RIP: 0010:io_serial_out+0x11/0x20
[41208.689243] RSP: 0018:ffff8935ffd03d18 EFLAGS: 00000006
[41208.689243] RAX: 0000000000000000 RBX: ffffffffa8af7000 RCX: 0000000000000000
[41208.689243] RDX: 00000000000003f9 RSI: 0000000000000001 RDI: ffffffffa8af7000
[41208.689243] RBP: 0000000000000001 R08: 0000000000040002 R09: 0000000000000005
[41208.689243] R10: 0000000000000008 R11: ffffffffa8a9a8ad R12: 0000000000000005
[41208.689243] R13: ffffffffa8a9a8a0 R14: 000000000000003a R15: 0000000000000046
[41208.689243] FS:  0000000000000000(0000) GS:ffff8935ffd00000(0000) knlGS:0000000000000000
[41208.689243] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41208.689243] CR2: 00007ffccbc06038 CR3: 000000000f00a001 CR4: 00000000003606e0
[41208.689243] Call Trace:
[41208.689243]  <IRQ>
[41208.689243]  serial8250_console_write+0x13e/0x280
[41208.689243]  ? print_prefix+0xba/0x1b0
[41208.689243]  ? msg_print_text+0x9c/0x100
[41208.689243]  console_unlock+0x3c3/0x4b0
[41208.689243]  vprintk_emit+0x259/0x2c0
[41208.689243]  printk+0x4d/0x69
[41208.689243]  rcu_check_callbacks+0x4c5/0x900
[41208.689243]  ? update_wall_time+0x45b/0x720
[41208.689243]  ? tick_sched_do_timer+0x40/0x40
[41208.689243]  update_process_times+0x23/0x50
[41208.689243]  tick_sched_handle+0x30/0x50
[41208.689243]  tick_sched_timer+0x2f/0x70
[41208.689243]  __hrtimer_run_queues+0xd3/0x220
[41208.689243]  hrtimer_interrupt+0xa1/0x1e0
[41208.689243]  smp_apic_timer_interrupt+0x61/0x120
[41208.689243]  apic_timer_interrupt+0x7a/0x80
[41208.689243]  </IRQ>
[41208.689243] RIP: 0010:_raw_spin_unlock_irqrestore+0x5/0x10
[41208.689243] RSP: 0018:ffff8935ca1cbe00 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10

krizhanovsky commented 5 years ago

The TSC warning is frequently seen in Tempesta workloads. The problem is in Switched to clocksource hpet - HPET has much higher overhead than TSC (e.g. see https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe/+bug/1821441). Also, since https://lore.kernel.org/patchwork/patch/782295/ :

Since the clocksource watchdog will only detect broken TSC after the fact, all TSC based clocks will likely have observed non-continuous values before/when switching away from TSC.

, it seems the RCU and dev_watchdog issues might be due to the TSC issue.

There are several reasons for possible RCU stale: connected serial console plus warnings (we have a lot of them in perfomance tests), RCU kernel threads starvation (rcu_sched kthread starved for 5251 jiffies! in second log) due to high volume networking (same as for TSC). All the 2 RCU warnings are about serial console on the VM. However, since the RCU warnings appear only 1000-18000sec after the workload, the problems isn't in softirqs overloaded by Tempesta, but actually by the system timer.

krizhanovsky commented 5 years ago

Related discussion https://bugzilla.redhat.com/show_bug.cgi?id=1352992 and patch not accepted in mainline https://lore.kernel.org/patchwork/patch/595803/. The timer problem is easy to reproduce on native Debian kernel in VM: just suspend and wakeup the host system.

Kernel parameter tsc=unstable fixes the problem, moving to HPET clocksource:

root@debian:~# cat /sys/devices/system/clocksource/clocksource0/available_clocksource 
hpet acpi_pm 
root@debian:~# cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
hpet

However, as was mentioned above, HPET introduced significant overhead:

     6.80%  [kernel]          [k] read_hpet
     2.55%  [kernel]          [k] syscall_return_via_sysret
     2.50%  libc-2.24.so      [.] _int_malloc
     2.38%  wrk               [.] http_parser_execute

With -no-hpet command line argument for Qemu, tsc falls to acpi_pm which doesn't impose so high overhead.

tempesta-tech / tempesta

TLS: transmit queue time out #1278