tempesta-tech / tempesta

All-in-one solution for high performance web content delivery and advanced protection against DDoS and web attacks
https://tempesta-tech.com/
GNU General Public License v2.0
618 stars 103 forks source link

BUG on test_failovering_with_unlim_ka_requests run #2220

Closed ttaym closed 1 month ago

ttaym commented 2 months ago

eelnrn# Testing

When run tempesta-test test suit, i have encountered error on t_regress.test_stress_failovering.TestStressFailoveringWithUnlimKAReq.test_failovering_with_unlim_ka_requests test.

======================================================================
ERROR: test_failovering_with_unlim_ka_requests (t_regress.test_stress_failovering.TestStressFailoveringWithUnlimKAReq)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/tempesta-test/framework/tester.py", line 410, in cleanup_check_dmesg
    raise Exception(f"{err} happened during test on Tempesta")
Exception: BUG happened during test on Tempesta

======================================================================

On rerun bug reproducing from time to time.

Dmesg in text and with debug traces attached:

[13432.351299] watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [swapper/9:0]
[13432.351644] [tempesta fw] Warning: Close TCP socket w/o sending alert to the peer: 127.0.0.1
[13432.352154] Modules linked in: tempesta_fw(OE) tempesta_db(OE) tempesta_tls(OE) tempesta_lib(OE) xt_mark tcp_diag inet_diag veth tls sha256_ssse3 nft_masq nvme_tcp ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_raw iptable_mangle iptable_nat iptable_filter vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock xt_addrtype nft_compat br_netfilter ipmi_devintf ipmi_msghandler nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc nvme_fabrics nf_tables nvme_core nfnetlink overlay isofs binfmt_misc nls_iso8859_1 kvm_amd ccp kvm input_leds joydev serio_raw mac_hid qemu_fw_cfg sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua vfio_pci vfio_virqfd vfio_iommu_type1 vfio irqbypass msr efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 crct10dif_pclmul crc32_pclmul
[13438.725814]  ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper psmouse i2c_piix4 pata_acpi floppy [last unloaded: tempesta_lib]
[13438.732686] CPU: 9 PID: 0 Comm: swapper/9 Kdump: loaded Tainted: G           OEL    5.10.35+ #1
[13438.733334] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
[13438.734179] RIP: 0010:native_safe_halt+0xe/0x10
[13438.734547] Code: 39 ff ff ff 4c 89 ee 48 c7 c7 e0 c7 65 ae e8 89 c6 8a ff e9 01 ff ff ff cc cc cc cc e9 07 00 00 00 0f 00 2d 36 c5 5f 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 26 c5 5f 00 f4 c3 cc cc 0f 1f 44 00
[13438.735878] RSP: 0018:ffffa5a8c02acc50 EFLAGS: 00000202
[13438.736288] RAX: 0000000000000003 RBX: 0000000000000246 RCX: 0000000000000008
[13438.736828] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff97c6e2968088
[13438.737366] RBP: ffffa5a8c02acc60 R08: ffff97c92eaacf00 R09: 0000000000000001
[13438.737904] R10: 0000000000000008 R11: 0000000000000001 R12: ffff97c6e2968088
[13438.738449] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000100
[13438.738987] FS:  0000000000000000(0000) GS:ffff97c92ea80000(0000) knlGS:0000000000000000
[13438.739595] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13438.740039] CR2: 000072ea36b36000 CR3: 00000001357b0002 CR4: 0000000000770ee0
[13438.740581] PKRU: 55555554
[13438.740822] Call Trace:
[13438.741046]  <IRQ>
[13438.741244]  ? kvm_wait+0x53/0x60
[13438.741566]  __pv_queued_spin_lock_slowpath+0x2b3/0x320
[13438.741985]  _raw_spin_lock+0x1e/0x30
[13438.742296]  tcp_v4_rcv+0xc48/0xe20
[13438.742595]  ip_protocol_deliver_rcu+0x44/0x230
[13438.742961]  ip_local_deliver_finish+0x48/0x60
[13438.743320]  ip_local_deliver+0xf8/0x110
[13438.743648]  ? ip_protocol_deliver_rcu+0x230/0x230
[13438.744031]  ip_rcv_finish+0x87/0xa0
[13438.744332]  ip_rcv+0xce/0xe0
[13438.744595]  ? ip_rcv_finish_core.constprop.0+0x470/0x470
[13438.745020]  __netif_receive_skb_one_core+0x86/0xa0
[13438.745410]  __netif_receive_skb+0x18/0x60
[13438.745753]  process_backlog+0x9e/0x170
[13438.746078]  net_rx_action+0x13b/0x430
[13438.746393]  ? lapic_next_deadline+0x2c/0x40
[13438.746743]  ? clockevents_program_event+0xa9/0x120
[13438.747132]  __do_softirq+0xe3/0x340
[13438.747437]  asm_call_irq_on_stack+0x12/0x20
[13438.747785]  </IRQ>
[13438.747987]  do_softirq_own_stack+0x3d/0x50
[13438.748330]  __irq_exit_rcu+0xd6/0x120
[13438.748646]  irq_exit_rcu+0xe/0x10
[13438.748936]  sysvec_call_function_single+0x3d/0x90
[13438.749319]  asm_sysvec_call_function_single+0x12/0x20
[13438.749727] RIP: 0010:native_safe_halt+0xe/0x10
[13438.750099] Code: 39 ff ff ff 4c 89 ee 48 c7 c7 e0 c7 65 ae e8 89 c6 8a ff e9 01 ff ff ff cc cc cc cc e9 07 00 00 00 0f 00 2d 36 c5 5f 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 26 c5 5f 00 f4 c3 cc cc 0f 1f 44 00
[13438.751603] RSP: 0018:ffffa5a8c00cfe88 EFLAGS: 00000202
[13438.752130] RAX: ffffffffad40e510 RBX: 0000000000000009 RCX: ffff97c92eaacf80
[13438.752742] RDX: 000000000290cc36 RSI: ffffa5a8c00cfe50 RDI: 00000c327e0a499c
[13438.753334] RBP: ffffa5a8c00cfe90 R08: 0000000000000001 R09: 00000c3282ecc285
[13438.753999] R10: 0000000000000020 R11: 000000000002c200 R12: ffff97c620348000
[13438.754567] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[13438.755134]  ? __sched_text_end+0x4/0x4
[13438.755458]  ? default_idle+0xe/0x20
[13438.755759]  arch_cpu_idle+0x15/0x20
[13438.756059]  default_idle_call+0x3d/0xc0
[13438.756383]  do_idle+0x21a/0x2a0
[13438.756660]  cpu_startup_entry+0x20/0x30
[13438.756983]  start_secondary+0x145/0x1b0
[13438.757307]  secondary_startup_64_no_verify+0xc2/0xcb
[13438.790394] [tdb] Close table 'client0.tdb'
[13438.795624] [tdb] Close table 'sessions0.tdb'
[13438.814252] [tdb] Close table 'filter0.tdb'
[13438.814629] [tempesta fw] modules are stopped
[13438.826245] [tempesta fw] exiting...
[13438.855964] [tdb] Shutdown Tempesta DB
[13439.481467] End test:   t_regress.test_stress_failovering.TestStressFailoveringWithUnlimKAReq.test_failovering_with_unlim_ka_requests

dmesg.txt

Kernel config:

config-5.10.35+.txt

krizhanovsky commented 2 months ago

@ttaym prepared a VM, which reproduces the problem, so start from the VM