Closed krizhanovsky closed 7 years ago
Got somewhat different crash in APM after running
# ulimit -n 32768; ./wrk -t 16 -c 4096 -d 3600s http://centos:80/
I used Linux patch https://github.com/tempesta-tech/linux-4.8.15-tfw/commit/9cea1ec0156145217ac6320c40189b40b8160780 and #692 -mitigating patch
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3191,6 +3191,12 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
if (!fully_acked)
break;
+ if (!skb->next) {
+ pr_err("AK_DBG peer=%x:%x state=%u sk_refcnt=%d flags=%lx\n",
+ sk->sk_daddr, sk->sk_dport, sk->sk_state,
+ atomic_read(&sk->sk_refcnt), sk->sk_flags);
+ continue;
+ }
tcp_unlink_write_queue(skb, sk);
sk_wmem_free_skb(sk, skb);
if (unlikely(skb == tp->retransmit_skb_hint))
Tempesta master commit b60cc3d884f9d838362c6ce8bb64503d06fa9b22) is used. The crash trace is
[ 4307.593595] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 44s! [swapper/1:0]
[ 4307.593595] Modules linked in: tfw_sched_ratio(O) tfw_sched_http(O) tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) tempesta_tls(O) nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_vs nf_conntrack dm_mirror dm_region_hash dm_log dm_mod ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw xfs libcrc32c ppdev gf128mul ablk_helper cryptd input_leds led_class pcspkr parport_pc parport i2c_piix4 i2c_core acpi_cpufreq uinput ip_tables btrfs xor raid6_pq ata_generic pata_acpi crc32c_intel serio_raw e1000 ata_piix floppy ipv6 crc_ccitt autofs4
[ 4307.593595] irq event stamp: 8232141
[ 4307.593595] hardirqs last enabled at (8232140): [<ffffffff81651d92>] restore_regs_and_iret+0x0/0x1d
[ 4307.593595] hardirqs last disabled at (8232141): [<ffffffff81652ce4>] apic_timer_interrupt+0x84/0x90
[ 4307.593595] softirqs last enabled at (8229960): [<ffffffff8106f631>] _local_bh_enable+0x21/0x50
[ 4307.593595] softirqs last disabled at (8229961): [<ffffffff81070828>] irq_exit+0x98/0xb0
[ 4307.593595] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G O 4.8.15-tfw #45
[ 4307.593595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
[ 4307.593595] task: ffff88005c8e1700 task.stack: ffff88005c8e8000
[ 4307.593595] RIP: 0010:[<ffffffffa068fb82>] [<ffffffffa068fb82>] tfw_apm_prcntl_tmfn+0x7a2/0x970 [tempesta_fw]
[ 4307.593595] RSP: 0018:ffff88007fd03dc0 EFLAGS: 00000216
[ 4307.593595] RAX: fc25fb3500000004 RBX: ffff88005946d600 RCX: 000000009b2945f3
[ 4307.593595] RDX: 000000000000ff52 RSI: 000000000000ff52 RDI: 00000000fc9dfb35
[ 4307.593595] RBP: ffff88007fd03e30 R08: 000000009b2945f4 R09: 00000000fc25fb35
[ 4307.593595] R10: 000000000000fb35 R11: 0000000000000001 R12: ffff88005a3cbd90
[ 4307.593595] R13: 0000000000000000 R14: 0000000000000015 R15: ffffe8ffffd04930
[ 4307.593595] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[ 4307.593595] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4307.593595] CR2: 00007f99b61de000 CR3: 00000000220d2000 CR4: 00000000003406e0
[ 4307.593595] Stack:
[ 4307.593595] ffff88007fd03e30 ffff88005946d000 000000015c8e1700 ffff88005946d180
[ 4307.593595] ffffffff810be849 ffff88005c8e1700 0000000000000246 0000000000000101
[ 4307.593595] ffff88007fd03e20 ffff88005946d110 ffff88005946d110 0000000000000101
[ 4307.593595] Call Trace:
[ 4307.593595] <IRQ>
[ 4307.593595] [<ffffffff810be849>] ? __lock_is_held+0x49/0x70
[ 4307.593595] [<ffffffffa068f3e0>] ? tfw_apm_stats+0x90/0x90 [tempesta_fw]
[ 4307.593595] [<ffffffff810e677e>] call_timer_fn+0x9e/0x310
[ 4307.593595] [<ffffffff810e66e5>] ? call_timer_fn+0x5/0x310
[ 4307.593595] [<ffffffffa068f3e0>] ? tfw_apm_stats+0x90/0x90 [tempesta_fw]
[ 4307.593595] [<ffffffff810e76b9>] run_timer_softirq+0x1c9/0x620
[ 4307.593595] [<ffffffff810f07a6>] ? ktime_get+0x96/0x120
[ 4307.593595] [<ffffffff810f7454>] ? clockevents_program_event+0x44/0x110
[ 4307.593595] [<ffffffff81653d68>] __do_softirq+0xc8/0x473
[ 4307.593595] [<ffffffff81070828>] irq_exit+0x98/0xb0
[ 4307.593595] [<ffffffff81653a22>] smp_apic_timer_interrupt+0x42/0x50
[ 4307.593595] [<ffffffff81652ce9>] apic_timer_interrupt+0x89/0x90
[ 4307.593595] <EOI>
[ 4307.593595] [<ffffffff81028085>] ? default_idle+0x25/0x180
[ 4307.593595] [<ffffffff81028083>] ? default_idle+0x23/0x180
[ 4307.593595] [<ffffffff810289ef>] arch_cpu_idle+0xf/0x20
[ 4307.593595] [<ffffffff810b5563>] default_idle_call+0x33/0x50
[ 4307.593595] [<ffffffff810b57ba>] cpu_startup_entry+0x23a/0x420
[ 4307.593595] [<ffffffff8104311b>] start_secondary+0x10b/0x130
[ 4307.593595] Code: 05 da cd e0 48 8b 45 98 f0 ff 80 00 01 00 00 4c 89 e7 e8 f2 16 fc e0 e9 e1 fa ff ff 48 89 c7 8d 48 01 48 c1 ef 20 49 89 f9 eb 03 <44> 89 c1 bf 0f 00 00 00 44 8d 41 01 d3 e7 44 01 cf 44 0f b7 d7
[ 4307.593595] Kernel panic - not syncing: softlockup: hung tasks
[ 4307.593595] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G O L 4.8.15-tfw #45
[ 4307.593595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
[ 4307.593595] 0000000000000000 ffff88007fd03b78 ffffffff8135ecbe ffff88005c8e1700
[ 4307.593595] ffffffff819dcb0e ffff88007fd03bf0 ffffffff8117783e 0000000000000008
[ 4307.593595] ffff88007fd03c00 ffff88007fd03ba0 0000000000000001 0000000000000000
[ 4307.593595] Call Trace:
[ 4307.593595] <IRQ> [<ffffffff8135ecbe>] dump_stack+0x67/0x99
[ 4307.593595] [<ffffffff8117783e>] panic+0xda/0x220
[ 4307.593595] [<ffffffff81131100>] watchdog_timer_fn+0x1d0/0x1d0
[ 4307.593595] [<ffffffff81130f30>] ? watchdog+0x40/0x40
[ 4307.593595] [<ffffffff810e9ba0>] __hrtimer_run_queues+0x100/0x500
[ 4307.593595] [<ffffffff810ea38b>] hrtimer_interrupt+0xab/0x1b0
[ 4307.593595] [<ffffffff81044ae5>] local_apic_timer_interrupt+0x35/0x60
[ 4307.593595] [<ffffffff81653a1d>] smp_apic_timer_interrupt+0x3d/0x50
[ 4307.593595] [<ffffffff81652ce9>] apic_timer_interrupt+0x89/0x90
[ 4307.593595] [<ffffffffa068fb82>] ? tfw_apm_prcntl_tmfn+0x7a2/0x970 [tempesta_fw]
[ 4307.593595] [<ffffffff810be849>] ? __lock_is_held+0x49/0x70
[ 4307.593595] [<ffffffffa068f3e0>] ? tfw_apm_stats+0x90/0x90 [tempesta_fw]
[ 4307.593595] [<ffffffff810e677e>] call_timer_fn+0x9e/0x310
[ 4307.593595] [<ffffffff810e66e5>] ? call_timer_fn+0x5/0x310
[ 4307.593595] [<ffffffffa068f3e0>] ? tfw_apm_stats+0x90/0x90 [tempesta_fw]
[ 4307.593595] [<ffffffff810e76b9>] run_timer_softirq+0x1c9/0x620
[ 4307.593595] [<ffffffff810f07a6>] ? ktime_get+0x96/0x120
[ 4307.593595] [<ffffffff810f7454>] ? clockevents_program_event+0x44/0x110
[ 4307.593595] [<ffffffff81653d68>] __do_softirq+0xc8/0x473
[ 4307.593595] [<ffffffff81070828>] irq_exit+0x98/0xb0
[ 4307.593595] [<ffffffff81653a22>] smp_apic_timer_interrupt+0x42/0x50
[ 4307.593595] [<ffffffff81652ce9>] apic_timer_interrupt+0x89/0x90
[ 4307.593595] <EOI> [<ffffffff81028085>] ? default_idle+0x25/0x180
[ 4307.593595] [<ffffffff81028083>] ? default_idle+0x23/0x180
[ 4307.593595] [<ffffffff810289ef>] arch_cpu_idle+0xf/0x20
[ 4307.593595] [<ffffffff810b5563>] default_idle_call+0x33/0x50
[ 4307.593595] [<ffffffff810b57ba>] cpu_startup_entry+0x23a/0x420
[ 4307.593595] [<ffffffff8104311b>] start_secondary+0x10b/0x130
The oops corresponds to following code in APM:
# nm tempesta_fw.ko|grep tfw_apm_prcntl_tmfn
00000000000013e0 t tfw_apm_prcntl_tmfn
# addr2line -e tempesta_fw.ko -i `perl -e 'printf("%x", 0x13e0+0x7a2)'`
/root/tempesta/tempesta_fw/apm.c:180
/root/tempesta/tempesta_fw/apm.c:353
/root/tempesta/tempesta_fw/apm.c:924
APM has been reworked in #753. Bugs that caused the crashes were eliminated.
From https://github.com/tempesta-tech/tempesta/issues/692#issuecomment-308987475 . The problem seems happen at https://github.com/tempesta-tech/tempesta/blob/master/tempesta_fw/apm.c#L223 , i.e.
cnt
isn't incremented in the loop above.I got the oops at the below using workload
./wrk -t 16 -c 4096 -d 3600s http://centos:80/
in VM which also runs Apache HTTPD with connection reset each 100 requests.