tempesta-tech / tempesta

All-in-one solution for high performance web content delivery and advanced protection against DDoS and web attacks
https://tempesta-tech.com/
GNU General Public License v2.0

Live lock on work queue #763

Closed: krizhanovsky closed this issue 7 years ago

krizhanovsky commented 7 years ago

Currently there is a known issue with a possible live lock on the work queue: https://github.com/tempesta-tech/tempesta/blob/master/tempesta_fw/sock.c#L1195 and https://github.com/tempesta-tech/tempesta/blob/ak-692/tempesta_fw/work_queue.c#L122 . The issue can lead to silent system hangs, or these messages may appear in dmesg:

    [tempesta] Warning: Socket work queue overrun: [1]
krizhanovsky commented 7 years ago

The change https://github.com/tempesta-tech/linux-4.8.15-tfw/commit/608c0a2fb8e23bc66e6a31ab52813b798c50153c#diff-241905eac9e9902dfa7b0d04fa62afdfL1686 was wrong and must be reverted, since it leads to the following oops:

    [ 6626.873104] BUG: spinlock recursion on CPU#0, ksoftirqd/0/3
    [ 6626.879120]  lock: 0xffff8e9f0a2880c8, .magic: dead4ead, .owner: ksoftirqd/0/3, .owner_cpu: 0
    [ 6626.882196] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G           O    4.9.35 #9
    [ 6626.885557] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
    [ 6626.886176]  ffffab734032f360 ffffffffbb8e6565 ffff8e9f1a826788 727f4b241f277d49
    [ 6626.886176]  ffff8e9f1a826240 ffff8e9f0a2880c8 ffffab734032f390 ffffffffbb526783
    [ 6626.886176]  ffff8e9f0a2880c8 0000000000000005 ffff8e9f0a2880c8 ffff8e9ef5977020
    [ 6626.886176] Call Trace:
    [ 6626.886176]  [<ffffffffbb8e6565>] dump_stack+0x60/0x9b
    [ 6626.886176]  [<ffffffffbb526783>] spin_dump+0x93/0x140
    [ 6626.886176]  [<ffffffffbb5269db>] do_raw_spin_lock+0x12b/0x190
    [ 6626.886176]  [<ffffffffbbd8a059>] _raw_spin_lock+0x9/0x10
    [ 6626.886176]  [<ffffffffc081a48a>] __ss_close+0xaa/0x160 [tempesta_fw]
    [ 6626.886176]  [<ffffffffc07e0f9e>] tfw_http_resp_fwd+0x20e/0x570 [tempesta_fw]
    [ 6626.886176]  [<ffffffffc07e13b9>] tfw_http_send_resp.constprop.18+0xb9/0x1a0 [tempesta_fw]
    [ 6626.886176]  [<ffffffffc07e1566>] tfw_http_send_500.isra.15+0xc6/0xe0 [tempesta_fw]
    [ 6626.886176]  [<ffffffffc07e1a60>] tfw_http_req_zap_error+0x140/0x1f0 [tempesta_fw]
    [ 6626.886176]  [<ffffffffc07e4018>] tfw_http_req_cache_cb+0x208/0x270 [tempesta_fw]
    [ 6626.886176]  [<ffffffffc07e3e10>] ? tfw_http_conn_repair+0xbd0/0xbd0 [tempesta_fw]
    [ 6626.886176]  [<ffffffffc07d7e83>] tfw_cache_process+0x143/0x480 [tempesta_fw]
    [ 6626.886176]  [<ffffffffbb566757>] ? __getnstimeofday64+0x67/0x190
    [ 6626.886176]  [<ffffffffc07e2ba8>] tfw_http_msg_process+0x438/0xad0 [tempesta_fw]
    [ 6626.886176]  [<ffffffffc07dd1df>] __gfsm_fsm_exec+0xdf/0x190 [tempesta_fw]
    [ 6626.886176]  [<ffffffffbbc577d6>] ? tcp_rtt_estimator+0x1d6/0x300
    [ 6626.886176]  [<ffffffffbb559739>] ? __internal_add_timer+0x49/0x130
    [ 6626.886176]  [<ffffffffc07dd874>] tfw_gfsm_dispatch+0x24/0x50 [tempesta_fw]
    [ 6626.886176]  [<ffffffffbb55a846>] ? mod_timer+0x1f6/0x6d0
    [ 6626.886176]  [<ffffffffc07dbf28>] tfw_connection_recv+0x18/0x20 [tempesta_fw]
    [ 6626.886176]  [<ffffffffc0818e42>] ss_tcp_process_data+0x242/0x740 [tempesta_fw]
    [ 6626.886176]  [<ffffffffbbc6b77b>] ? tcp_send_delayed_ack+0xcb/0x150
    [ 6626.886176]  [<ffffffffc0819435>] ss_tcp_data_ready+0x65/0x100 [tempesta_fw]
    [ 6626.886176]  [<ffffffffbbc5fc6f>] tcp_rcv_established+0x42f/0x9d0
    [ 6626.886176]  [<ffffffffbbbea44b>] ? sk_filter_trim_cap+0x3b/0x360
    [ 6626.886176]  [<ffffffffbbc72b75>] tcp_v4_do_rcv+0x125/0x350
    [ 6626.886176]  [<ffffffffbbc74f3c>] tcp_v4_rcv+0x9cc/0xe10
    [ 6626.886176]  [<ffffffffc07c0300>] ? tdb_rec_get+0x40/0xa0 [tempesta_db]
    [ 6626.886176]  [<ffffffffbbc2eefb>] ip_local_deliver_finish+0xbb/0x3a0
    [ 6626.886176]  [<ffffffffbbc2f538>] ip_local_deliver+0x88/0x160
    [ 6626.886176]  [<ffffffffbbc74497>] ? tcp_v4_early_demux+0x1b7/0x290
    [ 6626.886176]  [<ffffffffbbc1fa7b>] ? nf_iterate+0x6b/0x110
    [ 6626.886176]  [<ffffffffbbc2e79a>] ip_rcv_finish+0x1ea/0x890
    [ 6626.886176]  [<ffffffffbbc1fb94>] ? nf_hook_slow+0x74/0xe0
    [ 6626.886176]  [<ffffffffbbc2f8f5>] ip_rcv+0x2e5/0x540
    [ 6626.886176]  [<ffffffffbbc2e5b0>] ? inet_del_offload+0x40/0x40
    [ 6626.886176]  [<ffffffffbbc2f610>] ? ip_local_deliver+0x160/0x160
    [ 6626.886176]  [<ffffffffbbbc1dda>] __netif_receive_skb_core+0x89a/0x11f0
    [ 6626.886176]  [<ffffffffbbd8a100>] ? _raw_spin_trylock_bh+0x30/0x40
    [ 6626.886176]  [<ffffffffbbbc274f>] __netif_receive_skb+0x1f/0xe0
    [ 6626.886176]  [<ffffffffbbbc28c6>] process_backlog+0xb6/0x2e0
    [ 6626.886176]  [<ffffffffbbd8a149>] ? _raw_spin_unlock_irq+0x9/0x10
    [ 6626.886176]  [<ffffffffbbbc36ac>] net_rx_action+0x2cc/0x5f0
    [ 6626.886176]  [<ffffffffbb4b3af9>] __do_softirq+0x139/0x420
    [ 6626.886176]  [<ffffffffbb4b3df7>] run_ksoftirqd+0x17/0x30
    [ 6626.886176]  [<ffffffffbb4e47fa>] smpboot_thread_fn+0x16a/0x310
    [ 6626.886176]  [<ffffffffbb4e4690>] ? sort_range+0x20/0x20
    [ 6626.886176]  [<ffffffffbb4deb5f>] kthread+0x12f/0x190
    [ 6626.886176]  [<ffffffffbb4dea30>] ? kthread_create_on_node+0x60/0x60
    [ 6626.886176]  [<ffffffffbbd8a602>] ret_from_fork+0x22/0x30

The problem can be fixed by introducing two backlogs to the work queue. The first one is a simple linked list for CPU-local work: if we have to do synchronous work on the current CPU, we can just place the work in a non-synchronized list accessed by that CPU only. When the work queue tasklet runs, it first processes this backlog of local works. The second backlog, processed after the ring buffer, is a slow-path queue open for insertion by all CPUs: if a CPU has to pass a work to another CPU but there is no space in the ring buffer, it can simply place the work in this linked list. Since our policy is to avoid synchronous operations, the slow path shouldn't hurt performance.