tempesta-tech / tempesta

All-in-one solution for high performance web content delivery and advanced protection against DDoS and web attacks
https://tempesta-tech.com/
GNU General Public License v2.0

BUG in sock.c #2178

Closed EvgeniiMekhanik closed 1 month ago

EvgeniiMekhanik commented 1 month ago

[11483.898309] [tempesta fw] Warning: frang: http_resp_code_block limit exceeded for 127.0.0.2: 2 (lim=1)
[11483.900427] [tempesta fw] Warning: frang: http_resp_code_block limit exceeded for 127.0.0.2: 3 (lim=1)
[11483.900431] [tempesta fw] Warning: response blocked: filtered out: 127.0.0.2
[11483.901084] [tempesta fw] Warning: response blocked: filtered out: 127.0.0.2
[11483.902606] [tempesta tls] Warning: Cannot send TLS alert 0:1, -9
[11483.903024] [tempesta fw] Warning: Close TCP socket w/o sending alert to the peer: 127.0.0.2
[11483.903667] ------------[ cut here ]------------
[11483.903994] kernel BUG at /home/evgeny/workdir/tempesta/fw/sock.c:517!
[11483.904472] invalid opcode: 0000 [#1] SMP NOPTI
[11483.904843] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE 5.10.35+ #298
[11483.905717] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[11483.906479] RIP: 0010:ss_send+0x2c4/0x320 [tempesta_fw]

I caught this BUG while testing https://github.com/tempesta-tech/tempesta/pull/2139, but it does not seem to be a problem specific to that branch. The test is http_resp_code_block_2 from https://github.com/tempesta-tech/tempesta-test/pull/598/commits/1f7290ebab8fcfc7701583bad7c90706bb9345ee (reproduced only once).

EvgeniiMekhanik commented 1 month ago

I reproduced this BUG on current master. Just run this test in a loop.

EvgeniiMekhanik commented 1 month ago

It seems we have two problems here:

  1. When tfw_http_resp_process hits an error while processing a response, it calls tfw_http_req_block, which closes the client connection. A FIN from the client can also arrive while tfw_http_req_block is running, and then the connection is destroyed completely. If another response for the same connection is being processed on another CPU at that moment, we can hit bugs in many different places.
  2. After calling ss_conn_drop_guard_exit we call tfw_classify_conn_close immediately, even if the connection still holds references. If frang functions are called for this connection on another CPU, we hit a bug.
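
Both problems come down to teardown running while another CPU may still be using the connection. As a rough userspace illustration only (not Tempesta FW code and not necessarily the fix that was applied), the sketch below defers the classifier-style cleanup until the last reference is dropped. Every name in it (struct conn, conn_get, conn_put, conn_close, conn_classify_close) is hypothetical, and C11 atomics stand in for the kernel's reference counting primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a connection; not the real Tempesta FW type. */
struct conn {
	atomic_int refcnt;	/* live users, including the owner */
	atomic_bool closing;	/* set once a close has been requested */
	int classifier_closed;	/* how many times the teardown hook ran */
};

static void conn_init(struct conn *c)
{
	atomic_init(&c->refcnt, 1);	/* the owner's reference */
	atomic_init(&c->closing, false);
	c->classifier_closed = 0;
}

/* Take a reference; fails once the count has already dropped to zero. */
static bool conn_get(struct conn *c)
{
	int old = atomic_load(&c->refcnt);

	while (old > 0) {
		/* On failure 'old' is reloaded and the check repeats. */
		if (atomic_compare_exchange_weak(&c->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Hypothetical per-connection teardown (classifier state and so on). */
static void conn_classify_close(struct conn *c)
{
	c->classifier_closed++;
}

/* Drop a reference; only the last holder runs the teardown. */
static void conn_put(struct conn *c)
{
	int old = atomic_fetch_sub(&c->refcnt, 1);

	assert(old > 0);
	if (old == 1)
		conn_classify_close(c);
}

/* Request a close: drop the owner's reference exactly once. */
static void conn_close(struct conn *c)
{
	if (!atomic_exchange(&c->closing, true))
		conn_put(c);
}
```

In this sketch a second close request (for example an error path racing with a peer FIN) is a no-op, and a user that took its reference before the close keeps the classifier state valid until it calls conn_put. A caller that arrives after the count has reached zero gets false from conn_get and has to bail out instead of touching the connection.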