tempesta-tech / tempesta

All-in-one solution for high performance web content delivery and advanced protection against DDoS and web attacks
https://tempesta-tech.com/
GNU General Public License v2.0

Do not schedule requests to server if socket is in CLOSE_WAIT state #2111

Open EvgeniiMekhanik opened 5 months ago

EvgeniiMekhanik commented 5 months ago

Sockets in the CLOSE_WAIT state have already received a FIN from the remote peer and are waiting for the local application to close() them so that the connection can be torn down completely. They will wait forever; there is no timeout value.

Problem: we have two servers, one upstream and one reserved. Imagine that the upstream server shuts down. In this case the socket on our side can remain in the CLOSE_WAIT state for a long period. When a new request comes in, we should send it to the reserved server, but currently we still schedule it to the upstream server that is shutting down.
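
For illustration, a minimal Python sketch (hypothetical localhost address and port) of how a socket gets stuck in CLOSE_WAIT: the peer closes the connection (sends a FIN), but the local side never calls close(), so the socket stays in CLOSE_WAIT with no timeout and can be observed with `ss -tan`.

```python
import socket
import time

# Minimal sketch (not Tempesta code): the "backend" accepts a connection and
# closes it immediately (sends FIN); the "client" side never calls close(),
# so its socket stays in CLOSE_WAIT indefinitely -- no TCP timer clears this
# state, only an application-level close() does.
backend = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
backend.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
backend.bind(("127.0.0.1", 8000))   # hypothetical port
backend.listen(1)

client = socket.create_connection(("127.0.0.1", 8000))
conn, _ = backend.accept()
conn.close()                        # backend shuts down: FIN is sent to the client

time.sleep(1)                       # let the FIN arrive
# `ss -tan | grep 8000` now shows the client socket in CLOSE-WAIT,
# and it stays there until the process calls client.close() or exits.
```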

krizhanovsky commented 3 months ago

Some tests for #2076 are flaky due to this issue.

EvgeniiMekhanik commented 3 months ago

Please look at the tempesta-test change "remove after 2111".

enuribekov-tempesta commented 2 months ago

> Please look at the tempesta-test change "remove after 2111".

@EvgeniiMekhanik @krizhanovsky

Looks like it doesn't work anymore.

```diff
--- a/sessions/test_learn.py
+++ b/sessions/test_learn.py
@@ -211 +211 @@ class LearnSessions(LearnSessionsBase):
-        time.sleep(1)
+        #time.sleep(1)
```

test_backend_fail (sessions.test_learn.LearnSessions) ... ok

```diff
--- a/sessions/test_sessions.py
+++ b/sessions/test_sessions.py
@@ -253 +253 @@ class StickySessionsPersistense(StickySessions):
-        time.sleep(1)
+        #time.sleep(1)
```

test_sessions (sessions.test_sessions.StickySessionsPersistense) ... ok

```diff
@@ -402 +402 @@ class StickySessionsFailover(StickySessions):
-        time.sleep(1)
+        #time.sleep(1)
```

test_sessions (sessions.test_sessions.StickySessionsFailover) ... ok

```diff
@@ -437 +437 @@ class StickySessionsFailover(StickySessions):
-        time.sleep(1)
+        #time.sleep(1)
```

test_sessions_reserved_server (sessions.test_sessions.StickySessionsFailover) ... ok

```diff
--- a/t_http_rules/test_backup_servers.py
+++ b/t_http_rules/test_backup_servers.py
@@ -91 +91 @@ http_chain {
-        time.sleep(1)
+        #time.sleep(1)
```

test_scheduler (t_http_rules.test_backup_servers.HttpRulesBackupServers) ... ok

```diff
--- a/t_reconf/test_reconf_base.py
+++ b/t_reconf/test_reconf_base.py
@@ -1371 +1371 @@ http_chain {{
-        time.sleep(1)
+        #time.sleep(1)
```

test_reconf_backup_group (t_reconf.test_reconf_base.TestProxyPassReconf) ... ok
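
If some delay is still needed in a test, a condition-polling helper is usually more robust than a fixed time.sleep(1). A minimal sketch (the helper and the predicate in the usage comment are hypothetical, not part of the tempesta-test framework):

```python
import time

def wait_until(predicate, timeout=5.0, poll_interval=0.1):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False on timeout. A generic
    replacement for fixed time.sleep() calls in tests.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll_interval)
    return False

# Hypothetical usage instead of time.sleep(1):
# assert wait_until(lambda: backend_connections_established(server)), "backend not ready"
```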

enuribekov-tempesta commented 2 months ago

Found test issue:

```
FAIL: test_scheduler (t_http_rules.test_backup_servers.HttpRulesBackupServers)
...
    assert tempesta.stats.cl_msg_forwarded == cl_msg_forwarded, msg.format(
AssertionError: Tempesta has errors in processing HTTP requests. Received: 4. Expected: 3.
```

Meanwhile, Wireshark shows exactly 3 request/response pairs.

So at least one existing test fails for a reason different from the socket issue.

enuribekov-tempesta commented 2 months ago

Investigation shows that the wrong number of requests is the result of an incorrect Tempesta internal statistics calculation.

```
cat /proc/tempesta/perfstat
...
Client messages received        : 3
Client messages forwarded       : 4
...
```
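
The failing assertion in the test above compares its own request count against these counters. A minimal sketch of reading the two values from /proc/tempesta/perfstat (the parsing is an assumption based only on the output shown above, not the actual tempesta-test code):

```python
# Sketch: read counters from /proc/tempesta/perfstat; the "<name> : <value>"
# line format is assumed from the output shown above.
def read_perfstat(path="/proc/tempesta/perfstat"):
    stats = {}
    with open(path) as f:
        for line in f:
            if ":" not in line:
                continue
            name, value = line.rsplit(":", 1)
            value = value.strip()
            if value.isdigit():
                stats[name.strip()] = int(value)
    return stats

stats = read_perfstat()
received = stats["Client messages received"]
forwarded = stats["Client messages forwarded"]
# With the bug described above this prints "received=3 forwarded=4",
# although only 3 request/response pairs are visible on the wire.
print(f"received={received} forwarded={forwarded}")
```
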
enuribekov-tempesta commented 1 month ago

The second issue is connected with an expired timeout.

```
FAIL: test_scheduler (t_http_rules.test_backup_servers.HttpRulesBackupServers)
Traceback (most recent call last):
  File "/home/user/tempesta-test/t_http_rules/test_backup_servers.py", line 102, in test_scheduler
    client.send_request(self.request, "200")
  File "/home/user/tempesta-test/framework/deproxy_client.py", line 210, in send_request
    self.wait_for_response(timeout=timeout, strict=bool(expected_status_code))
  File "/home/user/tempesta-test/framework/deproxy_client.py", line 257, in wait_for_response
    timeout_not_exceeded != False
AssertionError: Timeout exceeded while waiting response: 5
```
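
For context, the five-second limit in the message above comes from the client's response wait. A rough sketch of that pattern (simplified and illustrative, not the actual framework/deproxy_client.py code):

```python
import time

# Simplified sketch of a deadline-based response wait (illustrative, not the
# real deproxy_client implementation): poll until a response arrives or the
# timeout expires, and fail loudly in strict mode.
def wait_for_response(has_response, timeout=5.0, strict=True):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if has_response():
            return True
        time.sleep(0.01)
    if strict:
        raise AssertionError(f"Timeout exceeded while waiting response: {timeout}")
    return False
```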

On the wire it looks as follows (2 backends, 1 connection per backend, 2 connection retries):

Handshakes

```
TCP 37634 → 8001 [SYN]
TCP 8001 → 37634 [SYN, ACK]
TCP 37634 → 8001 [ACK]
TCP 38656 → 8000 [SYN]
TCP 8000 → 38656 [SYN, ACK]
TCP 38656 → 8000 [ACK]
```

Request/response

```
HTTP GET / HTTP/1.1
TCP 8000 → 38656 [ACK]
HTTP HTTP/1.1 200 OK
TCP 38656 → 8000 [ACK]
```

Primary backend shutdown

```
TCP 8000 → 38656 [FIN, ACK]
TCP 38656 → 8000 [FIN, ACK]
TCP 8000 → 38656 [ACK]
```

Endless attempts to reconnect to the primary backend.

```
TCP 38658 → 8000 [SYN]
TCP 8000 → 38658 [RST, ACK]
TCP 38660 → 8000 [SYN]
TCP 8000 → 38660 [RST, ACK]
TCP 38662 → 8000 [SYN]
TCP 8000 → 38662 [RST, ACK]
TCP 38664 → 8000 [SYN]
TCP 8000 → 38664 [RST, ACK]
TCP 38666 → 8000 [SYN]
TCP 8000 → 38666 [RST, ACK]
TCP 38668 → 8000 [SYN]
TCP 8000 → 38668 [RST, ACK]
TCP 38670 → 8000 [SYN]
TCP 8000 → 38670 [RST, ACK]
TCP 38672 → 8000 [SYN]
TCP 8000 → 38672 [RST, ACK]
```

Timeout expired, test finished, secondary backend shutdown.

```
TCP 8001 → 37634 [FIN, ACK]
TCP 37634 → 8001 [FIN, ACK]
TCP 8001 → 37634 [ACK]
```

Last attempts to reconnect while Tempesta is up

```
TCP 38674 → 8000 [SYN]
TCP 8000 → 38674 [RST, ACK]
TCP 37656 → 8001 [SYN]
TCP 8001 → 37656 [RST, ACK]
```
enuribekov-tempesta commented 1 month ago

server_connect_retries works incorrectly: it counts retries higher than the configured limit.
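
For comparison, a minimal sketch of the expected behaviour (illustrative Python, not the Tempesta C implementation; the helper names in the usage comment are hypothetical): stop after the configured number of retries and mark the server dead so the scheduler can fall back to the reserved group.

```python
import time

# Illustrative sketch of bounded reconnect attempts: stop after `max_retries`
# failures instead of counting past the limit, then declare the server dead
# so the scheduler can fall back to the reserved/backup group.
def reconnect_with_limit(try_connect, max_retries=2, retry_interval=0.5):
    for attempt in range(1, max_retries + 1):
        if try_connect():
            return True                      # connection re-established
        time.sleep(retry_interval)
    return False                             # server considered dead after max_retries

# Hypothetical usage:
# if not reconnect_with_limit(lambda: connect_backend("127.0.0.1", 8000)):
#     schedule_to_backup_group()
```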