tempesta-tech / tempesta-test

Test suite for Tempesta FW
11 stars 5 forks source link

Base client hidden exceptions processing #688

Open KonsKo opened 1 month ago

KonsKo commented 1 month ago

Current implementation of the framework.client.Client has a hidden error(perhaps this is architecture feature). But it is not easy to debug with such realization.

Let's look at a situation: we have a test that work with a client, for my case it is ratecheck (t_frang.test_connection_rate_burst.FrangTcpVsBoth.test_burst).

Now, I already knew, that the client did not exist on my machine (was not installed).

But when I run the test with low details first times I received something like:

test_burst (t_frang.test_connection_rate_burst.FrangTcpVsBoth.test_burst) ... ERROR

======================================================================
ERROR: test_burst (t_frang.test_connection_rate_burst.FrangTcpVsBoth.test_burst)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/Tests_on_bare_metal/tempesta-test/t_frang/test_connection_rate_burst.py", line 262, in test_burst
    tls_reset_conn_n, tls_total_conn_n = parse_out(client)
                                         ^^^^^^^^^^^^^^^^^
  File "/home/jenkins/workspace/Tests_on_bare_metal/tempesta-test/t_frang/test_connection_rate_burst.py", line 26, in parse_out
    reset_conn_n = int(rx.findall(client.stdout.decode())[0])
                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

----------------------------------------------------------------------
Ran 1 test in 1.239s

FAILED (errors=1) 

The real problem is hidden.

I can set more detailed logs and I may see an error, but the error displayed in the middle of log output and I have to search for it, and it is just a message (it is colored now, but we added logs processing recently).

Real problem here: https://github.com/tempesta-tech/tempesta-test/blob/fc5e2e38a7d4be292f15b1b3c3b5fcd015dd7023/framework/client.py#L8

def _run_client(client, resq: multiprocessing.Queue):
    try:
        res = remote.client.run_cmd(client.cmd, timeout=(client.duration + 5))
    except remote.CmdError as e:
        res = (e.stdout, e.stderr)
        client.returncode = e.returncode
    resq.put(res)
    tf_cfg.dbg(3, "\tClient exit")

run_cmd raises an exception if an error was caught, but here it become hidden.

We need to explicit (re)raise an exception in such cases to provide an user correct information.

I have not updated the code right away, because the client works with multiprocessing and we need to think about the best way how to implement it, otherwise side effects may occur.

KonsKo commented 1 month ago

One more similar case:

https://github.com/tempesta-tech/tempesta-test/blob/f53c85a3339cf2d3b129054e1fd5e31d26ec9d41/helpers/stateful.py#L63

If an exception is caught, it is being hidden (all information is provided, but the situation is not considered as an error)

Example (there is an exception but test is considered as successfull):

test_load_distribution (t_sched.test_h2_ratio_weight.RatioH2.test_load_distribution) ... 24-09-27 21:15:35.321801 CRITICAL  | Exception in stopping         tf_cfg.py:245
                                  process: ("Cmd                                
                                  '/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta/scripts/te              
                                  mpesta.sh --stop' got timed out               
                                  (timeout value is 30)",                       
                                  b'Stopping                                    
                                  Tempesta...\nnet.tempesta.state               
                                  = stop\n...unload Tempesta                    
                                  modules\nUn-loading Tempesta                  
                                  kernel modules...\ndone\n',                   
                                  b'Unit irqbalance.service could               
                                  not be found.\n', -9), type:                  
                                  <class                                        
                                  'helpers.remote.CmdError'>,                   
                                  traceback:                                    
                                  Traceback (most recent call                   
                                  last):                                        
                                    File                                        
                                  "/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta-test/helpe              
                                  rs/remote.py", line 106, in                   
                                  run_cmd                                       
                                      stdout, stderr =                          
                                  p.communicate(timeout=timeout)                
                                                       ^^^^^^^^^^^              
                                  ^^^^^^^^^^^^^^^^^^^                           
                                    File                                        
                                  "/usr/lib/python3.11/subprocess.              
                                  py", line 1209, in communicate                
                                      stdout, stderr =                          
                                  self._communicate(input,                      
                                  endtime, timeout)                             
                                                       ^^^^^^^^^^^              
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^               
                                    File                                        
                                  "/usr/lib/python3.11/subprocess.              
                                  py", line 2116, in _communicate               
                                      self._check_timeout(endtime,              
                                  orig_timeout, stdout, stderr)                 
                                    File                                        
                                  "/usr/lib/python3.11/subprocess.              
                                  py", line 1253, in                            
                                  _check_timeout                                
                                      raise TimeoutExpired(                     
                                  subprocess.TimeoutExpired:                    
                                  Command                                       
                                  '/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta/scripts/te              
                                  mpesta.sh --stop' timed out                   
                                  after 30 seconds                              

                                  During handling of the above                  
                                  exception, another exception                  
                                  occurred:                                     

                                  Traceback (most recent call                   
                                  last):                                        
                                    File                                        
                                  "/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta-test/helpe              
                                  rs/stateful.py", line 68, in                  
                                  force_stop                                    
                                      stop_proc()                               
                                    File                                        
                                  "/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta-test/helpe              
                                  rs/control.py", line 355, in                  
                                  stop_tempesta                                 
                                      self.node.run_cmd(cmd,                    
                                  timeout=30,                                   
                                  err_msg=(self.err_msg % "stop"))              
                                    File                                        
                                  "/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta-test/helpe              
                                  rs/remote.py", line 114, in                   
                                  run_cmd                                       
                                      raise CmdError(                           
                                  helpers.remote.CmdError: ("Cmd                
                                  '/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta/scripts/te              
                                  mpesta.sh --stop' got timed out               
                                  (timeout value is 30)",                       
                                  b'Stopping                                    
                                  Tempesta...\nnet.tempesta.state               
                                  = stop\n...unload Tempesta                    
                                  modules\nUn-loading Tempesta                  
                                  kernel modules...\ndone\n',                   
                                  b'Unit irqbalance.service could               
                                  not be found.\n', -9)                         

ok.         **<------ Success**
KonsKo commented 1 month ago

Fails on a teardown or a cleanup should be considered as fail.

One more example:

test_wait_for_connections_negative (selftests.test_lxc_server.TestLxcServer.test_wait_for_connections_negative) ... 24-09-30 23:20:32.577133 CRITICAL  | Exception in stopping         tf_cfg.py:245
                                  process: ("Cmd 'lxc stop                      
                                  tempesta-site-stage' got timed                
                                  out (timeout value is 10)", b'',              
                                  b'', -9), type: <class                        
                                  'helpers.remote.CmdError'>,                   
                                  traceback:                                    
                                  Traceback (most recent call                   
                                  last):                                        
                                    File                                        
                                  "/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta-test/helpe              
                                  rs/remote.py", line 106, in                   
                                  run_cmd                                       
                                      stdout, stderr =                          
                                  p.communicate(timeout=timeout)                
                                                       ^^^^^^^^^^^              
                                  ^^^^^^^^^^^^^^^^^^^                           
                                    File                                        
                                  "/usr/lib/python3.11/subprocess.              
                                  py", line 1209, in communicate                
                                      stdout, stderr =                          
                                  self._communicate(input,                      
                                  endtime, timeout)                             
                                                       ^^^^^^^^^^^              
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^               
                                    File                                        
                                  "/usr/lib/python3.11/subprocess.              
                                  py", line 2116, in _communicate               
                                      self._check_timeout(endtime,              
                                  orig_timeout, stdout, stderr)                 
                                    File                                        
                                  "/usr/lib/python3.11/subprocess.              
                                  py", line 1253, in                            
                                  _check_timeout                                
                                      raise TimeoutExpired(                     
                                  subprocess.TimeoutExpired:                    
                                  Command 'lxc stop                             
                                  tempesta-site-stage' timed out                
                                  after 10 seconds                              

                                  During handling of the above                  
                                  exception, another exception                  
                                  occurred:                                     

                                  Traceback (most recent call                   
                                  last):                                        
                                    File                                        
                                  "/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta-test/helpe              
                                  rs/stateful.py", line 68, in                  
                                  force_stop                                    
                                      stop_proc()                               
                                    File                                        
                                  "/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta-test/frame              
                                  work/lxc_server.py", line 153,                
                                  in stop_server                                
                                      self.node.run_cmd(self._cons              
                                  truct_cmd(["stop",                            
                                  self.container_name]))                        
                                    File                                        
                                  "/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta-test/helpe              
                                  rs/remote.py", line 114, in                   
                                  run_cmd                                       
                                      raise CmdError(                           
                                  helpers.remote.CmdError: ("Cmd                
                                  'lxc stop tempesta-site-stage'                
                                  got timed out (timeout value is               
                                  10)", b'', b'', -9)                           

ok         **<------ Success**