TL;DR: patch wrk to generate constant load, or replace it with another load generator. Limiting the CPU usage of wrk will probably also make the tests much more stable.
A lot of problems occurred while stabilising the so-called 'stress' tests. Actually, naming these tests 'stress' was a big mistake, since they are not supposed to test behaviour under stress conditions, i.e. under overload. The tests are intended to validate Tempesta's behaviour under intensive but not critical load. Real stress tests are also required, but they are an absolutely different type of test.
We have faced the issue twice. The first time was while stabilising all the 'stress' tests: due to known issues in Tempesta, a lot of 502 (tempesta-tech/tempesta#940) and 500 (tempesta-tech/tempesta#1003) responses were generated. The request rate was too high, so the error rate was too, and all the asserts failed.
The second time we faced it was during the work on #66, but this time the overload exposed a different kind of problem. The tests from #66 evaluate the fairness of load balancing. If the environment is too concurrent and the test bed does not have enough resources to handle all the load generators, Tempesta and the backend servers, the results get skewed. E.g. some tests add a programmable delay before every response on some backends to simulate slow servers, but under overload the 'slow' servers get more CPU time than the 'fast' ones, their latency becomes lower, and the scheduler is very likely to calculate a better weight for such servers. The tests then fail because the load is distributed in an invalid way: the slower servers get more requests than the faster ones.
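To make this failure mode concrete, here is a simplified, self-contained Lua illustration (my own sketch, not Tempesta's actual weight formula; the latency numbers are hypothetical) of a dynamic scheduler that derives weights from observed latency:

-- Hypothetical latencies on an overloaded test bed: the artificially
-- delayed 'slow' backend is under-loaded and answers faster than the
-- CPU-starved 'fast' one.
local latency_ms = { slow = 40, fast = 60 }

-- Simplified dynamic weighting: weight is inversely proportional to
-- latency (an illustration only, not Tempesta's exact formula).
local weights, sum = {}, 0
for name, lat in pairs(latency_ms) do
    weights[name] = 1 / lat
    sum = sum + weights[name]
end
for name, w in pairs(weights) do
    print(string.format("%s backend gets ~%.0f%% of requests", name, 100 * w / sum))
end

With these numbers the 'slow' backend receives about 60% of the requests, which is exactly the inversion the tests observed.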
I've tried to slow down the traffic generator and force it to produce fewer requests with the following patch:
index 3e19dfc..324bb5c 100644
--- a/wrk/results.lua
+++ b/wrk/results.lua
@@ -1,4 +1,8 @@
+function delay()
+    return 500
+end
+
 local threads = {}
 local tid = 1
This delay() function in the wrk script sets the number of milliseconds to wait before sending the next request. With a constant delay of D ms, each of the C open connections sends at most 1000/D requests per second, so the total rate is capped near C * 1000 / D. The value of 500 reduced the request rate from 6-7 krps to 3-4 krps, and the overall system load dropped from 95-99% to ~50%. Such 'rate limitation' fixed all the instability issues for me. I suppose that setting a constant request rate for all functional 'stress' tests will fix their instability problems.
Why patch the delay() function? Wrk is the most suitable tool for our tests thanks to its Lua scripting: pipelining, header modifications (sticky cookies), response code statistics and other things are heavily used in our tests, and not all load generators offer the same level of flexibility. But wrk can't maintain a constant request rate. Wrk2 has a constant-rate option, but it constantly crashes with 'division by 0' errors on our Lua scripts. This delay() patch is just a dirty hack to understand what is happening and how to fix it properly. A more sophisticated solution is required, since the appropriate delay() value correlates with the number of concurrent connections.
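A minimal sketch of what such a solution could look like, assuming the connection count and the target total rate are passed as script arguments after '--' on the wrk command line (the invocation and the default values below are hypothetical, not an agreed interface):

-- paced.lua -- a minimal sketch, not the final fix.
-- Assumed invocation: wrk -t4 -c1000 -s paced.lua http://host/ -- 1000 3000
local delay_ms = 0

-- init() is called once per wrk thread with the extra command line args
function init(args)
    local connections = tonumber(args[1]) or 1000 -- must match wrk's -c value
    local target_rate = tonumber(args[2]) or 3000 -- desired total rps
    -- each connection sends target_rate / connections rps, so it has to
    -- pause connections * 1000 / target_rate milliseconds between requests
    delay_ms = connections * 1000 / target_rate
end

-- wrk calls delay() before each request on every connection
function delay()
    return delay_ms
end

With -c 1000 and a target of 3000 rps this yields a ~333 ms per-connection delay, which is consistent with the 3-4 krps band observed with the hard-coded 500 ms value.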
What should be done to make the workload tests strict and stable:
- Use a rather big number of concurrent connections, e.g. 1000;
- Slow down the traffic generator to keep the load of each server used in the functional tests below 60-80%. This can be done by patching wrk, running it on a relatively slow server, or restricting its maximum CPU usage (a way to detect that the generator fell behind anyway is sketched right after this list);
- Enforce a longer test duration if the test requires it, or skip the test (the latter was implemented for some tests). E.g. the ratio predict scheduler uses information about past server performance to configure the load balancer, so it needs enough running time to collect that history.
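To detect the cases where these measures are not enough, the achieved rate can be checked in wrk's done() hook. A sketch, reusing the hypothetical target_rate from the pacing example above (done() runs in a separate Lua VM, so the value is repeated at file scope):

-- A sketch, not part of the current test suite: report the achieved
-- request rate and warn when it falls behind the assumed target, so an
-- overloaded test bed is detected instead of silently skewing results.
local target_rate = 3000 -- assumed total rps, as in the pacing sketch

-- done() runs once after the load stops; summary.duration is in microseconds
function done(summary, latency, requests)
    local rate = summary.requests / (summary.duration / 1e6)
    io.write(string.format("achieved rate: %.0f rps (target %d)\n",
                           rate, target_rate))
    if rate < target_rate * 0.8 then
        io.write("WARNING: generator fell behind the target; " ..
                 "the test bed is likely overloaded\n")
    end
end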