Open vankoven opened 5 years ago
Is it possible to get a x300 (from 30 RPS to 10 KRPS) performance improvement just by moving to a multiprocess scheme? Probably, if we really need to combine deproxy flexibility with wrk performance, it's better to move to TfwBomber and develop its server side or, even better, just make https://github.com/tempesta-tech/tempesta/issues/471 .
The second concern is about the particular multiprocess model. We probably won't get much more performance from just 1 client and 1 server process in deproxy. wrk uses many threads to load a server, so we should do the same: spawn many client and, correspondingly, server processes. Since massive process management can be expensive, deproxy in this model must have a setting to specify how many processes to spawn in each particular case.
All in all, I'm not against the multiprocess deproxy, but it's doubtful that we can reach good performance with small effort, and if significant effort is required, then why not develop Tempesta native features (TfwBomber + server mode) instead?
> Is it possible to get x300 (from 30RPS to 10KRPS) performance improvement just moving to multiprocess scheme?

Of course not.
> if we really need to combine deproxy flexibility with wrk performance, it's better to move to TfwBomber and develop its server side
I'm convinced that we need a tool that can generate severe traffic load and at the same time provide deep data consistency checks. For some functions it's crucial that preemption is disabled, AVX registers are saved and restored, and no memory access errors happen when multiple messages are split or combined, during in-place crypto operations, zero-copy modifications, etc. The only way to be 100% sure that Tempesta works correctly and doesn't garble the data passing through it is to test it under load. Wrk-based tests are great, but they claim only two things about message integrity: HTTP messages are correctly framed (but not necessarily that all the headers are framed correctly!), and the response code is 200. Too generic checks, in my view. And we have no integrity checks on the server side: nginx may receive broken messages and close the connection, and such situations won't be treated as errors.
I think I've misled you with the issue name, Multy-threaded Deproxy. I expect to see a tool very similar to wrk with its lua scripting capabilities for both client and server sides, but with all the disadvantages solved. I just couldn't find a better name for the issue.
> Probably we won't get much more performance just from 1 client and 1 server processes in deproxy
We won't. We have a `concurrent_connections` option in the tests configuration file. We should add a `concurrent_clients` option there. For development we will use really low values: one client, one connection. In this case the test behaviour will be like in the current functional tests: a single request-response sequence per test. On the CI the behaviour will be different: a lot of client processes will be spawned, each with multiple concurrent connections, each connection running its own request-response sequence, so there will be a lot of clients making the same requests.
With that approach in mind, all the tests will use the same tooling, and each test will describe a list of requests and responses with a couple of lines of code for each message.
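To make the two modes concrete, here is a hypothetical sketch of how a test could set the two options; the option names follow the proposal above, while the dict layout and the helper are assumptions for illustration:

```python
# Developer run: one client, one connection -> a single request-response
# sequence, just like the current functional tests.
dev_config = {
    "concurrent_clients": 1,       # client processes to spawn (proposed option)
    "concurrent_connections": 1,   # connections per client process
}

# CI run: many client processes with many concurrent connections each,
# every connection replaying the same request-response sequence.
ci_config = {
    "concurrent_clients": 16,
    "concurrent_connections": 64,
}

def total_connections(cfg):
    """Total concurrent connections the framework would open (assumed helper)."""
    return cfg["concurrent_clients"] * cfg["concurrent_connections"]
```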
> why not to develop Tempesta native features (TfwBomber + server mode) instead?
I'm not against it; moreover, I tried to say that the current tools have disadvantages and don't make me 100% convinced that Tempesta has no issues when all the tests pass. Something more special is required. I still call it deproxy, since its job is to validate proxy server behaviour. Whether it would be better to patch wrk, finish TfwBomber, or do something else, I don't know.
We definitely must not throw out the current wrk client and Nginx backend for the sake of diversity in interoperability testing, see #111.
A separate process, started by the test framework. It has the following arguments:

- Script - filename with the request/response functions; its content is described later.
- ID - client id.
- Server address - address of Tempesta: IP and port.
- Interfaces - interfaces the client sockets will be bound to (using the `SO_BINDTODEVICE` socket option).
- Concurrent connections - number of concurrent connections for each defined interface.
- Debug file - filename to write debug information to.
- Repeats - number of repeats of the request/response sequence before the connection is closed.
- RPS - maximum RPS.
- Split - boolean value meaning that requests must be split into multiple skbs (see #115).
The client has two structures to hold the required information: `client_data` for global client data and `conn_data` for per-connection data.
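These two structures might be sketched like this; the field names that appear in the algorithm below (`request_id`, `request_queue`, `pipeline`, `req_timeout`, `repeats`) come from this issue, everything else is an assumption:

```python
from dataclasses import dataclass, field
from collections import deque

@dataclass
class ClientData:
    """Global per-process client state (client_data)."""
    client_id: int
    script: str              # path to the request/response script (assumed field)
    server_addr: tuple       # (ip, port) of Tempesta (assumed field)
    rps: int = 0             # maximum RPS; 0 = unlimited (assumed convention)

@dataclass
class ConnData:
    """Per-connection state (conn_data)."""
    request_id: int = 0      # reset to 0 on every new connection
    request_queue: deque = field(default_factory=deque)  # in-flight requests
    pipeline: bool = False   # pipeline requests before sending
    req_timeout: float = 5.0 # response timeout in seconds (assumed default)
    repeats: int = 1         # remaining request/response sequence repeats
```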
First, the client waits on a POSIX barrier until all the other mock servers are ready. Then for every connection it runs the following algorithm:

1. Open a new connection and bind it to the interface. Set `conn_data.request_id` to 0.
2. Call `gen_request_func()` from the script to generate the request.
3. Copy the generated request to the send buffer. Push the request to `conn_data.request_queue` and increment `conn_data.request_id`.
4. If not `conn_data.pipeline`, go to step 5, otherwise go to step 2.
5. Send the current buffer to Tempesta. Start the `conn_data.req_timeout` timer.
6. Receive as much as possible. If `conn_data.req_timeout` has expired, go to `err`.
7. Try to parse a response from the buffer. If a full response is received, go to step 8, else go to step 6.
8. For the complete response run `response_cb()` from the script. If an error happens, go to `err`. Remove the request from `conn_data.request_queue`.
9. If there are unreplied requests, go to step 6. Else stop the `conn_data.req_timeout` timer.
10. If there are unsent requests, go to step 2.
11. Close the connection. If `conn_data.repeats` is not exhausted, go to step 1.
12. Exit.

err: build an error message, close the connection, and exit.
See #116 for `gen_request_func()` and `response_cb()`.
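The per-connection steps above can be sketched in a socket-free form; `transport`, `gen_request_func` and `response_cb` are injected so the control flow can be exercised without a real Tempesta instance, and every name not mentioned in the steps is an assumption:

```python
from collections import deque

class ConnError(Exception):
    """Raised to jump to the `err` label: build message, close, exit."""

def run_connection(gen_request_func, response_cb, transport,
                   n_requests, pipeline=False):
    request_queue = deque()     # conn_data.request_queue
    request_id = 0              # conn_data.request_id, step 1
    send_buf = b""

    while request_id < n_requests:
        # Steps 2-3: generate a request, buffer it, remember it.
        req = gen_request_func(request_id)
        send_buf += req
        request_queue.append(req)
        request_id += 1
        # Step 4: keep generating while pipelining.
        if pipeline and request_id < n_requests:
            continue

        # Step 5: flush the buffer to Tempesta.
        transport.send(send_buf)
        send_buf = b""

        # Steps 6-9: read and validate until every sent request is replied
        # (response parsing and the req_timeout timer are elided here).
        while request_queue:
            resp = transport.recv()
            if resp is None:                 # step 6: timeout expired
                raise ConnError("req_timeout expired")
            if not response_cb(resp):        # step 8: script-side checks
                raise ConnError("response_cb failed")
            request_queue.popleft()
    # Steps 11-12 (close, repeats, exit) are handled by the caller.
    return request_id
```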
The mock server follows the same concept as the mock client. Arguments:

- Script - filename with the request/response functions; its content is described later.
- ID - server id.
- Interface:conns - interfaces the server sockets will listen on and the number of Tempesta's connections for each.
- Split - boolean value meaning that requests must be split into multiple skbs (#115).
- Debug file - filename to write debug information to.
The server is stopped by a `TERM` signal from the framework.

The script contains a queue of expected requests, `server_data.requests[]`. If all the requests have already been received but a new request arrives, the client is working in repeated mode, so `conn_data.expected_request_num` must be reset to 0.
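The server-side matching with the repeated-mode reset could look like this minimal sketch; the function name and the dict-based `conn_data` are assumptions, the counter and queue names come from the text:

```python
def match_request(server_data_requests, conn_data, received):
    """Check `received` against the next expected request, wrapping the
    expected-request counter when the client runs in repeated mode."""
    if conn_data["expected_request_num"] >= len(server_data_requests):
        # All expected requests were already seen, yet a new one arrived:
        # the client is in repeated mode, reset to the head of the queue.
        conn_data["expected_request_num"] = 0
    expected = server_data_requests[conn_data["expected_request_num"]]
    conn_data["expected_request_num"] += 1
    return received == expected
```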
The Deproxy is fully written in Python and has poor performance. As far as I remember, 20-30 RPS was its maximum (while the wrk+nginx chain can produce 10-12 KRPS). It's not a load generator, but it validates received messages very strictly. The problem is that we have to develop two types of tests: deproxy tests with a single message chain and workload tests using wrk. These two kinds of tests are completely different. Issues we have faced with Deproxy during the past year:
Issues in wrk tests:
As for me, we should rework the deproxy to make it possible to serve at least 5-10 KRPS, to test all the possible multi-threaded effects in Tempesta. The tool MUST do full HTTP message validation as the current deproxy does. It's not very hard to make it quite fast: wrk already has a full (but simple) HTTP parser inside, and so does nginx, while the validation doesn't require an advanced HTTP parser; pattern matching has worked in the deproxy for a long time. Each deproxy client and server must be a separate process that gets a script as input (like wrk), writes all the sent and received messages to its log file, makes some assertions, and returns an error code (0 = test passed, non-zero = failed). The test framework must analyse only those return codes.
That will allow building both functional and workload tests from one piece of code; the only difference is the concurrent connections parameter.
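On the framework side this reduces to spawning the mock processes and inspecting nothing but their exit codes; a minimal sketch, assuming the command lines are supplied by the test (the helper name is hypothetical):

```python
import subprocess

def run_mock_processes(cmds):
    """Spawn every mock client/server process and report whether the whole
    test passed: exit code 0 means passed, non-zero means failed."""
    procs = [subprocess.Popen(cmd) for cmd in cmds]
    # wait() on every process even if one already failed, to avoid zombies.
    codes = [p.wait() for p in procs]
    return all(code == 0 for code in codes)
```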
TfwBomber was a tool intended to replace wrk and give more abilities for fuzzing, but it doesn't care about the server side. It reuses a lot of Tempesta code. It could be useful.