Open vankoven opened 5 years ago
Is it possible to get a x300 (from 30 RPS to 10 KRPS) performance improvement just by moving to a multiprocess scheme? Probably, if we really need to combine deproxy flexibility with wrk performance, it's better to move to TfwBomber and develop its server side or, even better, just make https://github.com/tempesta-tech/tempesta/issues/471 .
The second concern is about the particular multiprocess model. We probably won't get much more performance from just 1 client and 1 server process in deproxy. wrk uses many threads to load a server, so we should do the same: spawn many client and, correspondingly, server processes. Since massive process management can be expensive, deproxy in this model must have a setting to specify how many processes to spawn in each particular case.
All in all, I'm not against the multiprocess deproxy, but it's doubtful that we can reach good performance with small effort, and if significant effort is required, then why not develop Tempesta native features (TfwBomber + server mode) instead?
> Is it possible to get x300 (from 30RPS to 10KRPS) performance improvement just moving to multiprocess scheme?

Of course not.
> if we really need to combine deproxy flexibility with wrk performance, it's better to move to TfwBomber and develop its server side
I'm convinced that we need a tool that can generate severe traffic load and at the same time provide deep data consistency checks. For some functions it's crucial that preemption is disabled, AVX registers are saved and restored, and no memory access errors happen when multiple messages are split or combined, during in-place crypto operations, zero-copy modifications, etc. The only way to be 100% sure that Tempesta works correctly and doesn't garble the data passing through it is to test it under load. Wrk-based tests are great, but they claim only two things about message integrity: HTTP messages are correctly framed (but not necessarily that all the headers are framed correctly!), and the response code is 200. Too generic checks, in my view. And we have no integrity checks on the server side: nginx may receive broken messages and close the connection, and such situations won't be treated as errors.
I think I've misled you with the issue name, Multy-threaded Deproxy. I expect to see a tool very similar to wrk with its lua scripting capabilities for both client and server sides, but with all the disadvantages solved. I just couldn't find a better name for the issue.
> Probably we won't get much more performance just from 1 client and 1 server processes in deproxy
We won't. We have a `concurrent_connections` option in the tests configuration file. We should add a `concurrent_clients` option there. For development we will use really low values: one client, one connection. In this case the test behaviour will be like in the current functional tests: a single request-response sequence per test. On the CI the behaviour will be different: a lot of client processes will be spawned, each with multiple concurrent connections, each connection running its own request-response sequence, so there will be a lot of clients making the same requests.
With that approach in mind, all the tests will use the same tooling, and each test will describe a list of requests and responses with a couple of lines of code for each message.
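To make the two modes concrete, here is a hypothetical sketch of how a test could set the two options; the option names follow the proposal above, while the dict layout and the helper are assumptions for illustration:

```python
# Developer run: one client, one connection -> a single request-response
# sequence, just like the current functional tests.
dev_config = {
    "concurrent_clients": 1,       # client processes to spawn (proposed option)
    "concurrent_connections": 1,   # connections per client process
}

# CI run: many client processes with many concurrent connections each,
# every connection replaying the same request-response sequence.
ci_config = {
    "concurrent_clients": 16,
    "concurrent_connections": 64,
}

def total_connections(cfg):
    """Total concurrent connections the framework would open (assumed helper)."""
    return cfg["concurrent_clients"] * cfg["concurrent_connections"]
```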
> why not to develop Tempesta native features (TfwBomber + server mode) instead?
I'm not against it; moreover, I tried to say that the current tools have disadvantages and don't make me 100% convinced that Tempesta has no issues when all the tests pass. Something more special is required. I still call it deproxy, since its job is to validate proxy server behaviour. Whether it would be better to patch wrk, finish TfwBomber, or do something else, I don't know.
We definitely must not throw out the current wrk client and Nginx backend for the sake of diversity in interoperability testing, see #111.
A separate process, started by the test framework. It has the following arguments:

- Script - filename with the request/response functions; its content is described later.
- ID - client id.
- Server address - address of Tempesta: IP and port.
- Interfaces - interfaces the client sockets will be bound to (using the `SO_BINDTODEVICE` socket option).
- Concurrent connections - number of concurrent connections for each defined interface.
- Debug file - filename to write debug information to.
- Repeats - number of repeats of the request/response sequence before the connection is closed.
- RPS - maximum RPS.
- Split - boolean value meaning that requests must be split into multiple skbs (see #115).
The client has two structures to hold the required information: `client_data` for global client data and `conn_data` for per-connection data.
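These two structures might be sketched like this; the field names that appear in the algorithm below (`request_id`, `request_queue`, `pipeline`, `req_timeout`, `repeats`) come from this issue, everything else is an assumption:

```python
from dataclasses import dataclass, field
from collections import deque

@dataclass
class ClientData:
    """Global per-process client state (client_data)."""
    client_id: int
    script: str              # path to the request/response script (assumed field)
    server_addr: tuple       # (ip, port) of Tempesta (assumed field)
    rps: int = 0             # maximum RPS; 0 = unlimited (assumed convention)

@dataclass
class ConnData:
    """Per-connection state (conn_data)."""
    request_id: int = 0      # reset to 0 on every new connection
    request_queue: deque = field(default_factory=deque)  # in-flight requests
    pipeline: bool = False   # pipeline requests before sending
    req_timeout: float = 5.0 # response timeout in seconds (assumed default)
    repeats: int = 1         # remaining request/response sequence repeats
```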
First, the client waits on a POSIX barrier until all the other mock servers are ready. Then for every connection it runs the following algorithm:

1. Open a new connection and bind it to the interface. Set `conn_data.request_id` to 0.
2. Call `gen_request_func()` from the script to generate the request.
3. Copy the generated request to the send buffer. Push the request to `conn_data.request_queue` and increment `conn_data.request_id`.
4. If not `conn_data.pipeline`, go to step 5, otherwise go to step 2.
5. Send the current buffer to Tempesta. Start the `conn_data.req_timeout` timer.
6. Receive as much as possible. If `conn_data.req_timeout` has expired, go to `err`.
7. Try to parse a response from the buffer. If a full response is received, go to step 8, else go to step 6.
8. For the complete response run `response_cb()` from the script. If an error happens, go to `err`. Remove the request from `conn_data.request_queue`.
9. If there are unreplied requests, go to step 6. Else stop the `conn_data.req_timeout` timer.
10. If there are unsent requests, go to step 2.
11. Close the connection. If `conn_data.repeats` is not exhausted, go to step 1.
12. Exit.

err: build an error message, close the connection, and exit.
See #116 for `gen_request_func()` and `response_cb()`.
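The per-connection steps above can be sketched in a socket-free form; `transport`, `gen_request_func` and `response_cb` are injected so the control flow can be exercised without a real Tempesta instance, and every name not mentioned in the steps is an assumption:

```python
from collections import deque

class ConnError(Exception):
    """Raised to jump to the `err` label: build message, close, exit."""

def run_connection(gen_request_func, response_cb, transport,
                   n_requests, pipeline=False):
    request_queue = deque()     # conn_data.request_queue
    request_id = 0              # conn_data.request_id, step 1
    send_buf = b""

    while request_id < n_requests:
        # Steps 2-3: generate a request, buffer it, remember it.
        req = gen_request_func(request_id)
        send_buf += req
        request_queue.append(req)
        request_id += 1
        # Step 4: keep generating while pipelining.
        if pipeline and request_id < n_requests:
            continue

        # Step 5: flush the buffer to Tempesta.
        transport.send(send_buf)
        send_buf = b""

        # Steps 6-9: read and validate until every sent request is replied
        # (response parsing and the req_timeout timer are elided here).
        while request_queue:
            resp = transport.recv()
            if resp is None:                 # step 6: timeout expired
                raise ConnError("req_timeout expired")
            if not response_cb(resp):        # step 8: script-side checks
                raise ConnError("response_cb failed")
            request_queue.popleft()
    # Steps 11-12 (close, repeats, exit) are handled by the caller.
    return request_id
```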
The mock server follows the same concept as the mock client. Arguments:

- Script - filename with the request/response functions; its content is described later.
- ID - server id.
- Interface:conns - interfaces the server sockets will listen on and the number of Tempesta's connections for each.
- Split - boolean value meaning that requests must be split into multiple skbs (#115).
- Debug file - filename to write debug information to.
The server is stopped by a `TERM` signal from the framework.

The script contains a queue of expected requests, `server_data.requests[]`. If all the requests have already been received but a new request arrives, the client is working in repeated mode, so `conn_data.expected_request_num` must be reset to 0.
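The server-side matching with the repeated-mode reset could look like this minimal sketch; the function name and the dict-based `conn_data` are assumptions, the counter and queue names come from the text:

```python
def match_request(server_data_requests, conn_data, received):
    """Check `received` against the next expected request, wrapping the
    expected-request counter when the client runs in repeated mode."""
    if conn_data["expected_request_num"] >= len(server_data_requests):
        # All expected requests were already seen, yet a new one arrived:
        # the client is in repeated mode, reset to the head of the queue.
        conn_data["expected_request_num"] = 0
    expected = server_data_requests[conn_data["expected_request_num"]]
    conn_data["expected_request_num"] += 1
    return received == expected
```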
The Deproxy is fully written in Python and has poor performance. As far as I remember, 20-30 RPS was its maximum (while the wrk+nginx chain can produce 10-12 KRPS). It's not a load generator, but it validates received messages very strictly. The problem is that we have to develop two types of tests: deproxy tests with a single message chain and workload tests using wrk. These two kinds of tests are completely different. Issues we have faced with Deproxy during the past year:
Issues in wrk tests:
As for me, we should rework the deproxy to make it possible to serve at least 5-10 KRPS, to test all the possible multi-threaded effects in Tempesta. The tool MUST do full HTTP message validation as the current deproxy does. It's not very hard to make it quite fast: wrk already has a full (but simple) HTTP parser inside, and so does nginx, while the validation doesn't require an advanced HTTP parser; pattern matching has worked in the deproxy for a long time. Each deproxy client and server must be a separate process that gets a script as input (like wrk), writes all the sent and received messages to its log file, makes some assertions, and returns an error code (0 = test passed, non-zero = failed). The test framework must analyse only those return codes.
That will allow building both functional and workload tests from one piece of code; the only difference is the concurrent connections parameter.
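On the framework side this reduces to spawning the mock processes and inspecting nothing but their exit codes; a minimal sketch, assuming the command lines are supplied by the test (the helper name is hypothetical):

```python
import subprocess

def run_mock_processes(cmds):
    """Spawn every mock client/server process and report whether the whole
    test passed: exit code 0 means passed, non-zero means failed."""
    procs = [subprocess.Popen(cmd) for cmd in cmds]
    # wait() on every process even if one already failed, to avoid zombies.
    codes = [p.wait() for p in procs]
    return all(code == 0 for code in codes)
```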
TfwBomber was a tool intended to replace wrk and give more abilities for fuzzing, but it doesn't care about the server side. It reuses a lot of Tempesta code. It could be useful.