Make sure that during the verification tests the user space servers run on vanilla kernel (our patch still implies changes even with undefined Tempesta).

Test configuration

Benchmark results on CPU i9-12900HK (0-11 are performance cores) for the current master d7294b86a563bd18d5af51f701821357ade089de (the same as release-0.7 as of Jul 13) vs Nginx 1.18.0 build with OpenSSL 3.0.2. Tempesta FW and Nginx are running in a VM with 2 vCPUs bound to performance cores 0 and 2. All the benchmark tools were running from the host machine. The VM uses multiqueue virtio-net.

Tempesta config:

listen 192.168.100.4:443 proto=https;
listen 192.168.100.4:80;

frang_limits {
    http_methods GET HEAD;
    http_uri_len 1000;
}

block_action attack reply;
block_action error reply;

srv_group default {
    server 192.168.100.4:8000;
}

vhost tempesta-tech.com {
    tls_certificate /root/tempesta/etc/tfw-root.crt;
        tls_certificate_key /root/tempesta/etc/tfw-root.key;
    tls_tickets secret="f00)9eR59*_/22" lifetime=7200;

    resp_hdr_set Strict-Transport-Security "max-age=31536000; includeSubDomains";

    proxy_pass default;
}

cache 1;
cache_fulfill * *;

http_chain redirection_chain {
    uri == "/blog"      -> 301 = /blog/1;
    uri == "/blog/"     -> 301 = /blog/1;
    uri == "/services"  -> 301 = /development-services;
    uri == "/services.html" -> 301 = /development-services;
    uri == "/c++-services"  -> 301 = /development-services;
    uri == "/index.html"    -> 301 = /index;
    uri == "/company.html"  -> 301 = /company;
    uri == "/blog/fast-programming-languages-c-c++-rust-assembly" -> 301 = /blog/fast-programming-languages-c-cpp-rust-assembly;

    -> tempesta-tech.com;
}

http_chain {
    host == "tempesta-tech.com" -> redirection_chain;
}

Nginx config servicing /var/www/tempesta-tech.com with 26KB index file or /var/www/html with a 10 bytes index:

user www-data;
worker_processes auto;
pid /run/nginx.pid;

events {
    worker_connections 65535;
    multi_accept on;
    use epoll;
}
worker_rlimit_nofile 65535;

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 100000;
    types_hash_max_size 2048;

    include /etc/nginx/mime.types;
    default_type text/html;

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ssl_certificate /root/tempesta/etc/tfw-root.crt;
    ssl_certificate_key /root/tempesta/etc/tfw-root.key;

    ssl_ciphers 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256';
    ssl_prefer_server_ciphers on;

    server {
        listen 192.168.100.4:8000;
        listen 192.168.100.4:8443 default_server ssl http2;

        #root /var/www/tempesta-tech.com;
        root /var/www/html;

        index index;

        server_name _;

    add_header X-Crash-1377 ' ';

        location = /blog { # FIXME: this URI still redirects to http://tempesta-tech.com:8000/blog/
            return 301 https://tempesta-tech.com/blog/1;
        }
        location = /blog/ {
            return 301 https://tempesta-tech.com/blog/1;
        }
        location ~ /services(|.html)$ {
            return 301 https://tempesta-tech.com/development-services;
        }
        location ~ /c\+\+-services$ {
            return 301 https://tempesta-tech.com/development-services;
        }
        location = /index.html {
            return 301 https://tempesta-tech.com/index;
        }
        location ~ /blog/fast-programming-languages-c-c\+\+-rust-assembly$ {
            return 301 https://tempesta-tech.com/blog/fast-programming-languages-c-cpp-rust-assembly;
        }

        error_page 403 404 /oops;
    }
}

TLS regression

We saw the problem earlier on migration from the Linux kerne 4.14 to 5.10 https://github.com/tempesta-tech/tempesta/issues/1504#issuecomment-946040195 . Also see the tail latency problem in https://github.com/tempesta-tech/tempesta/issues/1434

$ taskset --cpu-list 0,2,4,6  ./tls-perf -q --sni tempesta-tech.com --tickets on -l 1000 -t 4 -T 10 192.168.100.4 443
( All peers are active, start to gather statistics )
========================================
 TOTAL:           SECONDS 10; HANDSHAKES 209609
 HANDSHAKES/sec:  MAX 22239; AVG 20959; 95P 12057; MIN 12057
 LATENCY (ms):    MIN 5; AVG 14; 95P 7; MAX 856

$ taskset --cpu-list 0,2,4,6  ./tls-perf -q --sni tempesta-tech.com --tickets on -l 1000 -t 4 -T 10 192.168.100.4 8443
( All peers are active, start to gather statistics )
========================================
 TOTAL:           SECONDS 10; HANDSHAKES 148105
 HANDSHAKES/sec:  MAX 15718; AVG 14808; 95P 10052; MIN 10052
 LATENCY (ms):    MIN 16; AVG 64; 95P 111; MAX 1724

$ taskset --cpu-list 0,2,4,6  ./tls-perf -q --sni tempesta-tech.com -l 1000 -t 4 -T 10 192.168.100.4 443
( All peers are active, start to gather statistics )
========================================
 TOTAL:           SECONDS 10; HANDSHAKES 61622
 HANDSHAKES/sec:  MAX 7071; AVG 6160; 95P 4372; MIN 4372
 LATENCY (ms):    MIN 258; AVG 406; 95P 534; MAX 564

$ taskset --cpu-list 0,2,4,6  ./tls-perf -q --sni tempesta-tech.com -l 1000 -t 4 -T 10 192.168.100.4 8443
( All peers are active, start to gather statistics )
========================================
 TOTAL:           SECONDS 9; HANDSHAKES 61392
 HANDSHAKES/sec:  MAX 7184; AVG 6496; 95P 5937; MIN 5937
 LATENCY (ms):    MIN 63; AVG 488; 95P 584; MAX 763

The reference could the FOSDEM'21 demo (results were pretty stable and we ran it on different machines):

now abbreviated handshakes are less than 2 times faster than OpenSSL while previously we had more than 2 times.
now we have slower full handshakes while previously we had almost 2 times faster. The difference is quite small and can be just a testing deviation (I just made one run), but we still should not see even equal numbers.

OpenSSL 3 also integrated Bernstein elliptic curve multiplication, so it's expected that it's faster than OpenSSL 1, which we tested previously. But there is still no reason why Tempesta TLS can be slower.

10 bytes cached response, no keep-alive

$ taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com
Running 30s test @ https://tempesta-tech.com
  4 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    15.38ms   37.70ms   1.57s    97.00%
    Req/Sec     2.99k   409.46     5.75k    71.40%
  354473 requests in 30.10s, 128.12MB read
Requests/sec:  11776.08
Transfer/sec:      4.26MB

$ taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com:8443
Running 30s test @ https://tempesta-tech.com:8443
  4 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    44.59ms   17.90ms 121.42ms   70.45%
    Req/Sec     2.78k   288.37     4.36k    75.34%
  329486 requests in 30.07s, 83.58MB read
Requests/sec:  10957.63
Transfer/sec:      2.78MB

Nginx is only negligibly slower. I'd suppose that wrk uses abbreviated TLS handshakes, but I'm not sure and this should be verified.

10 bytes cached response, keep-alive

$ taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 https://tempesta-tech.com
Running 30s test @ https://tempesta-tech.com
  4 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    23.86ms   20.21ms  77.83ms   73.18%
    Req/Sec    11.83k     6.28k   25.26k    78.99%
  1404153 requests in 30.09s, 482.08MB read
Requests/sec:  46671.73
Transfer/sec:     16.02MB

$ taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 https://tempesta-tech.com:8443
Running 30s test @ https://tempesta-tech.com:8443
  4 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    12.62ms    4.77ms 112.47ms   77.13%
    Req/Sec    19.98k     1.42k   22.05k    92.35%
  2368764 requests in 30.05s, 612.20MB read
Requests/sec:  78831.90
Transfer/sec:     20.37MB

Nginx is almost 2 times faster than Tempesta.

26KB cached response, no keep-alive

$ taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com
Running 30s test @ https://tempesta-tech.com
  4 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    25.35ms   39.93ms   1.58s    96.93%
    Req/Sec     2.04k   331.83     4.76k    71.52%
  242204 requests in 30.08s, 5.92GB read
Requests/sec:   8052.70
Transfer/sec:    201.53MB

$ taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com:8443
Running 30s test @ https://tempesta-tech.com:8443
  4 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    51.40ms   19.75ms 273.29ms   70.91%
    Req/Sec     2.51k   223.67     3.17k    75.76%
  297959 requests in 30.08s, 7.25GB read
Requests/sec:   9906.66
Transfer/sec:    246.84MB

Nginx is still faster, but not so dramatically. IIRC on today demo we saw reversed results: Tempesta behaved better on the smaller file than on the large one.

26KB cached response, keep-alive

$ taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 https://tempesta-tech.com
Running 30s test @ https://tempesta-tech.com
  4 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    87.61ms  108.18ms   1.94s    86.76%
    Req/Sec     4.18k   564.97     8.59k    74.43%
  495844 requests in 30.09s, 12.12GB read
  Socket errors: connect 0, read 0, write 0, timeout 7
Requests/sec:  16477.60
Transfer/sec:    412.44MB

$ taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 https://tempesta-tech.com:8443
Running 30s test @ https://tempesta-tech.com:8443
  4 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    24.38ms    2.69ms 123.94ms   84.14%
    Req/Sec    10.29k   755.72    11.72k    79.12%
  1219107 requests in 30.09s, 29.67GB read
Requests/sec:  40511.45
Transfer/sec:      0.99GB

Nginx shows about 2.5 better result.

Conclusions

Having that I observed somewhat different results today on the same software versions, the same hardware and the same command, there are some testing methodology issues. In this case a best practice must be documented in the wiki
I didn't profile any of the cases, but there could be networking issues on our side, e.g. extra copies.
Another possible networking issue could be in TCP, e.g. unacknowledged segments or something like this - need to trace connections

TODO

Also need to test large body workload (VOD), about 5-10MB to test**

One of the options to debug TCP latency issues is to use SO_TIMESTAMPING, e.g. generate a massive workload and trace a single testing TCP stream with the timestamps. References:

BPF_PROG_TYPE_SOCK_OPS also allows tracing of TCP issues with retransmissions and windows

HTTP/2 benchmark

Current master as of 690cb94ab4b5e4379f41c0c7013f07ce0806610e with config

listen 192.168.100.4:443 proto=h2;
listen 192.168.100.4:80;

frang_limits {
    client_header_timeout 20;
    client_body_timeout 10;
    http_header_chunk_cnt 10;
    http_methods GET HEAD;
    http_uri_len 1000;
    http_resp_code_block 400 403 404 100 10;
}

# Allow only following characters in URI (no '%'): /a-zA-Z0-9&?:-._=
http_uri_brange 0x2f 0x41-0x5a 0x61-0x7a 0x30-0x39 0x26 0x3f 0x3a 0x2d 0x2e 0x5f 0x3d;

block_action attack reply;
block_action error reply;

srv_group default {
    server 192.168.100.4:8000;
}

tls_match_any_server_name;
tls_certificate /root/tempesta/etc/tfw-root.crt;
tls_certificate_key /root/tempesta/etc/tfw-root.key;

vhost tempesta-tech.com {
    tls_tickets secret="f00)9eR59*_/22" lifetime=7200;

    resp_hdr_set Strict-Transport-Security "max-age=31536000; includeSubDomains";

    proxy_pass default;
}

cache 1;
cache_fulfill * *;

#access_log on;

http_chain redirection_chain {
    uri == "/blog"      -> 301 = /blog/1;
    uri == "/blog/"     -> 301 = /blog/1;
    uri == "/services"  -> 301 = /development-services;
    uri == "/services.html" -> 301 = /development-services;
    uri == "/c++-services"  -> 301 = /development-services;
    uri == "/index.html"    -> 301 = /index;
    uri == "/company.html"  -> 301 = /company;
    uri == "/blog/fast-programming-languages-c-c++-rust-assembly" -> 301 = /blog/fast-programming-languages-c-cpp-rust-assembly;

    -> tempesta-tech.com;
}

http_chain {
    host == "tempesta-tech.com" -> redirection_chain;
}

and the old website index page (~26KB).

$ taskset --cpu-list 1,2,4,6 h2load -n 1000000 -c 1024 -t 4 https://tempesta-tech.com
starting benchmark...
spawning thread #0: 256 total client(s). 250000 total requests
spawning thread #1: 256 total client(s). 250000 total requests
spawning thread #2: 256 total client(s). 250000 total requests
spawning thread #3: 256 total client(s). 250000 total requests
client could not connect to hostclient could not connect to host

client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
TLS Protocol: TLSv1.2
Cipher: ECDHE-ECDSA-AES128-GCM-SHA256
Server Temp Key: ECDH P-256 256 bits
Application protocol: h2
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done

finished in 27.03s, 36736.66 req/s, 915.98MB/s
requests: 1000000 total, 993168 started, 993168 done, 993168 succeeded, 6832 failed, 6832 errored, 0 timeout
status codes: 993168 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 24.18GB (25966282770) total, 214.65MB (225080802) headers (space savings 24.36%), 23.92GB (25688290320) data
                     min         max         mean         sd        +/- sd
time for request:       33us    388.79ms     23.10ms     18.06ms    79.81%
time for connect:   137.49ms    274.73ms    217.24ms     29.47ms    69.22%
time to 1st byte:   201.56ms    307.52ms    239.35ms     24.53ms    55.85%
req/s           :       0.00       56.33       43.69        8.06    76.46%

The same workload against Nginx:

$ taskset --cpu-list 1,2,4,6 h2load -n 1000000 -c 1024 -t 4 https://tempesta-tech.com:8443
starting benchmark...
spawning thread #0: 256 total client(s). 250000 total requests
spawning thread #1: 256 total client(s). 250000 total requests
spawning thread #2: 256 total client(s). 250000 total requests
spawning thread #3: 256 total client(s). 250000 total requests
client could not connect to hostclient could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host

client could not connect to host
TLS Protocol: TLSv1.2
Cipher: ECDHE-ECDSA-AES256-GCM-SHA384
Server Temp Key: X25519 253 bits
Application protocol: h2
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done

finished in 37.75s, 26307.76 req/s, 653.34MB/s
requests: 1000000 total, 993168 started, 993168 done, 993168 succeeded, 6832 failed, 6832 errored, 0 timeout
status codes: 993168 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 24.09GB (25863137721) total, 124.08MB (130105008) headers (space savings 35.15%), 23.92GB (25688290320) data
                     min         max         mean         sd        +/- sd
time for request:       56us    517.08ms     32.86ms     21.60ms    72.57%
time for connect:   217.08ms    394.19ms    334.81ms     29.56ms    71.19%
time to 1st byte:   306.27ms    549.34ms    361.49ms     33.89ms    64.50%
req/s           :       0.00       38.09       30.64        5.50    66.21%

RPS, latencies, TTFB are better for Tempesta FW, but not so significantly.

Nginx on the standard Ubuntu kernel 5.15.0-86-generic:

$ taskset --cpu-list 1,2,4,6 h2load -n 1000000 -c 1024 -t 4 https://tempesta-tech.com:8443
starting benchmark...
spawning thread #0: 256 total client(s). 250000 total requests
spawning thread #1: 256 total client(s). 250000 total requests
spawning thread #2: 256 total client(s). 250000 total requests
spawning thread #3: 256 total client(s). 250000 total requests
client could not connect to hostclient could not connect to host

client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
TLS Protocol: TLSv1.2
Cipher: ECDHE-ECDSA-AES256-GCM-SHA384
Server Temp Key: X25519 253 bits
Application protocol: h2
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done

finished in 27.87s, 35631.10 req/s, 884.90MB/s
requests: 1000000 total, 993168 started, 993168 done, 993168 succeeded, 6832 failed, 6832 errored, 0 timeout
status codes: 993168 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 24.09GB (25863498283) total, 124.42MB (130465570) headers (space savings 34.97%), 23.92GB (25688290320) data
                     min         max         mean         sd        +/- sd
time for request:       41us    365.74ms     23.85ms     12.87ms    78.71%
time for connect:   202.26ms    435.84ms    341.74ms     41.72ms    76.20%
time to 1st byte:   309.13ms    476.61ms    365.89ms     38.78ms    56.54%
req/s           :       0.00       72.89       43.55       12.67    74.22%

, so apparently there is a kernel regression.

Need to create a new Wiki page about HTTP/2 performance to report the current performance and make the results reproducable

We should use the -m option for h2load when checking performance. By default max concurrent streams is 1 and in this case we will not see any changes in working with a multiple streams. In tempesta-test we use: -c 10 and -m 100

-m, --max-concurrent-streams=N Max concurrent streams to issue per session. When http/1.1 is used, this specifies the number of HTTP pipelining requests in-flight. Default: 1

I have the following results on our server:

for nginx:

taskset --cpu-list 0,2,4,6  ./tls-perf -q --sni tempesta-tech.com -l 1000 -t 4 -T 10 94.242.233.20 8443

( All peers are active, start to gather statistics )
========================================
 TOTAL:           SECONDS 9; HANDSHAKES 29436
 HANDSHAKES/sec:  MAX 3899; AVG 3062; 95P 1015; MIN 1015
 LATENCY (ms):    MIN 162.772; AVG 908.555; 95P 1397.94; MAX 2521.93

taskset --cpu-list 0,2,4,6  ./tls-perf -q --sni tempesta-tech.com --tickets on -l 1000 -t 4 -T 10 94.242.233.20 8443

( All peers are active, start to gather statistics )
========================================
 TOTAL:           SECONDS 9; HANDSHAKES 274905
 HANDSHAKES/sec:  MAX 34469; AVG 30146; 95P 7001; MIN 7001
 LATENCY (ms):    MIN 0.399567; AVG 56.5873; 95P 65.331; MAX 76.1464

for Tempesta:

taskset --cpu-list 0,2,4,6  ./tls-perf -q --sni tempesta-tech.com -l 1000 -t 4 -T 10 94.242.233.20 443

( All peers are active, start to gather statistics )
========================================
 TOTAL:           SECONDS 8; HANDSHAKES 19854
 HANDSHAKES/sec:  MAX 3269; AVG 2137; 95P 931; MIN 931
 LATENCY (ms):    MIN 25.5909; AVG 1188.36; 95P 1756.09; MAX 2783.18

taskset --cpu-list 0,2,4,6  ./tls-perf -q --sni tempesta-tech.com --tickets on -l 1000 -t 4 -T 10 94.242.233.20 443

( All peers are active, start to gather statistics )
========================================
 TOTAL:           SECONDS 9; HANDSHAKES 276967
 HANDSHAKES/sec:  MAX 38332; AVG 30516; 95P 2638; MIN 2638
 LATENCY (ms):    MIN 0.12646; AVG 51.5548; 95P 57.4831; MAX 60.4444

10 bytes response

nginx: taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com:8443 Running 30s test @ https://tempesta-tech.com:8443 4 threads and 1000 connections Thread Stats Avg Stdev Max +/- Stdev Latency 12.32ms 8.03ms 68.78ms 71.24% Req/Sec 6.65k 573.02 7.73k 90.37% 792874 requests in 30.10s, 658.60MB read Requests/sec: 26340.64 Transfer/sec: 21.88MB

Tempesta: taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com Running 30s test @ https://tempesta-tech.com 4 threads and 1000 connections Thread Stats Avg Stdev Max +/- Stdev Latency 8.88ms 2.66ms 33.44ms 96.10% Req/Sec 5.77k 1.38k 7.33k 72.29% 611480 requests in 30.08s, 572.66MB read Requests/sec: 20328.04 Transfer/sec: 19.04MB

On branch MekhanikEvgenii/fix-socket-cpu-migration taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com Running 30s test @ https://tempesta-tech.com 4 threads and 1000 connections Thread Stats Avg Stdev Max +/- Stdev Latency 8.88ms 3.83ms 371.71ms 92.83% Req/Sec 7.55k 1.13k 9.60k 72.94% 893831 requests in 30.09s, 835.97MB read Requests/sec: 29702.13 Transfer/sec: 27.78MB

10 bytes keep-alive

nginx: taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 https://tempesta-tech.com:8443 Running 30s test @ https://tempesta-tech.com:8443 4 threads and 1000 connections Thread Stats Avg Stdev Max +/- Stdev Latency 4.55ms 5.30ms 173.44ms 82.61% Req/Sec 84.80k 12.83k 105.94k 71.15% 10089405 requests in 30.01s, 8.23GB read Requests/sec: 336167.84 Transfer/sec: 280.84MB

Tempesta: taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 https://tempesta-tech.com Running 30s test @ https://tempesta-tech.com 4 threads and 1000 connections Thread Stats Avg Stdev Max +/- Stdev Latency 2.77ms 3.22ms 384.87ms 99.72% Req/Sec 48.36k 3.86k 54.57k 98.48% 5729633 requests in 30.08s, 5.13GB read Requests/sec: 190455.56 Transfer/sec: 174.67MB

On branch MekhanikEvgenii/fix-socket-cpu-migration taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 https://tempesta-tech.com Running 30s test @ https://tempesta-tech.com 4 threads and 1000 connections Thread Stats Avg Stdev Max +/- Stdev Latency 2.71ms 1.91ms 202.20ms 95.45% Req/Sec 48.27k 3.45k 55.82k 98.73% 5724774 requests in 30.09s, 5.13GB read Requests/sec: 190224.66 Transfer/sec: 174.70MB

10 Kb response

nginx: taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com:8443 Running 30s test @ https://tempesta-tech.com:8443 4 threads and 1000 connections Thread Stats Avg Stdev Max +/- Stdev Latency 9.85ms 5.74ms 61.71ms 82.56% Req/Sec 7.27k 714.25 8.29k 86.87% 867649 requests in 30.07s, 335.12MB read Requests/sec: 28850.70 Transfer/sec: 11.14MB

taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com Running 30s test @ https://tempesta-tech.com 4 threads and 1000 connections Thread Stats Avg Stdev Max +/- Stdev Latency 9.06ms 3.13ms 226.67ms 90.75% Req/Sec 7.43k 0.99k 9.49k 79.93% 880554 requests in 30.05s, 824.65MB read Requests/sec: 29302.12 Transfer/sec: 27.44MB

On branch MekhanikEvgenii/fix-socket-cpu-migration taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com Running 30s test @ https://tempesta-tech.com 4 threads and 1000 connections Thread Stats Avg Stdev Max +/- Stdev Latency 8.86ms 3.11ms 283.83ms 86.17% Req/Sec 7.47k 1.08k 9.55k 76.65% 884835 requests in 30.11s, 829.50MB read Requests/sec: 29390.95 Transfer/sec: 27.55MB

Testing environment: Used 3 CPUs VM for Tempesta, 3 CPUs VM for client. 612 bytes content-length. 210 headers length on nginx. 257 headers length on Tempesta. Default minimal config with enabled caching.

Interesting thing, testing Tempesta I can't load processor more then 50% doesn't meter how many CPUs used generating traffic. For both cases(1 and 3 CPU per client) I got maximum 50% of average cpu load. However, nginx has average load around 100% when for generating traffic used 3 CPUs.

Tempesta:

taskset --cpu-list 5 h2load -c100 -m95 -t2 -D20 https://ubuntu
finished in 20.03s, 271824.35 req/s, 207.64MB/s
requests: 5436487 total, 5445987 started, 5436487 done, 5436487 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 5436487 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 4.06GB (4354630287) total, 886.57MB (929639277) headers (space savings 24.00%), 3.10GB (3327130044) data
                     min         max         mean         sd        +/- sd
time for request:      935us    267.49ms     31.21ms     15.22ms    68.37%
time for connect:     4.78ms     40.09ms     26.07ms     12.22ms    35.00%
time to 1st byte:    18.30ms     53.28ms     39.96ms     10.40ms    59.00%
req/s           :    2237.13     3182.46     2718.05      220.84    65.00%

In: 50 Mbit/s
Out: 1.83 GBit/s

taskset --cpu-list 3,4,5 h2load -c100 -m95 -t2 -D20 https://ubuntu
finished in 20.03s, 224867.65 req/s, 171.99MB/s
requests: 4497353 total, 4506853 started, 4497353 done, 4497353 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 4497353 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3.36GB (3606881306) total, 737.71MB (773544716) headers (space savings 23.89%), 2.56GB (2752380036) data
                     min         max         mean         sd        +/- sd
time for request:      250us    661.34ms     40.37ms     28.09ms    78.48%
time for connect:     7.61ms     36.24ms     25.25ms      8.26ms    71.00%
time to 1st byte:    18.72ms     51.68ms     35.63ms      9.60ms    61.00%
req/s           :    1527.13     4271.32     2248.38      520.91    63.00%

In: 105 Mbit/s
Out: 1.45 GBit/s

taskset --cpu-list 5 h2load -c10 -m95 -t2 -D20 https://ubuntu
finished in 20.00s, 288863.05 req/s, 220.94MB/s
requests: 5777261 total, 5778211 started, 5777261 done, 5777261 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 5777261 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 4.32GB (4633363742) total, 947.66MB (993688892) headers (space savings 23.89%), 3.29GB (3535683732) data
                     min         max         mean         sd        +/- sd
time for request:      213us     15.70ms      2.55ms      1.17ms    80.37%
time for connect:     2.80ms      6.68ms      4.30ms      1.59ms    70.00%
time to 1st byte:     4.29ms     11.70ms      7.56ms      2.94ms    60.00%
req/s           :   27561.91    30657.87    28883.85     1175.32    50.00%

In: 41 Mbit/s
Out: 1.95 GBit/s


taskset --cpu-list 3,4,5 h2load -c200 -m1 -t2 -D20 https://ubuntu
finished in 20.04s, 219628.00 req/s, 167.98MB/s
requests: 4392560 total, 4392760 started, 4392560 done, 4392560 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 4392560 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3.28GB (3522841520) total, 720.52MB (755520320) headers (space savings 23.89%), 2.50GB (2688246720) data
                     min         max         mean         sd        +/- sd
time for request:      100us      5.95ms       684us       199us    73.68%
time for connect:    20.75ms     37.25ms     33.79ms      3.30ms    78.50%
time to 1st byte:    34.85ms     40.94ms     37.69ms      1.91ms    55.50%
req/s           :    1081.90     1124.74     1098.06       15.69    70.50%

In: 183 Mbit/s
Out: 1.47 GBit/s

Nginx:

taskset --cpu-list 5 h2load -c100 -m95 -t2 -D20 https://ubuntu
finished in 20.03s, 231835.10 req/s, 165.38MB/s
requests: 4636702 total, 4646202 started, 4636702 done, 4636702 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 4636702 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3.23GB (3468257996) total, 521.78MB (547130836) headers (space savings 33.33%), 2.64GB (2837661624) data
                     min         max         mean         sd        +/- sd
time for request:      544us    123.87ms     39.47ms     16.11ms    60.45%
time for connect:     3.22ms     30.94ms     17.57ms      8.01ms    58.00%
time to 1st byte:    15.96ms     89.42ms     52.64ms     13.97ms    66.00%
req/s           :    1512.95     3509.12     2318.01      842.96    79.00%

In: 34 Mbit/s
Out: 1.38 GBit/s

taskset --cpu-list 3,4,5 h2load -c100 -m95 -t2 -D20 https://ubuntu
finished in 20.04s, 220740.35 req/s, 157.47MB/s
requests: 4414807 total, 4424307 started, 4414807 done, 4414807 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 4414807 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3.08GB (3302280536) total, 496.81MB (520947226) headers (space savings 33.33%), 2.52GB (2701861884) data
                     min         max         mean         sd        +/- sd
time for request:      180us    109.29ms     41.96ms     11.95ms    62.94%
time for connect:     3.60ms     32.29ms     13.93ms      7.22ms    65.00%
time to 1st byte:    14.84ms     84.05ms     44.77ms     19.05ms    65.00%
req/s           :    1794.75     3344.97     2207.21      589.51    78.00%

In: 43 Mbit/s
Out: 1.28 GBit/s

PING Flood Handling Benchmark

Install python3.12 and golang.

Refer to these source files: https://gist.github.com/kingluo/07e66502b420a96ceaa5dd430140f43b

test python3 h2 + openssl (optionally with KTLS enabled)

sudo python3.12 h2_server.py
./ping_flood -address 192.168.2.1:443 -threads 1 -connections 10 -debug 1

Use btm to observe the network throughput.

20240627190959

test tempesta

sudo -E TFW_CFG_PATH=$HOME/tempesta-ping.conf ./scripts/tempesta.sh --restart
./ping_flood -address 192.168.2.1:443 -threads 1 -connections 10 -debug 1

20240627193758

ping_flood will print an error if the ping ack sequence is not continuous.
You can see that python3 has more than 10 times higher throughput than tempesta.
The memory usage of python3 remains the same, but the memory usage of tempesta keeps increasing (OOM).

tempesta-tech / tempesta

Multiple performance regressions #1940