istio / ztunnel

The `ztunnel` component of ambient mesh
Apache License 2.0

performance: upstream and downstream will never run concurrently #1327

Open · howardjohn opened this issue 1 month ago

howardjohn commented 1 month ago

copy_bidirectional uses tokio::join! for the copy from upstream->downstream and vice versa. join! polls both futures on a single task, so it is impossible for the two copies to make progress at the same time on separate threads.
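
For illustration, a minimal sketch (using plain TcpStreams rather than ztunnel's actual streams, so the names here are just placeholders) of the difference between joining both copies on one task and spawning each direction as its own task:

```rust
use tokio::io::copy;
use tokio::net::TcpStream;

// Both directions on one task: the copies interleave (concurrent) but can
// never execute on two worker threads at once (not parallel).
async fn copy_joined(client: TcpStream, upstream: TcpStream) -> std::io::Result<(u64, u64)> {
    let (mut cr, mut cw) = client.into_split();
    let (mut ur, mut uw) = upstream.into_split();
    tokio::try_join!(copy(&mut cr, &mut uw), copy(&mut ur, &mut cw))
}

// Each direction as its own task: the runtime may schedule the two copies on
// different worker threads, so they can run in parallel.
async fn copy_spawned(client: TcpStream, upstream: TcpStream) -> std::io::Result<(u64, u64)> {
    let (mut cr, mut cw) = client.into_split();
    let (mut ur, mut uw) = upstream.into_split();
    let up = tokio::spawn(async move { copy(&mut cr, &mut uw).await });
    let down = tokio::spawn(async move { copy(&mut ur, &mut cw).await });
    Ok((
        up.await.expect("upstream copy task panicked")?,
        down.await.expect("downstream copy task panicked")?,
    ))
}
```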

Intuitively, it seems like running the two copies in parallel should help. For a simple call-and-response workload it would not, but with a continuous flow of data in both directions it should.

I put together a prototype, however, and do not see any benefits:

```
HBONE

Master
DEST           CLIENT QPS  CONS DUR PAYLOAD SUCCESS THROUGHPUT  P50      P90      P99
fortio-server  fortio 0    1    5   0       96743   19348.29qps 0.048ms  0.066ms  0.113ms
fortio-server  fortio 0    1    5   1024    83755   16750.66qps 0.055ms  0.076ms  0.138ms
fortio-server  fortio 2000 1    5   0       9998    1999.53qps  0.094ms  0.143ms  0.277ms
fortio-server  fortio 2000 1    5   1024    10000   1999.89qps  0.106ms  0.156ms  0.289ms
fortio-server  fortio 0    2    5   0       146377  29274.81qps 0.061ms  0.093ms  0.180ms
fortio-server  fortio 0    2    5   1024    131639  26327.32qps 0.071ms  0.096ms  0.176ms
fortio-server  fortio 2000 2    5   0       10000   1999.50qps  0.118ms  0.175ms  0.331ms
fortio-server  fortio 2000 2    5   1024    10000   1999.59qps  0.134ms  0.198ms  0.358ms
fortio-server  fortio 0    4    5   0       222310  44460.78qps 0.085ms  0.117ms  0.192ms
fortio-server  fortio 0    4    5   1024    179137  35826.45qps 0.105ms  0.148ms  0.264ms
fortio-server  fortio 2000 4    5   0       9996    1998.58qps  0.125ms  0.183ms  0.304ms
fortio-server  fortio 2000 4    5   1024    9998    1998.93qps  0.150ms  0.236ms  0.606ms
fortio-server  fortio 0    64   5   0       407907  81573.11qps 0.744ms  1.430ms  1.965ms
fortio-server  fortio 0    64   5   1024    265418  53074.42qps 1.335ms  1.896ms  2.745ms
fortio-server  fortio 2000 64   5   0       9984    1987.19qps  0.142ms  0.200ms  0.296ms
fortio-server  fortio 2000 64   5   1024    9984    1987.05qps  0.177ms  0.258ms  0.385ms
ID    Interval          Transfer   Bitrate
[ 0]  0.00..10.00 sec   10.46 GiB  8.99 Gbits/sec  sender
[ 0]  0.00..10.00 sec   10.41 GiB  8.94 Gbits/sec  receiver

Spawning
DEST           CLIENT QPS  CONS DUR PAYLOAD SUCCESS THROUGHPUT  P50      P90      P99
fortio-server  fortio 0    1    5   0       95146   19028.90qps 0.048ms  0.067ms  0.112ms
fortio-server  fortio 0    1    5   1024    81596   16318.86qps 0.057ms  0.078ms  0.137ms
fortio-server  fortio 2000 1    5   0       9998    1999.40qps  0.094ms  0.139ms  0.250ms
fortio-server  fortio 2000 1    5   1024    9998    1999.54qps  0.107ms  0.154ms  0.271ms
fortio-server  fortio 0    2    5   0       156544  31306.96qps 0.060ms  0.085ms  0.138ms
fortio-server  fortio 0    2    5   1024    128630  25725.32qps 0.073ms  0.099ms  0.186ms
fortio-server  fortio 2000 2    5   0       9998    1999.36qps  0.121ms  0.179ms  0.316ms
fortio-server  fortio 2000 2    5   1024    10000   1999.42qps  0.135ms  0.200ms  0.371ms
fortio-server  fortio 0    4    5   0       218797  43758.22qps 0.086ms  0.119ms  0.196ms
fortio-server  fortio 0    4    5   1024    182502  36499.22qps 0.102ms  0.145ms  0.268ms
fortio-server  fortio 2000 4    5   0       9998    1998.75qps  0.137ms  0.235ms  0.479ms
fortio-server  fortio 2000 4    5   1024    9998    1998.85qps  0.164ms  0.287ms  0.631ms
fortio-server  fortio 0    64   5   0       400404  80069.12qps 0.755ms  1.469ms  1.973ms
fortio-server  fortio 0    64   5   1024    286835  57358.42qps 1.231ms  1.860ms  2.070ms
fortio-server  fortio 2000 64   5   0       9984    1987.07qps  0.144ms  0.212ms  0.364ms
fortio-server  fortio 2000 64   5   1024    9984    1986.99qps  0.180ms  0.258ms  0.411ms
[ 0]  0.00..10.00 sec   10.77 GiB  9.25 Gbits/sec  sender
[ 0]  0.00..10.00 sec   10.71 GiB  9.20 Gbits/sec  receiver

TCP

Master
DEST           CLIENT QPS  CONS DUR PAYLOAD SUCCESS THROUGHPUT   P50      P90      P99
fortio-server  fortio 0    1    3   0       93706   31234.58qps  0.029ms  0.040ms  0.075ms
fortio-server  fortio 0    1    3   64000   18341   6113.28qps   0.138ms  0.267ms  0.395ms
fortio-server  fortio 2000 1    3   0       5998    1998.92qps   0.062ms  0.103ms  0.245ms
fortio-server  fortio 2000 1    3   64000   5998    1999.05qps   0.181ms  0.364ms  0.595ms
fortio-server  fortio 0    64   3   0       368849  122926.89qps 0.507ms  0.682ms  1.397ms
fortio-server  fortio 0    64   3   64000   43652   14530.98qps  3.361ms  9.953ms  19.560ms
fortio-server  fortio 2000 64   3   0       5952    1978.43qps   0.098ms  0.144ms  0.316ms
fortio-server  fortio 2000 64   3   64000   5952    1978.35qps   0.284ms  0.645ms  1.627ms
[SUM]  0.00..10.00 sec  25.64 GiB  22.02 Gbits/sec  sender
[SUM]  0.00..9.97  sec  27.77 GiB  23.93 Gbits/sec  receiver

Spawning
DEST           CLIENT QPS  CONS DUR PAYLOAD SUCCESS THROUGHPUT   P50      P90      P99
fortio-server  fortio 0    1    3   0       86532   28843.36qps  0.031ms  0.044ms  0.094ms
fortio-server  fortio 0    1    3   64000   14891   4963.41qps   0.164ms  0.330ms  0.640ms
fortio-server  fortio 2000 1    3   0       5998    1998.99qps   0.064ms  0.106ms  0.196ms
fortio-server  fortio 2000 1    3   64000   6000    1999.32qps   0.185ms  0.376ms  0.678ms
fortio-server  fortio 0    64   3   0       360397  120090.70qps 0.508ms  0.698ms  1.640ms
fortio-server  fortio 0    64   3   64000   44472   14802.99qps  3.186ms  9.915ms  19.528ms
fortio-server  fortio 2000 64   3   0       5952    1978.83qps   0.093ms  0.130ms  0.192ms
fortio-server  fortio 2000 64   3   64000   5952    1978.39qps   0.273ms  0.568ms  0.919ms
[SUM]  0.00..10.00 sec  25.04 GiB  21.51 Gbits/sec  sender
[SUM]  0.00..9.96  sec  28.12 GiB  24.25 Gbits/sec  receiver
```

We should investigate more

ilrudie commented 1 month ago

Did you push the prototype code?

bleggett commented 3 weeks ago

It shouldn't be terribly hard to use spawn and then join on the handles instead, if we want real parallelism.

A little more expensive for trivial cases, but probably worth it for all the others.

ilrudie commented 3 weeks ago

I think we can collect multiple handles from tokio::spawn and then await them. I presume something like that is what @howardjohn already tried but didn't see any benefit from.
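
For reference, collecting the spawn handles and awaiting them could look roughly like the sketch below; `copy_one_direction` is a hypothetical stand-in for whatever future performs one direction of the copy:

```rust
use tokio::task::JoinHandle;

// Hypothetical placeholder for one direction of the proxied copy.
async fn copy_one_direction(_dir: usize) -> std::io::Result<u64> {
    Ok(0)
}

async fn copy_parallel() -> std::io::Result<u64> {
    // Spawn each direction as its own task and collect the handles...
    let handles: Vec<JoinHandle<std::io::Result<u64>>> =
        (0..2).map(|dir| tokio::spawn(copy_one_direction(dir))).collect();

    // ...then await them all; each copy may now run on its own worker thread.
    let mut total = 0;
    for handle in handles {
        total += handle.await.expect("copy task panicked")?;
    }
    Ok(total)
}
```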

bleggett commented 3 weeks ago

> I think we can collect multiple handles from tokio::spawn and then await them. I presume something like that is what @howardjohn already tried but didn't see any benefit from.

Yeah - I misread. It probably doesn't make much difference because we already spawn the per-workload handler in a thread and are not remotely CPU bound even under load - distributing this specific operation across threads won't help much (and might make it easier for a greedy workload to starve other workloads on the node).

In general, sticking with a one-thread-per-conn-handler-instance model seems best.