This is a joint effort with @Wonshtrum to reduce the number of syscalls in Sōzu version 0.15.
When benchmarking version 0.15 against version 0.13.6, we found a disproportionate number of getrandom and writev syscalls, especially for HTTPS traffic.
writev
Here is the strace of how a simple response is sent to the client in a TLS-encrypted form, in Sōzu 0.13:
It is chunked by a custom BufferQueue in three writes.
Here is the same traffic sent to the client in version 0.15.14. There are more headers than in version 0.13, and for each of them, Sōzu performs a writev call:
This is due to the way Kawa stores data, which is passed to Rustls as is. Fortunately, the Rustls API offers a write_vectored method on its Writer, which performs all the writes in a single syscall. We opted for it.
As you can see, the fixes restore the performance of Sōzu (in this somewhat limited use case) to 0.13.6 levels. Compared with 0.13.6, version 0.15.14 still struggles when confronted with many concurrent TLS handshakes, but we're getting there.
Note that the throughput has increased. That's because headers are more numerous since the introduction of Kawa.
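To illustrate the difference between one write per block and a single vectored write, here is a minimal sketch that uses only the standard library. CountingWriter, write_one_by_one and write_all_at_once are hypothetical names made up for this example. They are not Sōzu, Kawa or Rustls code, and the writer only counts calls instead of touching a socket.

```rust
use std::io::{self, IoSlice, Write};

/// Hypothetical stand-in for a socket: it records how many write calls it receives.
#[derive(Default)]
struct CountingWriter {
    calls: usize,
    bytes: Vec<u8>,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes.extend_from_slice(buf);
        Ok(buf.len())
    }

    // Accept every slice in one call, as a sink with vectored support would.
    fn write_vectored(&mut self, bufs: &[IoSlice<'_>]) -> io::Result<usize> {
        self.calls += 1;
        let mut written = 0;
        for buf in bufs {
            self.bytes.extend_from_slice(buf);
            written += buf.len();
        }
        Ok(written)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// One write call per block, similar to the pattern seen in the 0.15.14 trace.
fn write_one_by_one<W: Write>(out: &mut W, blocks: &[&[u8]]) -> io::Result<()> {
    for block in blocks {
        out.write_all(block)?;
    }
    Ok(())
}

/// All blocks handed over in a single vectored call.
/// Returns the number of bytes accepted, which a real caller must check,
/// since a real socket may accept fewer bytes than offered.
fn write_all_at_once<W: Write>(out: &mut W, blocks: &[&[u8]]) -> io::Result<usize> {
    let slices: Vec<IoSlice<'_>> = blocks.iter().map(|b| IoSlice::new(b)).collect();
    out.write_vectored(&slices)
}
```

With three non-empty blocks (say a status line, one header and the final CRLF), write_one_by_one results in three calls on the writer, while write_all_at_once results in one, and both produce the same bytes.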
getrandom, it's all about TLS 1.3 resumption tickets
We found that Sōzu 0.15.14 issues four times as many getrandom syscalls during a TLS handshake as version 0.13.6:
After a lot of digging, we found that Rustls 0.19, used in Sōzu 0.13.6, seems to produce one TLS 1.3 ticket (used by a client to resume a TLS session). Producing a ticket needs 3 getrandom syscalls, as far as we understand.
Sōzu 0.15.14 makes four times as many getrandom calls at this step of the TLS handshake. That is because Rustls 0.21.8, used in Sōzu 0.15, produces four TLS 1.3 tickets by default since this commit in this PR, meant to resolve this issue. I quote the issue since it seems relevant:
RFC8446 section 4.6.1 recommends that TLS 1.3 servers send multiple session resumption tickets to clients. In appendix C.4, it's subsequently recommended that clients use tickets at most once to avoid session tracking. The current implementation of ClientSessionMemoryCache does not do this, and some properties of StoresClientSession (and use of the cache in general) make doing so difficult.
The number of tickets is accessible in this public field of the Rustls configuration. It defaults to 4. Resetting it to 1 may improve the performance of Sōzu's TLS handshake under intense traffic with many simultaneous TLS handshakes.
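The arithmetic behind the "four times" figure can be sketched as follows. It assumes our estimate of about 3 getrandom syscalls per ticket holds, which we have not confirmed in the Rustls source. The constant and function names are made up for this example.

```rust
/// Estimate from the text above: about 3 getrandom syscalls per ticket.
/// This is an observation from strace, not a documented Rustls guarantee.
const GETRANDOM_CALLS_PER_TICKET: usize = 3;

/// Estimated getrandom syscalls spent on ticket generation for one handshake.
fn ticket_getrandom_calls(tickets: usize) -> usize {
    tickets * GETRANDOM_CALLS_PER_TICKET
}

/// The same estimate scaled to a given rate of handshakes per second.
fn ticket_getrandom_calls_per_second(tickets: usize, handshakes_per_second: usize) -> usize {
    ticket_getrandom_calls(tickets) * handshakes_per_second
}
```

Under that assumption, one ticket costs 3 calls per handshake and four tickets cost 12, so the gap grows linearly with the handshake rate.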
A default of one ticket seems appropriate for some Sōzu users, but we may still want to make this number configurable in the Sōzu configuration file, so that any Sōzu user can trade security and performance to their liking. What do you think, @FlorentinDUBOIS and @Wonshtrum?
EDIT: benchmarking this change with tls-perf does NOT seem to improve performance in any relevant way. We may keep the line of code with a set value of 4, and an explanatory comment, for future generations of developers.
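As a starting point for the discussion, a configurable ticket count could look roughly like this. It is only a sketch: HttpsListenerSettings and its send_tls13_tickets option are hypothetical and do not exist in the Sōzu configuration today. The resolved value would then be assigned to the ticket-count field of the Rustls server configuration (send_tls13_tickets on ServerConfig, if we read the Rustls 0.21 documentation correctly).

```rust
/// Default number of TLS 1.3 tickets sent by Rustls 0.21, per the text above.
const RUSTLS_DEFAULT_TICKETS: usize = 4;

/// Hypothetical slice of an HTTPS listener configuration.
#[derive(Debug, Default)]
struct HttpsListenerSettings {
    /// Hypothetical option: `None` keeps the Rustls default.
    send_tls13_tickets: Option<usize>,
}

impl HttpsListenerSettings {
    /// Number of tickets to request from Rustls for this listener.
    fn effective_tickets(&self) -> usize {
        self.send_tls13_tickets.unwrap_or(RUSTLS_DEFAULT_TICKETS)
    }
}
```

Leaving the option unset keeps the current behavior, so existing configuration files would not need to change.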
Performance improvement
Switching to write_vectored improves performance significantly. The benchmark used TLS v1.3 with the TLS13_CHACHA20_POLY1305_SHA256 cipher suite and compared three builds:
Sōzu 0.13.6
Sōzu 0.15.14 without fixes
Sōzu 0.15.14 with both fixes of this pull request
Comments welcome!