Fedora job will fail because F36 is using OpenSSL 3.0. Fixed in #386.
There's also this patch which removes unnecessary calls to `kevent()`, improving iperf3 throughput by 4%. Not sure how safe it is, though, since it depends on `flags` always being in agreement with internal kqueue state, unlike how both `select` and `epoll` support are implemented.
Edit: not needed anymore, see below.
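For context, the idea behind that patch was roughly the following: remember what we last told the kernel and skip `kevent()` when the requested flags already match. A minimal sketch of that approach (hypothetical names, not the actual diff):

```c
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

#define IO_READ  1
#define IO_WRITE 2

typedef struct io_t {
    int fd;
    int flags;   /* what we believe is currently registered in the kqueue */
} io_t;

extern int kq;   /* the kqueue descriptor, created elsewhere */

/* Only touch the kqueue when the requested flags differ from what we think
 * is already registered.  This saves a syscall per call, but if io->flags
 * ever disagrees with the real kernel state, updates are silently skipped;
 * that's the safety concern mentioned above. */
static void io_set_flags(io_t *io, int flags) {
    if (io->flags == flags)
        return;

    struct kevent changes[2];
    int n = 0;

    if ((flags ^ io->flags) & IO_READ)
        EV_SET(&changes[n++], io->fd, EVFILT_READ,
               (flags & IO_READ) ? EV_ADD : EV_DELETE, 0, 0, io);

    if ((flags ^ io->flags) & IO_WRITE)
        EV_SET(&changes[n++], io->fd, EVFILT_WRITE,
               (flags & IO_WRITE) ? EV_ADD : EV_DELETE, 0, 0, io);

    kevent(kq, changes, n, NULL, 0, NULL);   /* error handling omitted */
    io->flags = flags;
}
```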
I have some older machines with dual Ethernet ports I could convince to run FreeBSD. Results from a VM should indeed be taken with a very large grain of salt.
kqueue setup is now reduced to a single syscall thanks to `EV_RECEIPT`, which I originally missed in the man page.
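For reference, this is roughly how `EV_RECEIPT` lets you batch the registrations into one `kevent()` call while still getting per-change error reporting (a standalone sketch, not tinc's code; the descriptors are placeholders):

```c
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int kq = kqueue();
    if (kq < 0) {
        perror("kqueue");
        return 1;
    }

    /* Placeholder descriptors; in tinc these would be the sockets and pipes
     * the event loop cares about. */
    int fds[2] = {0, 1};
    struct kevent changes[2], receipts[2];

    for (int i = 0; i < 2; i++)
        EV_SET(&changes[i], fds[i], EVFILT_READ,
               EV_ADD | EV_RECEIPT, 0, 0, NULL);

    /* One syscall registers everything.  EV_RECEIPT makes the kernel echo
     * each change back with EV_ERROR set and the result in .data (0 on
     * success) instead of returning pending events, so every registration
     * can be checked individually. */
    int n = kevent(kq, changes, 2, receipts, 2, NULL);
    if (n < 0) {
        perror("kevent");
        return 1;
    }

    for (int i = 0; i < n; i++)
        if ((receipts[i].flags & EV_ERROR) && receipts[i].data != 0)
            fprintf(stderr, "fd %d: %s\n", (int)receipts[i].ident,
                    strerror((int)receipts[i].data));

    close(kq);
    return 0;
}
```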
The difference is more significant on OpenBSD (same limitations — it's a similarly configured virtual machine).
select
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 21.6 MBytes 181 Mbits/sec 0 52.3 KBytes
[ 5] 1.00-2.00 sec 22.0 MBytes 185 Mbits/sec 0 86.3 KBytes
[ 5] 2.00-3.00 sec 22.1 MBytes 185 Mbits/sec 0 127 KBytes
[ 5] 3.00-4.00 sec 21.6 MBytes 182 Mbits/sec 8 93.3 KBytes
[ 5] 4.00-5.00 sec 21.7 MBytes 182 Mbits/sec 5 119 KBytes
[ 5] 5.00-6.00 sec 22.4 MBytes 188 Mbits/sec 9 123 KBytes
[ 5] 6.00-7.00 sec 22.1 MBytes 185 Mbits/sec 6 107 KBytes
[ 5] 7.00-8.00 sec 21.5 MBytes 180 Mbits/sec 5 127 KBytes
[ 5] 8.00-9.00 sec 22.1 MBytes 185 Mbits/sec 14 110 KBytes
[ 5] 9.00-10.00 sec 22.1 MBytes 185 Mbits/sec 8 94.7 KBytes
[ 5] 10.00-11.00 sec 22.1 MBytes 186 Mbits/sec 12 119 KBytes
[ 5] 11.00-12.00 sec 22.1 MBytes 186 Mbits/sec 10 100 KBytes
[ 5] 12.00-13.00 sec 22.1 MBytes 185 Mbits/sec 4 129 KBytes
[ 5] 13.00-14.00 sec 21.7 MBytes 182 Mbits/sec 16 110 KBytes
[ 5] 14.00-15.00 sec 21.7 MBytes 182 Mbits/sec 5 112 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-15.00 sec 329 MBytes 184 Mbits/sec 102 sender
[ 5] 0.00-15.00 sec 328 MBytes 184 Mbits/sec receiver
kqueue
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 33.7 MBytes 282 Mbits/sec 0 55.1 KBytes
[ 5] 1.00-2.00 sec 35.8 MBytes 300 Mbits/sec 0 86.3 KBytes
[ 5] 2.00-3.00 sec 37.6 MBytes 315 Mbits/sec 0 120 KBytes
[ 5] 3.00-4.00 sec 37.7 MBytes 316 Mbits/sec 0 154 KBytes
[ 5] 4.00-5.00 sec 36.2 MBytes 304 Mbits/sec 0 189 KBytes
[ 5] 5.00-6.00 sec 36.4 MBytes 305 Mbits/sec 0 225 KBytes
[ 5] 6.00-7.00 sec 35.8 MBytes 300 Mbits/sec 0 256 KBytes
[ 5] 7.00-8.00 sec 36.1 MBytes 303 Mbits/sec 0 305 KBytes
[ 5] 8.00-9.00 sec 36.2 MBytes 303 Mbits/sec 0 332 KBytes
[ 5] 9.00-10.00 sec 36.8 MBytes 309 Mbits/sec 0 365 KBytes
[ 5] 10.00-11.00 sec 36.5 MBytes 307 Mbits/sec 0 532 KBytes
[ 5] 11.00-12.00 sec 37.0 MBytes 311 Mbits/sec 0 532 KBytes
[ 5] 12.00-13.00 sec 36.0 MBytes 302 Mbits/sec 0 532 KBytes
[ 5] 13.00-14.00 sec 36.0 MBytes 302 Mbits/sec 0 532 KBytes
[ 5] 14.00-15.00 sec 37.0 MBytes 311 Mbits/sec 0 532 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-15.00 sec 545 MBytes 305 Mbits/sec 0 sender
[ 5] 0.00-15.02 sec 543 MBytes 303 Mbits/sec receiver
I split `event.c` as it's already pretty difficult to work with. Two I/O tree updates were missing in the Windows code because it's hard to keep track of all the `#ifdef`s.
There's a small amount of copy-pasted code in functions like `io_add`/`io_set`. Getting rid of it requires introducing more "public" functions and more calls between translation units. I didn't think it was worth it, but if you'd rather not have duplicate logic, let me know.
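To make the trade-off concrete, factoring the duplication out would look something like this (hypothetical names, not tinc's actual API): both functions funnel into one backend-specific helper, which each backend translation unit would then have to export.

```c
/* Hypothetical illustration only; the real io_t and flag handling in tinc
 * differ.  The point is that io_add() and io_set() share the
 * "tell the backend about the new flags" step. */
typedef struct io_t {
    int fd;
    int flags;                       /* IO_READ / IO_WRITE style bitmask */
} io_t;

/* Would have to become "public" and be implemented once per backend
 * (select, epoll, kqueue, Windows), hence the extra cross-TU calls. */
extern void backend_update(io_t *io, int old_flags, int new_flags);

void io_add(io_t *io, int fd, int flags) {
    io->fd = fd;
    io->flags = flags;
    backend_update(io, 0, flags);    /* same step as in io_set() */
}

void io_set(io_t *io, int flags) {
    int old = io->flags;
    io->flags = flags;
    backend_update(io, old, flags);  /* duplicated today, shared here */
}
```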
OK, I tried it on FreeBSD on two identical nodes with a gigabit Ethernet switch between them. I'm not trusting the iperf results anymore, since I got this:
| Connection | Throughput |
|---|---|
| Direct | 615 Mbit/s |
| 1.1 | 646 Mbit/s |
| hg/kqueue | 690 Mbit/s |
I've rerun the tests; the standard deviation is a few Mbit/s. I don't know why the direct connection is slower. Anyway, there's no regression for your patch, even a ~7% boost in performance (although I would take that with a grain of salt).
Similar to #266, but for FreeBSD/OpenBSD/NetBSD/macOS.
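At its core this swaps the readiness-polling call for kqueue's, along these lines (a simplified standalone sketch, not the actual implementation):

```c
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <stdio.h>
#include <unistd.h>

/* Minimal shape of a kqueue-based loop: register interest once, then block
 * in kevent() and dispatch whatever became ready.  The epoll path looks the
 * same modulo epoll_ctl()/epoll_wait(); select has to rebuild its fd sets on
 * every iteration instead. */
int main(void) {
    int kq = kqueue();
    if (kq < 0) {
        perror("kqueue");
        return 1;
    }

    struct kevent change;
    EV_SET(&change, STDIN_FILENO, EVFILT_READ, EV_ADD, 0, 0, NULL);
    if (kevent(kq, &change, 1, NULL, 0, NULL) < 0) {
        perror("kevent register");
        return 1;
    }

    struct kevent events[8];
    for (;;) {
        int n = kevent(kq, NULL, 0, events, 8, NULL);
        if (n < 0) {
            perror("kevent wait");
            break;
        }
        for (int i = 0; i < n; i++)
            printf("fd %d readable, %lld bytes pending\n",
                   (int)events[i].ident, (long long)events[i].data);
    }

    close(kq);
    return 0;
}
```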
`event.c` is in need of splitting into multiple files. I'd rather do that in a separate PR.

Performance
It would be great to test this on physical machines with a fast network between them, if only I had the hardware. So here are results for a FreeBSD VM (13.1) on a Linux desktop (5.17.5).
Profiles

Start tincd, run 30 seconds of iperf3, stop tincd.

- select
- kqueue
Baseline

Direct connection, no tincd.

- epoll + select
- epoll + kqueue
`wrk` results are very unstable and could easily be swapped the other way. The only real difference I'm seeing is somewhat lower latencies with `kqueue`.