Open scottlamb opened 2 years ago
Good news: I don't see the repeated EWOULDBLOCK
-returning reads. Still worth adding a benchmark and maybe even seeing if one of the other approaches reduces the CPU usage within Retina/tokio/mio userspace code. But nothing is obviously deeply wrong.
I checked with:
$ strace -fo strace-out --timestamps -e epoll_ctl,socket,bind,connect,sendto,epoll_wait,recvfrom target/release/examples/camera ...
Trying to match reads up with epoll returns gets confusing because tokio/mio apparently use an integer that's not not the file descriptor to represent the socket. so you have to mentally do an extra mapping.
The simpler way is to check for any given recvfrom
syscall that returned EWOULDBLOCK
if the previous recvfrom
on that same fd also returned EWOULDBLOCK
, which would be suspicious. I just visually skimmed rather than writing a program to check exhaustively, but it looked fine. Our poll_recv
method calls are not consistently turning into extraneous syscalls.
recvmmsg
I did notice that we sometimes have to call recvfrom
repeatedly to get all the available messages:
830077 12:59:34 recvfrom(10, <unfinished ...>
830078 12:59:34 epoll_wait(3, <unfinished ...>
830077 12:59:34 <... recvfrom resumed>"\200`\360z0\335\240\34\32\204\v4|\5\271\257'\237\234H)x\351D@\221)\3\n:i\346"..., 65536, 0, NULL, NULL) = 1452
830077 12:59:34 recvfrom(10, "\200`\360{0\335\240\34\32\204\v4|\5\3240\3<\322\322'/[=]Sp\371\301\320/\30"..., 65536, 0, NULL, NULL) = 1452
830077 12:59:34 recvfrom(10, "\200`\360|0\335\240\34\32\204\v4|\5Q\362\333L\307[\376\336\212\371a@\"\250\326o\24\311"..., 65536, 0, NULL, NULL) = 1452
830077 12:59:34 recvfrom(10, "\200`\360}0\335\240\34\32\204\v4|\584\23\230\277wr\310z\17n>m\264\317\246H\337"..., 65536, 0, NULL, NULL) = 1452
830077 12:59:34 recvfrom(10, 0x7ffd56ca4ce0, 65536, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
On Linux, we could use the non-standard recvmmsg
syscall to turn those 4 message-returning calls into 1.
(Probably not all 5 recvfrom
calls into 1; recvmmsg
's manpage says "If an error occurs after at least one message has been received, the call succeeds, and returns the number of messages received." So recvmmsg
returning fewer messages than we asked for isn't enough for us to know the fd doesn't have something more to tell us. We have to actually see EAGAIN
for that. And tokio/mio seem to be using edge-driven mode EPOLLET
, so we really need to know we've exhausted everything the fd has to offer before we stop trying.)
This is enough extra code (and buffer space) that I probably won't do it unless we have a particular need to improve UDP-specific performance.
When using
Transport::Udp
,<Session as futures::Stream>::poll_next
calls intopoll_udp
, which ultimately polls each UDP socket:https://github.com/scottlamb/retina/blob/1145ed053e87c687651144f8f7b43f04e7aa6ddf/src/client/mod.rs#L1739 https://github.com/scottlamb/retina/blob/1145ed053e87c687651144f8f7b43f04e7aa6ddf/src/client/mod.rs#L1778
poll_next
is (intended to be) called any time theStream
was awakened for any reason: any one of the sockets or the keepalive timeout.I suspect these are actually turning into extra
recv
syscalls on each socket (even ones thatepoll_wait
hasn't said are ready sincerecv
last returnedEWOULDBLOCK
) until we hit the one that actually is ready. (This thought was inspired by reading about performance problems with tokio'sFuturesUnordered
in this discussion. We're doing things in a round-robin order rather than usingFuturesUnordered
, but the effect is probably similar.)There are typically about five sockets involved when using UDP (RTSP TCP socket, and (RTP, RTCP) x (video, audio)), and syscalls are expensive, so making up to 5X as many syscalls as necessary is bad.
We should check for this. Maybe just by examining
strace
output of a UDP session and/or inspecting the tokio code.If we indeed are making unnecessary syscalls, we should restructure the code to avoid it. A couple possible structures:
Session
'spoll_next
just reads the other side of the channel rather than doing any work itself. Remember to clean up the tasks when theSession
is dropped.TaskSet
.That might be worth doing even if tokio/mio stops us from reaching the syscall stage; that code might be expensive enough to be worth avoiding too.
It'd be nice to have a proper benchmark of UDP, too, like we have for TCP.