scottlamb / retina

High-level RTSP multimedia streaming library, in Rust
https://crates.io/crates/retina
Apache License 2.0
244 stars 48 forks source link

check if we make unnecessary syscalls with UDP #51

Open scottlamb opened 2 years ago

scottlamb commented 2 years ago

When using Transport::Udp, <Session as futures::Stream>::poll_next calls into poll_udp, which ultimately polls each UDP socket:

https://github.com/scottlamb/retina/blob/1145ed053e87c687651144f8f7b43f04e7aa6ddf/src/client/mod.rs#L1739 https://github.com/scottlamb/retina/blob/1145ed053e87c687651144f8f7b43f04e7aa6ddf/src/client/mod.rs#L1778

poll_next is (intended to be) called any time the Stream was awakened for any reason: any one of the sockets or the keepalive timeout.

I suspect these are actually turning into extra recv syscalls on each socket (even ones that epoll_wait hasn't said are ready since recv last returned EWOULDBLOCK) until we hit the one that actually is ready. (This thought was inspired by reading about performance problems with tokio's FuturesUnordered in this discussion. We're doing things in a round-robin order rather than using FuturesUnordered, but the effect is probably similar.)

There are typically about five sockets involved when using UDP (RTSP TCP socket, and (RTP, RTCP) x (video, audio)), and syscalls are expensive, so making up to 5X as many syscalls as necessary is bad.

We should check for this. Maybe just by examining strace output of a UDP session and/or inspecting the tokio code.

If we indeed are making unnecessary syscalls, we should restructure the code to avoid it. A couple possible structures:

That might be worth doing even if tokio/mio stops us from reaching the syscall stage; that code might be expensive enough to be worth avoiding too.

It'd be nice to have a proper benchmark of UDP, too, like we have for TCP.

scottlamb commented 2 years ago

Good news: I don't see the repeated EWOULDBLOCK-returning reads. Still worth adding a benchmark and maybe even seeing if one of the other approaches reduces the CPU usage within Retina/tokio/mio userspace code. But nothing is obviously deeply wrong.

I checked with:

$ strace -fo strace-out --timestamps -e epoll_ctl,socket,bind,connect,sendto,epoll_wait,recvfrom target/release/examples/camera ...

Trying to match reads up with epoll returns gets confusing because tokio/mio apparently use an integer that's not not the file descriptor to represent the socket. so you have to mentally do an extra mapping.

The simpler way is to check for any given recvfrom syscall that returned EWOULDBLOCK if the previous recvfrom on that same fd also returned EWOULDBLOCK, which would be suspicious. I just visually skimmed rather than writing a program to check exhaustively, but it looked fine. Our poll_recv method calls are not consistently turning into extraneous syscalls.

recvmmsg

I did notice that we sometimes have to call recvfrom repeatedly to get all the available messages:

830077 12:59:34 recvfrom(10,  <unfinished ...>
830078 12:59:34 epoll_wait(3,  <unfinished ...>
830077 12:59:34 <... recvfrom resumed>"\200`\360z0\335\240\34\32\204\v4|\5\271\257'\237\234H)x\351D@\221)\3\n:i\346"..., 65536, 0, NULL, NULL) = 1452
830077 12:59:34 recvfrom(10, "\200`\360{0\335\240\34\32\204\v4|\5\3240\3<\322\322'/[=]Sp\371\301\320/\30"..., 65536, 0, NULL, NULL) = 1452
830077 12:59:34 recvfrom(10, "\200`\360|0\335\240\34\32\204\v4|\5Q\362\333L\307[\376\336\212\371a@\"\250\326o\24\311"..., 65536, 0, NULL, NULL) = 1452
830077 12:59:34 recvfrom(10, "\200`\360}0\335\240\34\32\204\v4|\584\23\230\277wr\310z\17n>m\264\317\246H\337"..., 65536, 0, NULL, NULL) = 1452
830077 12:59:34 recvfrom(10, 0x7ffd56ca4ce0, 65536, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

On Linux, we could use the non-standard recvmmsg syscall to turn those 4 message-returning calls into 1.

(Probably not all 5 recvfrom calls into 1; recvmmsg's manpage says "If an error occurs after at least one message has been received, the call succeeds, and returns the number of messages received." So recvmmsg returning fewer messages than we asked for isn't enough for us to know the fd doesn't have something more to tell us. We have to actually see EAGAIN for that. And tokio/mio seem to be using edge-driven mode EPOLLET, so we really need to know we've exhausted everything the fd has to offer before we stop trying.)

This is enough extra code (and buffer space) that I probably won't do it unless we have a particular need to improve UDP-specific performance.