tokio-rs / tokio

A runtime for writing reliable asynchronous applications with Rust. Provides I/O, networking, scheduling, timers, ...
https://tokio.rs
MIT License
27.09k stars 2.49k forks source link

{join,leave}_multicast_v{4,6} functions should probably be async #5674

Open RReverser opened 1 year ago

RReverser commented 1 year ago

Currently multicast join/leave functions on the UdpSocket are defined as synchronous.

Those functions need to send MLD messages to the multicast group, which makes them perform network I/O, which is normally offloaded and exposed via async API in tokio, so it's somewhat surprising those few aren't.

Indeed, as any I/O, they can take considerable amount of time, especially in problematic cases like when the network takes a long time to respond. This is the order of timing I'm seeing on my system when joining multicast on all available network interfaces (something I need for the app I'm working on):

DEBUG    ┝━ join_multicast_group [ 125µs | 0.13% ]
DEBUG    │  ┕━ 🐛 [debug]:  | return: ()
DEBUG    ┝━ join_multicast_group [ 28.2µs | 0.03% ]
DEBUG    │  ┕━ 🐛 [debug]:  | return: ()
DEBUG    ┝━ join_multicast_group [ 61.4ms | 63.76% ]
ERROR    │  ┕━ 🚨 [error]:  | error: An unknown, invalid, or unsupported option or level was specified in a getsockopt or setsockopt call. (os error 10042)
DEBUG    ┝━ join_multicast_group [ 25.9µs | 0.03% ]
DEBUG    │  ┕━ 🐛 [debug]:  | return: ()
DEBUG    ┝━ join_multicast_group [ 12.9µs | 0.01% ]
DEBUG    │  ┕━ 🐛 [debug]:  | return: ()
DEBUG    ┝━ join_multicast_group [ 13.3µs | 0.01% ]
DEBUG    │  ┕━ 🐛 [debug]:  | return: ()
DEBUG    ┝━ join_multicast_group [ 21.3µs | 0.02% ]
DEBUG    │  ┕━ 🐛 [debug]:  | return: ()
DEBUG    ┝━ join_multicast_group [ 16.5µs | 0.02% ]
DEBUG    │  ┕━ 🐛 [debug]:  | return: ()
DEBUG    ┕━ join_multicast_group [ 12.9µs | 0.01% ]
DEBUG       ┕━ 🐛 [debug]:  | return: ()

Those are release timings, in debug mode they're a bit worse.

While blocking the main thread for 12-125µs can be acceptable with some stretch, the 61.4ms in the exceptional problematic case is definitely not, and, because all network interfaces on different machines are different, there's no way to predict how long the call is going to take and seems safer to always offload those to a dedicated I/O thread.

I realise changing that would be a breaking change, but given that I haven't seen previous discussion on this in the repo, I thought I'd bring them up. I think it's worth tracking this issue to change their signature (and implementation) in the next major release, and meanwhile at least add a note to the docs suggesting that those functions can be long-blocking and probably should be wrapped in the spawn_blocking API.

Darksonn commented 1 year ago

If they perform IO, then I'm surprised they work at all, considering that the socket is set to non-blocking. I would expect them to fail with WouldBlock errors often.

RReverser commented 1 year ago

That's a good point. I'm no expert, but from a quick read, apparently the idea is that OS should just put MLD messages in a queue for you when socket is in a non-blocking mode, so those functions would return w/o waiting for messages to be actually sent. At least in my case, I'm on Windows, so I wonder if its implementation differs and it still waits for the messages regardless of socket status.

RReverser commented 1 year ago

apparently the idea is that OS should just put MLD messages in a queue for you when socket is in a non-blocking mode, so those functions would return w/o waiting for messages to be actually sent

Although I wonder if that also means that non-Windows OS don't actually report correct success/failure status from join_multicast_v6 if they don't wait for response to their MLD request. If so, that seems like another reason to make those async.

Darksonn commented 1 year ago

@Thomasdezeeuw Do you know any more about these methods?

RReverser commented 1 year ago

Those functions need to send MLD messages to the multicast group

Self-correction: only IPv6 sends MLD messages, IPv4 uses IGMP. Not that it matters much for the overall issues as both methods still need to send messages IIUC.

Thomasdezeeuw commented 1 year ago

I'm not aware any OS provides an async interface for this, I think all OSs use setsockopt(2) for this, which is synchronous. Maybe io_uring will provide an interface for setsockopt at some point, but it doesn't at this time.

RReverser commented 1 year ago

Thanks for confirmation. I googled around and also couldn't find any async way to retrieve their status.

Given that those APIs are currently blocking and might block for a long time, do you think this suggestion

and probably should be wrapped in the spawn_blocking API

makes sense?

At least for the initial implementation, Tokio could wrap them into spawn_blocking and switch to native async API if/when it's ever implemented.

Darksonn commented 1 year ago

We can't really wrap them in spawn_blocking without deprecating them and providing an async version under a new name.

RReverser commented 1 year ago

We can't really wrap them in spawn_blocking without deprecating them and providing an async version under a new name.

Sure, I understand that. As I mentioned in the issue description, I realise this would be a breaking change, although it seems like a worthwhile long-term.

For now either adding them as new methods as you say or even just

at least add a note to the docs suggesting that those functions can be long-blocking and probably should be wrapped in the spawn_blocking API

would be helpful I think for those who, like me, didn't realise that those functions might block for a while.