tokio-rs / tokio-uring

An io_uring backed runtime for Rust
MIT License
1.11k stars 117 forks source link

Implement opcode sendmsg (non-zc) for UdpStream. #263

Closed Icelk closed 1 year ago

Icelk commented 1 year ago

Fixes #261

Should I maybe try to reduce the code duplication in src/io/sendmsg_zc.rs and src/io/sendmsg.rs?

FrankReh commented 1 year ago

@Icelk You've been very responsive to my comments today. Once the existing tests pass, and I merge, will you be able to confirm this does what you want? As we don't have a unit test for any of the UDP sendmsg* methods, and I don't use them myself, it would help to know it works at least for one use case, before you and I loose the context of this PR.

Icelk commented 1 year ago

@Icelk You've been very responsive to my comments today.

You too, thanks!

Once the existing tests pass, and I merge, will you be able to confirm this does what you want?

I will certainly test if they work, but I won't be using the headers in production for a while.

Icelk commented 1 year ago

I did some testing.

I got sendmsg_zc and sendmsg to work, after modifying the code: they're broken right now. The msghdr needs to be boxed for both operations. Should I send a PR?

I also had some trouble setting msg_control's len to >= 16. I got:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 22, kind: InvalidInput, message: "Invalid argument" }', examples/udp_socket.rs:26:75

which seems odd, as quinn uses an array with len 88.

FrankReh commented 1 year ago

Okay. Glad you tried it out. Just submit a fix for sendmsg. #260 exists for SendMsgZc.

Icelk commented 1 year ago

264

FrankReh commented 1 year ago

I also had some trouble setting msg_control's len to >= 16

~I don't have a good idea yet. Are you comfortable that the buffer you passed as the optional msg_control has a safe stable_ptr definition and that the bytes_init() value of the buffer reflects the length correctly? Or maybe something else about the test was deemed invalid?~

~You've seen this in the sendmsg man page? On my Ubuntu machines this is 20480. Nothing close to 15.~

You may send control information (ancillary data) using the msg_control and msg_controllen members. The maximum control buffer length the kernel can process is limited per socket by the value in /proc/sys/net/core/optmem_max; see socket(7). For further information on the use of ancillary data in various socket domains, see unix(7) and ip(7).

~Or now that the msghdr is boxed, maybe take the as_ptr() value of sendmsg.msghdr as the argument to opcode::SendMsg?~

as_ref(). Do you have the same problem if you pass sendmsg.msghdr.as_ref() to opcode::SendMsg?

FrankReh commented 1 year ago

@Icelk Have you had a chance to see if using as_ref fixes the Invalid Input error when length is greater than 15? I know the two don't sound related but sometimes it is hard to predict how a change in a test affects the stack usage.

Icelk commented 1 year ago

You've seen this in the sendmsg man page? On my Ubuntu machines this is 20480. Nothing close to 15.

Same for me.

I used the example udp_socket and changed the functions to sendmsg and recvmsg.

@Icelk Have you had a chance to see if using as_ref fixes the Invalid Input error when length is greater than 15?

as_ref made no difference :( Sendmsg uses stable_ptr and bytes_init so it shouldn't be any issues there.

I know the two don't sound related but sometimes it is hard to predict how a change in a test affects the stack usage.

That's the thing: the msg_control buffer is allocated, so this seems very odd.


Here are my changes to the example. Notice that it works when you change the vec![0u8; 16] to vec![0u8; 15].

diff --git a/examples/udp_socket.rs b/examples/udp_socket.rs
index 01eb9b3..921cea6 100644
--- a/examples/udp_socket.rs
+++ b/examples/udp_socket.rs
@@ -11,16 +11,12 @@ fn main() {
     let socket_addr: SocketAddr = args[1].parse().unwrap();
·
     tokio_uring::start(async {
-        let socket = UdpSocket::bind(socket_addr).await.unwrap();
+        let socket = UdpSocket::bind("[::0]:0".parse().unwrap()).await.unwrap();
·
         let buf = vec![0u8; 128];
·
-        let (result, mut buf) = socket.recv_from(buf).await;
-        let (read, socket_addr) = result.unwrap();
-        buf.resize(read, 0);
-        println!("received from {}: {:?}", socket_addr, &buf[..]);
-
-        let (result, _buf) = socket.send_to(buf, socket_addr).await;
-        println!("sent to {}: {}", socket_addr, result.unwrap());
+        println!("size {}", unsafe {libc::CMSG_SPACE(core::mem::size_of::<libc::in_pktinfo>() as _)});
+        let (result, _buf, _hdr) = socket.sendmsg::<Vec<u8>, Vec<u8>>(vec![buf], Some(socket_addr), Some(vec![0u8;16])).await;
+        println!("sent to {}: {}, headers: {_hdr:?}", socket_addr, result.unwrap());
     });
 }
FrankReh commented 1 year ago

@Icelk I'm reading about control data, often called ancillary data, and it could be its use is very limited and its meaning is very much dependent on the connection type. It seems it is not for sending extra data across the wire but for a way to send data to the kernel in the sendmsg case, and to receive from the kernel in the recvmsg case. Only for a Unix Domain socket, does it allow for sending a file descriptor from one process to another.

I'm still looking for the definition of sending ancillary data for a UDP connection to the kernel, how does the kernel interrupt that?

So maybe your sendmsg commit could have included an update to src/net/unix/stream.rs. Then a file descriptor could be sent from one process to another with this mechanism.

Icelk commented 1 year ago

It seems it is not for sending extra data across the wire but for a way to send data to the kernel

That's also what I've read. I believe it's used by QUIC implementations to improve network congestion. As I mentioned here, quinn seems to pass a 88-byte buffer to the syscall. It apparently works, which is why I got confused.

So maybe your sendmsg commit could have included an update to src/net/unix/stream.rs. Then a file descriptor could be sent from one process to another with this mechanism.

I think this would be a very niche use-case, and quite simple to implement on top of our implementation.

FrankReh commented 1 year ago

quinn seems to pass a 88-byte buffer to the syscall

Does quinn also send an 88-byte zero filled buffer or does it fill in the cmsghdr and does it ensure the buffer is aligned properly? I just read alignment can be required and there are cmsg_len, cmsg_level and cmsg_type fields to fill in. I'm more confused why it works for shorter lengths at the moment.

Icelk commented 1 year ago

I'm so sorry. You are correct. quinn encodes the msg_control data, but after the pointer was passed (which is probably why I got confused).

In this code, they set the control data. This files handles the encoding.

It seems like it would be beneficial for us to expose the entire hdr, instead of just the control. But I believe we don't do that to disallow invalid hdrs, since we set the IP there. What do you think we could design such an API?

FrankReh commented 1 year ago

Thanks for pointing me to that. cmsg.rs is very interesting.

It seems like it would be beneficial for us to expose the entire hdr

Just thinking about it with only a few minutes of understanding at this point.

A new, additional, version of sendmsg (sendmsg_hdr?) could be provided but how special purpose would it have to be?

It could be unsafe, requiring the caller created their arguments to the function correctly,

but I think we still want the ability to get smart pointer drops called if the future is dropped before the caller has received their results. (say the caller used a select! with a timeout so the sendmsg future results may not get returned to the caller)

So I don't this library could ask the caller to be responsible for the lifetimes of the three or four pieces of memory.

One rather straightforward way would be for the new function to take ownership of three Box values and either rebuild the msghdr from the three pieces, and a msg_flags parameter, or it takes four Box values. So requiring the caller to have boxed the data, but then pass ownership of the boxes to this API.

Even building msg_iov has to be done carefully though because it also has pointers that need to be safe for the duration of the operation and should also have smart drop functions at the ready if the future is cancelled (as described above).

Is having a new API essential to make this work for quinn or can quinn build part of its msghdr the way it does now and then pass the msg_control portion through the existing sendmsg function?

A big difference for quinn is that it should want to give up ownership of the data lifetime when it calls the io_uring sendmsg API, although clearly the liburing C API can't do things like that, but they don't have futures that might be cancelled by higher functions on the call stack either.

Just trying to slip in tokio_uring into a library that had gotten used to building data on its call stack is always going to be hard. No-one has found a way to make it safe from UB in the face of cancelled futures while still be efficient and not having to introduce boxes that weren't there before.

Icelk commented 1 year ago

Sorry for the very late response. Lately, I've implemented HTTP/3 and therefore use this code. I've opened another issue, #274. I now realize only the msg_control is important, so allowing the user control over iovec or the msg_name is redundant & unsafe, as you said. The current solution is very nice, indeed.

I've managed to make tokio-uring work with quinn, with some fiddling. I'll just link the following thread for posterity regarding quinn & tokio-uring compat: https://github.com/quinn-rs/quinn/issues/915