rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.37k stars 12.59k forks source link

Tracking Issue for feature(unix_socket_ancillary_data) #76915

Open LinkTed opened 4 years ago

LinkTed commented 4 years ago

Tracking issue for feature(unix_socket_ancillary_data) to extend UnixStream and UnixDatagram to send and receive ancillary data. For example file descriptors.

Unresolved Questions

Implementation history

PR #69864, #82589, #83374

tadeokondrak commented 3 years ago

send_vectored_with_ancillary should switch to non-mutable references for its parameters, since sendmsg doesn't write through them.

https://github.com/rust-lang/rust/pull/79753 changes just the bufs parameter, but it was closed for inactivity.

(Edit: Hmm, actually changing the ancillary data too would require both a SocketAncillary and SocketAncillaryMut, like IoSlice. Probably not worth it there)

LinkTed commented 3 years ago

@tadeokondrak I open a new PR #82589.

LinkTed commented 3 years ago

I create a new PR to send and receive TTL for UdpSocket from the ancillary interface. #82858

reyk commented 3 years ago

Hi @LinkTed, I posted a fix in https://github.com/rust-lang/rust/pull/83374 for macOS, OpenBSD, and possibly other BSDs.

LinkTed commented 3 years ago

What has to be done to stabilize this API?

the8472 commented 3 years ago

I think the API still needs polish. All those niche socket features are tricky and right now it's probably not extensible enough for 3rd party libraries to fill platform-specific gaps such as flags or cmsgs only supported on one particular OS. These basically are wrappers for the recvmsg and sndmsg syscalls, we should expose them in a way that's forwards-compatible so we don't have to add recv_vectored_with_ancillary_from_with_flags or something like that in the future.

Before stabilizing we should consider whether recv_vectored_with_ancillary_from should gain another argument. I would recommend adding a ReceiveOptions argument that allows one to specify flags, similar to OpenOptions. The options could then be used to set linux' MSG_DONTWAIT, MSG_PEEK or MSG_ERRQUEUE for example, maybe via custom_flags() similar to OpenOptions. That way we don't have to support all OS-specific peculiarities.

Or we could put the flags into SocketAncillary or at least reserve the right to do that in the future, but in that case the struct would need to be renamed because it would serve several purposes at once.

Similar could be done for sending.

Additionally the AncillaryData enum should be marked as #[non_exhaustive] and maybe gain a Unknown variant for things that the standard library doesn't support. And a way to get the raw ancillary message. That way one could for example decode struct sock_extended_err in a 3rd party crate without waiting for std to implement it.

LinkTed commented 3 years ago

@the8472 Thank you for your feedback. If this PR request is #82858 merged I will start implementing your suggestions.

tv42 commented 3 years ago

What about CMSG_SPACE? Without it, we're forced to either guess or probe the appropriate buffer size.

https://man7.org/linux/man-pages/man3/cmsg.3.html SCM_RIGHTS example shows how it's used to compute the correct buffer size (with some tricky business wrt alignment).

LinkTed commented 3 years ago

Hi @tv42, what are the difference between CMSG_SPACE and CMSG_LEN?

Is there a function, which calculate the alignment for a struct to a specific address in Rust?

comex commented 3 years ago

what are the difference between CMSG_SPACE and CMSG_LEN?

CMSG_SPACE includes alignment bytes, CMSG_LEN doesn't.

Is there a function, which calculate the alignment for a struct to a specific address in Rust?

I don't know what this means.

LinkTed commented 3 years ago

what are the difference between CMSG_SPACE and CMSG_LEN?

CMSG_SPACE includes alignment bytes, CMSG_LEN doesn't.

Is there a function, which calculate the alignment for a struct to a specific address in Rust?

I don't know what this means.

I thought that the alignment depends on the address, where you want to store the struct.

zopsicle commented 3 years ago

What it means for CMGS_SPACE to include “alignment bytes” is that it rounds the size up to include a number of padding bytes to make sure the next header is properly aligned (to the alignment of cmsghdr).

LinkTed commented 3 years ago

So, the address have to be a multiple of the return value of CMSG_ALIGN, right?

zopsicle commented 3 years ago

The address of the cmsghdr object has to be a multiple of std::mem::align_of::<cmsghdr>().

LinkTed commented 3 years ago

Sorry, that I ask again. Does the macro CMSG_ALIGN the same as std::mem::align_of::<cmsghdr>()?

zopsicle commented 3 years ago

CMSG_ALIGN(len) rounds len up to a multiple of std::mem::align_of::<cmsghdr>(). For instance, if the required alignment is 8, and len is 11, CMSG_ALIGN(len) will return 16.

As an example, consider the following two headers:

let A = cmsghdr{ cmsg_len = 19, cmsg_level = X, cmsg_type = Y };
let B = cmsghdr{ cmsg_len = 22, cmsg_level = W, cmsg_type = Z };

If cmsg_len were not rounded up, this sequence of headers would look like this in memory:

          ~~~~~~~~~A~~~~~~~~~ ~~~~~~~~~B~~~~~~~~~
         +----+----+----+----+----+----+----+----+
Address: |  0 |  8 | 12 | 16 | 19 | 27 | 31 | 35 |
         +----+----+----+----+----+----+----+----+
Data:    | 19 |  X |  Y |  _ | 22 |  W |  Z |  _ |
         +----+----+----+----+----+----+----+----+

Notice how header B is at improperly aligned address 19, which is not a multiple of the alignment of cmsghdr.

By rounding up the lengths (assuming an alignment of 8 for this example), we get this:

          ~~~~~~~~~A~~~~~~~~~      ~~~~~~~~~B~~~~~~~~~
         +----+----+----+----+----+----+----+----+----+----+
Address: |  0 |  8 | 12 | 16 | 19 | 24 | 32 | 36 | 40 | 46 |
         +----+----+----+----+----+----+----+----+----+----+
Data:    | 19 |  X |  Y |  _ |    | 22 |  W |  Z |  _ |    |
         +----+----+----+----+----+----+----+----+----+----+
                              ~~~~ padding bytes       ~~~~ padding bytes

Now, every header and each field in every header is at the aligned address it should be.

LinkTed commented 3 years ago

Thanks, @chloekek. Currently, the code is using CMSG_LEN, which includes the alignment bytes. So, technically we have to ensure that the first cmsg struct is properly aligned, right?

But, if cmsg_len is rounded up, this means that the receiver interpreted the alignment bytes as data. For example, if the sender set 3 bytes, which then is aligned to 8, the receiver will think that he got 8 bytes. How do we handle this?

zopsicle commented 3 years ago

Actually, the man page says that you should include the padding in cmsg_len:

CMSG_LEN() returns the value to store in the cmsg_len member of the cmsghdr structure

This means that the receiver will indeed interpret those bytes as part of the data, I don’t think there’s any way around that?

This would mean that, if alignof(cmsghdr) = 8, and sizeof(int) = 4, you could only send an even number of file descriptors using SCM_RIGHTS. :sweat_smile: I guess if you want to send an odd number of file descriptors you could program the sender to set the odd one to -1 and program the receiver to ignore any -1s.

LinkTed commented 3 years ago

Interesting, currently there is a test, which send one FD and I tested on a 64-bit system, where the alignment of cmsghdr should be 8. 😀

zopsicle commented 3 years ago

What is suspicious is that CMSG_NXTHDR adds the alignment if necessary.

This could mean that either:

Note that while you could test this, other Unix implementations (e.g. BSD) may behave differently from Linux in this regard. It would be best to check the Official Spec™ (POSIX?).

As for file descriptors, the kernel will indeed divide the data length by the size of a file descriptor in order to compute how many file descriptors to send. And it seems that you indeed do have to set all of them to valid file descriptors, not to -1.

LinkTed commented 3 years ago

I think the man page is wrong. CMSG_LEN does not include the alignment. Only CMSG_SPACE as @comex suggested.

zopsicle commented 3 years ago

So to summarize:

LinkTed commented 3 years ago

So, if CMSG_NXTHDR returns an aligned address, we have to ensure that the first cmsghdr is aligned because then any subsequent cmsghdr is aligned. Technically, we do not have to use CMSG_SPACE.

LinkTed commented 3 years ago

By the way @chloekek. Here is the commit, where the CMSG_ALIGN were removed.

zopsicle commented 3 years ago

The SocketAncillary::new function currently takes any old &mut [u8], regardless of alignment. Do we want to enforce its alignment? It could be made safe with an assert!ion on the address of the slice, but that would be rather annoying to use for users, as they would have to jump through hoops to make sure their buffer is properly aligned.

LinkTed commented 3 years ago

Yes, therefore I would add alignment into the SocketAncillary::new function.

zopsicle commented 3 years ago

Rounding up the address of the slice in SocketAncillary::new would mean that the actual length of the buffer can be up to 7 less than the length provided by the user (assuming an alignment of 8). Is this something we want? It is surprising behavior that would have to be documented well.

An alternative solution would be to provide data types for aligned buffers:

pub union SocketAncillaryBuf<const N: usize> {
    // same trick as suggested in cmsg(3) man page
    _align: cmsghdr,
    buf: [u8; N],
}

pub struct SocketAncillarySlice<'a> {
    // *buf is always aligned to cmsghdr, because we only export constructors that ensure that.
    buf: &'a mut [u8],
}

impl<const N: usize> SocketAncillaryBuf<N> {
    pub fn as_slice(&mut self) -> SocketAncillarySlice {
        unsafe { SocketAncillarySlice::from_aligned_slice(&mut self.buf) }
    }
}

impl<'a> SocketAncillarySlice<'a> {
    pub unsafe fn from_aligned_slice(buf: &'a mut [u8]) -> Self {
        Self{buf}
    }

    /// Warning: will shrink buf to fit alignment requirements if necessary.
    pub fn from_unaligned_slice(buf: &'a mut [u8]) -> Self {
        // adjust pointer of buf
    }
}
LinkTed commented 3 years ago

Unfortunately, this would not work if the user wants to use a dynamically allocated buffer, like Vec<u8>.

LinkTed commented 3 years ago

Rounding up the address of the slice in SocketAncillary::new would mean that the actual length of the buffer can be up to 7 less ..

Unfortunately, yes 😞

tv42 commented 3 years ago

Rounding up the address of the slice in SocketAncillary::new would mean that the actual length of the buffer can be up to 7 less than the length provided by the user (assuming an alignment of 8). Is this something we want? It is surprising behavior that would have to be documented well.

Isn't this what CMSG_SPACE is for, to know how big of a buffer you need to allocate to fit the given amount of payload.

Oh I see, this seems to about allocating at the right alignment. Yeah, you can't just secretly ignore some bytes from the beginning, because for typical use of cmsg(3), you pass in the exact right amount of bytes; that'd only give it e.g. 1/8th of a chance of working.

tv42 commented 3 years ago

So, if CMSG_NXTHDR returns an aligned address, we have to ensure that the first cmsghdr is aligned because then any subsequent cmsghdr is aligned. Technically, we do not have to use CMSG_SPACE.

The purpose of CMSG_SPACE is to know how large a buffer to allocate. Not to assist in filling it.

tv42 commented 3 years ago

I think the road forward is going to look something like

        let num_fds = 7;
        let inner_size = (std::mem::size_of::<libc::c_int>() * num_fds) as libc::c_uint;
        let size = unsafe { libc::CMSG_SPACE(inner_size) } as usize;
        let align = std::mem::size_of::<libc::cmsghdr>() as usize;
        let align = align.next_power_of_two();
        let layout = std::alloc::Layout::from_size_align(size, align)
            .expect("alignment math for cmsghdr");
        let ptr = unsafe { std::alloc::alloc(layout) };
        let buf = unsafe { std::slice::from_raw_parts_mut(ptr, size) };

Wrapping that in a function in std::os::unix::net sounds like the right idea.

Maybe one for FDs, another for arbitrary data (taking just byte count, not FD count).

tv42 commented 3 years ago

Err, lifetimes won't let you return a slice reference. Maybe returning Box<[u8]>? This is me running into limits of my knowledge of Rust idioms.

tv42 commented 3 years ago

Something like this maybe:

pub struct SocketAncillary<'a> {
    buffer: &'a mut [u8],
    length: usize,
    truncated: bool,
}

impl<'a> SocketAncillary<'a> {
    pub fn new(buffer: &'a mut [u8]) -> Self {
        SocketAncillary {
            buffer,
            length: 0,
            truncated: false,
        }
    }

    pub fn with_capacity(size: usize) -> Self {
        let size = unsafe { libc::CMSG_SPACE(size as u32) } as usize;
        let align = std::mem::size_of::<libc::cmsghdr>() as usize;
        let align = align.next_power_of_two();
        let layout =
            std::alloc::Layout::from_size_align(size, align).expect("alignment math for cmsghdr");
        let ptr = unsafe { std::alloc::alloc(layout) };
        let buffer = unsafe { std::slice::from_raw_parts_mut(ptr, size) };
        Self::new(buffer)
    }

    pub fn with_capacity_for_fds(num: usize) -> Self {
        let size = std::mem::size_of::<libc::c_int>() * num;
        Self::with_capacity(size)
    }
}

fn main() {
    let s = SocketAncillary::with_capacity(42);
    println!("{:?}", s.buffer);
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=54898fddcec1ec9673c2658b92c3c8f1

comex commented 3 years ago
       let align = std::mem::size_of::<libc::cmsghdr>() as usize;`
       let align = align.next_power_of_two();

I wouldn't do this. In practice this value should be enough, because on all of the implementations I've seen (Linux, Darwin, OpenBSD, Windows), the alignment that CMSG_ALIGN aligns to is <= 8, while the size of cmsghdr is >= 12. But that's sort of a coincidence. There's no real reason to think the size of cmsghdr has any relationship to the required alignment, especially given that platforms other than Linux still have macros such as CMSG_SPACE use CMSG_ALIGN(sizeof(struct cmsghdr)) (or similar) rather than just sizeof(struct cmsghdr). (And on OpenBSD at least this actually makes a difference, since sizeof(struct cmsghdr) is always 12 while the required alignment is 8 on 64-bit targets. In your proposed version, next_power_of_two() would cover that, ending up at a requested alignment of 16, but again, that's sort of a coincidence.)

On platforms which define CMSG_ALIGN, the required alignment should be equal to CMSG_ALIGN(0). Some platforms don't define it, but… since the libc bindings are maintained by hand anyway, I suggest just inspecting the definition of CMSG_SPACE on each platform to see what value it aligns to.

zopsicle commented 3 years ago

I think CMSG_ALIGN is only useful in the following usage:

let address_of_previous_cmsghdr: *const cmsghdr = &...;
let address_of_next_cmsghdr: *const cmsghdr =
    address_of_previous_cmsghdr.offset(CMSG_ALIGN(length_of_previous_cmsghdr));

And this usage is already encapsulated in the CMSG_NXTHDR macro.

N.B. I think the correct way to read the name CMSG_ALIGN is to think of “ALIGN” as a verb: it aligns the given value (by rounding it up). It does not return “the alignment”. I guess CMSG_ALIGN(0) would be equivalent to that, though. :')


If you want to know the alignment for cmsghdr, can’t you just use std::mem::align_of::<cmsghdr>()? That should always give the correct answer. This is also what the union trick in the cmsg(3) man page implicitly relies on (i.e. it relies on rule that the alignment of a union type is the maximum alignment of its field types).

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c2ef761bc1f42a99151c36c0ea535c9f

zopsicle commented 3 years ago

If you want to heap-allocate the buffer, then this cannot be done by SocketAncillary itself, as SocketAncillary borrows the buffer; it does not own the buffer and so must not hold on to heap-allocated memory, as it would leak.

So if we want to offer an API for aligned buffers, it should be a separate structure:

pub struct SocketAncillaryBoxedBuf {
    // INVARIANT: Address is aligned to `align_of::<cmsghdr>()`.
    inner: Box<[u8]>,
}
impl SocketAncillaryBoxedBuf {
    pub fn with_capacity(size: usize) -> Self { /* aligned alloc here */ }
}

And replace SocketAncillary::new with:

impl<'a> SocketAncillary<'a> {
    pub fn from_boxed_buf(buffer: &mut SocketAncillaryBoxedBuf) -> Self { ... }
}

let mut example = SocketAncillaryBoxedBuf::with_capacity(12); // owns the memory
let mut ancillary = SocketAncillary::from_boxed_buf(&mut example); // borrows the memory

Then it would also be easy to add an interface for stack-allocated buffers:

pub union SocketAncillaryArrayBuf<const N: usize> {
    _align: cmsghdr,
    inner: [MaybeUninit<u8>; N],
}
impl<'a> SocketAncillary<'a> {
    pub fn from_array_buf<const N: usize>(buffer: &mut SocketAncillaryArrayBuf<N>) -> Self { ... }
}

let mut example = SocketAncillaryArrayBuf<8>::new();
let mut ancillary = SocketAncillary::from_array_buf(&mut example);

This is getting quite complex for a simple feature. :sweat_smile: Perhaps there should be some more generic aligned buffer facility (a la C++’s std::aligned_storage), but that would probably be part of a whole other RFC.


Come to think of it, can’t we just take a [MaybeUninit<cmsghdr>] slice instead of a [u8] slice? pub fn new(buffer: &'a mut [MaybeUninit<cmsghdr>]) -> Self. It will always be aligned properly, and the use of MaybeUninit ensures that the user can’t just treat it as proper cmsghdr data (as it may be the payload data, or just not yet initialized), and it is easy to allocate both on the stack and on the heap.

LinkTed commented 3 years ago

And how can the user choose the length of the buffer (as bytes)?

comex commented 3 years ago

If you want to know the alignment for cmsghdr, can’t you just use std::mem::align_of::<cmsghdr>()? That should always give the correct answer. This is also what the union trick in the cmsg(3) man page implicitly relies on (i.e. it relies on rule that the alignment of a union type is the maximum alignment of its field types).

I am pretty sure the trick suggested in the Linux man page is actually wrong – at least morally speaking. The alignment of cmsghdr is not necessarily sufficient. For example, on OpenBSD (and FreeBSD, which I just verified does the same as OpenBSD here), cmsghdr is defined as

struct cmsghdr {
    socklen_t   cmsg_len;
    int     cmsg_level;
    int     cmsg_type;
};

socklen_t is defined as uint32_t, so struct cmsghdr always has a size of 12 and alignment of 4.

However, the CMSG_SPACE, CMSG_LEN, and CMSG_ALIGN macros all use _ALIGN:

#define CMSG_SPACE(l)       (_ALIGN(sizeof(struct cmsghdr)) + _ALIGN(l))
#define CMSG_LEN(l)     (_ALIGN(sizeof(struct cmsghdr)) + (l))
#define CMSG_ALIGN(n)   _ALIGN(n)

and at least on x86-64, _ALIGN rounds to 8 bytes:

typedef __int64_t   __register_t;
#define _ALIGNBYTES (sizeof(__register_t) - 1)
#define _ALIGN(p)   (((__uintptr_t)(p) + _ALIGNBYTES) & ~_ALIGNBYTES)

Now, technically, the value that the sizes are aligned to doesn't have to be the same as the value the pointers are aligned to. In practice, I believe (though haven't verified) that neither the BSDs nor Linux care about pointer alignment for cmsgs at all. On all of those systems, recvmsg, sendmsg, etc. are implemented as system calls. And on all of those systems, with cmsgs, like in most situations where userland passes pointers to the kernel, the kernel doesn't access the userland memory directly, but allocates its own buffer and effectively memcpys the data from/to userland as the first/last step. And the kernel's allocated buffer has whatever alignment the kernel allocates it with, unrelated to the userland buffer's alignment.

But in theory those functions could be implemented in such a way that alignment matters, e.g. purely in userland, or in the kernel but with some mechanism where the kernel directly maps userland pages. And while this is not explicitly stated, I would assume the pointer should be aligned to the same value the sizes are, because the whole point of aligning the sizes is to ensure that pointers stay aligned.

Unfortunately, there's no spec language to check this assumption against. The POSIX man page for sys/socket.h describes cmsghdr and some of the CMSG_* macros (namely, CMSG_DATA, CMSG_NXTHDR, and CMSG_FIRSTHDR), but it doesn't even mention CMSG_SPACE and CMSG_LEN, and it does not mention alignment at all. Nor are those mentioned in the POSIX man pages for recvmsg and sendmsg.

LinkTed commented 3 years ago

Even if the function recvmsg and sendmsg is implemented in such a way that alignment does not matter, should be the cmsghdr be aligned, so that the user can access it? For example, to read the FDs.

To be honest, I do not see any big issue if the user allocates BUFFER_SIZE + 7 bytes to ensure that there are enough bytes.

tv42 commented 3 years ago

Another concern: Nothing seems to allow access to recvmsg(2) argument flags, or the similar return side msghdr.msg_flags.

Of these, specially interesting are

LinkTed commented 3 years ago
tv42 commented 3 years ago

Ah, I was reading https://doc.rust-lang.org/stable/std/os/unix/net/struct.UnixStream.html#method.recv_vectored_with_ancillary and https://doc.rust-lang.org/nightly/std/os/unix/net/struct.UnixStream.html#method.recv_vectored_with_ancillary which are just -> Result<usize>. The _from variant is only on UnixDatagram. What I'm actually using is SOCK_SEQPACKET, because I do want a connection.

LinkTed commented 3 years ago

Is the SOCK_SEQPACKET supported by Rust's std, currently?

the8472 commented 3 years ago

No, there was #50348 but it was recommended to implement it in an external crate first.

That said if https://github.com/rust-lang/rust/issues/76915#issuecomment-820682675 is addressed then I think one could make one of the unix sockets dual-purpose (only the method of construction would differ) since the EOR flag could be accessed through the extended API.

tv42 commented 3 years ago

The semantics of SOCK_SEQPACKET depend on the domain. For UNIX domain sockets, they connect like SOCK_STREAM and send/receive like a (pseudo-connected) SOCK_DATAGRAM. If you FromRawFd an existing SOCK_SEQPACKET fd into a UnixDatagram, it works just fine.

beroal commented 3 years ago

Hi. I have some remarks on the API.

beroal commented 3 years ago

Should the buffer passed as the argument to the new method be aligned? If yes, how can a caller allocate such a buffer?

It seems that the add_to_ancillary_data function writes a control message over an existing one in the buffer pointed by the msg_control field of struct msghdr. This is not right. This buffer can contain several control messages.

I believe that using the CMSG_NXTHDR macro when writing ancillary data is buggy on Linux. See my question on StackOverflow.

beroal commented 3 years ago

Here is my adaptation of a diagram by Stevens et al. (396) which shows the layout of the buffer pointer by the msg_control field of struct msghdr.

Stevens, Richard W., et al. UNIX Network Programming: The Sockets Networking API. 3rd ed., vol. 1, Addison-Wesley, 2004.

unix-socket-ancillary-data

LinkTed commented 3 years ago
* As far as I understand, the methods of `std::os::unix::net::SocketAncillary` read from and write to a buffer passed to its method `new`, and this buffer is external to `SocketAncillary`. It would be nice to have a method of calculating the size of this buffer in the API. As I understand, the `len` method is not useful here. Since the buffer isn't extensible, the caller needs to calculate its size before writing any data into it. In C, the buffer size is calculated as the sum of `CMSG_SPACE(size(i))` for all control messages `i`. `size(i)` is the size of the payload of the control message `i`. `size(i) == N*sizeof(int)` if `cmsg_type` of the control message equals `SCM_RIGHTS` and the control message contains `N` file descriptors.

We can add such a function. Do you have any thought how the function should be named?

* Am I correct that the `add_fds` method adds exactly one control message? It would be nice to clarify that in the API documentation. The same question for the `add_creds` method.

Yes, you are correct that add_fds adds a control message. Can you make a suggestion, what should be added to the documentation?