rust-lang / libc

Raw bindings to platform APIs for Rust
https://docs.rs/libc
Apache License 2.0
2.08k stars 1.04k forks source link

Problem in implementing siginfo_t for Linux #716

Open qinsoon opened 7 years ago

qinsoon commented 7 years ago

The definition of siginfo_t in Linux (https://github.com/torvalds/linux/blob/master/include/uapi/asm-generic/siginfo.h) uses a inlined union type as its last field. Rust did not have support for union types, I guess that was why those fields were left as a padding array in current libc implementation.

I made some attempts to use union (introduced in Rust 1.19) to represent siginfo_t. However the resulting struct is 8 bytes larger than the original one (the one currently in libc, which is the same size as the C struct type). See the code here: https://play.rust-lang.org/?gist=99630206fcc8fd44b452d22552b37793&version=stable

My guess is the 8 more bytes come from two places:

  1. the union type is 8 bytes aligned, but the first 3 fields of siginfo_t are 12 bytes (3x 4-bytes integer), so the padding is needed after the first 3 fields, which makes the offset of the union type at 16 bytes (instead of 12 bytes)
  2. because the union type is 8 bytes aligned, so the size of the union type cannot be 4*29 (which is not a multiple of the 8 bytes alignment), thus padding is needed at the end to make it a muliple of 8.

I am not sure why the C definition does not have issues like this (probably because it uses inlined union in the struct definition) and how to properly implement siginfo_t in Rust. Please correct me if I am wrong. For my project, I have to extract data from the padding array (which looks awful).

ndusart commented 7 years ago

Yes, I think the issue is that in C, the alignment is done for all the struct after union elements are inserted, then in C we do not align the union itself. I do not think we can do it in Rust.

We can force the union to be aligned in 4bytes by only using 4bytes fields or less, then for example *mut c_void become [u32; 2] for 64 bits architectures.

It is passing the test now, but it's not a clean solution. We have then to define the size of these arrays conditionally in function of the target while the types which can be longer than 4 bytes are already defined conditionnaly in libc.(when const fn will stabilize, we could use [u8; std::mem::size_of::<*mut c_void>()] though)

The other bad point is that users of libc would be forced to transmute these fields to access them as their correct type.

Maybe the solution would be to let this as is, and let higher-level crate as nix provides a safe interface for the padding bytes ?

qinsoon commented 7 years ago

Thanks for the reply. I was a bit surprised as I thought with the introduction of union types, it is always possible to declare identical types in Rust as in C. I don't see there is a neat way to deal with this case. Providing accessor to the padding bytes always adds overhead (though it doesn't matter in this case - signal handling).

Also nix is totally ignoring this part as well. As for extracting info from siginfo_t and ucontext_t, it seems more convenient to just write the code in C and call form Rust (which is as efficient, but is a bit awkward for a Rust project).

alexcrichton commented 7 years ago

Yeah right now the handling of siginfo_t upstream in rust-lang/rust is pretty nasty. Until we get unions this unfortunately isn't really gonna get much better.

vojtechkral commented 7 years ago

@qinsoon Marking the si_fields_t union as #[repr(C, packed)] instead of just #[repr(C)] makes it the same size as the original one. Is this correct / does this help?

gnzlbg commented 5 years ago

Anyone interested on working on this anytime soon? It appears that @vojtechkral suggest a way to solve this. I can mentor.

kornelski commented 5 years ago

Should this be in the 1.0 roadmap? #547

alex commented 5 years ago

I'd be interested in working on this, at least far enough to get si_addr :-) Would a patch that just added enough to read that be acceptable, or do you want this to solve this completely?

gnzlbg commented 5 years ago

@alex what do you mean by not solving this completely? the proposed change should just work, it will only need a bit of cfg-magic to be active on newer Rust versions.

alex commented 5 years ago

Not adding every single field :-)

gnzlbg commented 5 years ago

As long as the struct layout is correct, if the union field is private, we can do whatever we want (as long as we only use the union on toolchains that support it).

If the question is whether we can make the field a public union, and then add newer union fields incrementally over time, I haven't thought about that, but I think it's a good question. I've asked it here: https://github.com/rust-lang-nursery/api-guidelines/issues/196

alex commented 5 years ago

Ah yes, that does suggest a good question: do we want to just expose the underlying unions, or add methods on siginfo_t which access the underlying field?

gnzlbg commented 5 years ago

If C exposes the union we should do that. If C has functions to expose the fields, we can add those. I don't think C has methods, so doing that would be something more appropriate for the nix crate.

alex commented 5 years ago

In C the field containing a union has a name with an underscore _sifields, and then there are macros like #define si_addr _sifields._sigfault._addr. So you can do some_siginfo_t.si_addr (and you can't really have a field named si_addr on any other struct, but oh well).

On Fri, Apr 12, 2019 at 8:35 AM gnzlbg notifications@github.com wrote:

If C exposes the union we should do that. If C has functions to expose the fields, we can add those. I don't think C has methods, so doing that would be something more appropriate for the nix crate.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rust-lang/libc/issues/716#issuecomment-482557924, or mute the thread https://github.com/notifications/unsubscribe-auth/AAADBD10ePsPypLCiLRuXizwOZMTgjdXks5vgH2RgaJpZM4OxrDb .

-- All that is necessary for evil to succeed is for good people to do nothing.

gnzlbg commented 5 years ago

We typically map those macros to Rust functions, so I think we can just keep the field private (in which case it does not matter whether it is implemented using an union or not), and just provide functions to access the fields. We can probably make things a bit more concrete once we have a PR open with an initial proposal.

alex commented 5 years ago

I'll try to get to it later today!

On Fri, Apr 12, 2019, 9:07 AM gnzlbg notifications@github.com wrote:

We typically map those macros to Rust functions, so I think we can just keep the field private (in which case it does not matter whether it is implemented using an union or not), and just provide functions to access the fields. We can probably make things a bit more concrete once we have a PR open with an initial proposal.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rust-lang/libc/issues/716#issuecomment-482567616, or mute the thread https://github.com/notifications/unsubscribe-auth/AAADBFnxwr7GrtUiL8FEVuVijnjFOtApks5vgIUegaJpZM4OxrDb .

gnzlbg commented 5 years ago

So the padding field of siginfo_t was not private as it should have been. This means that using an union instead would be a hard breaking change. Once we have a PR that implements the proposed change, we can try and see if there is any code out there actually relying on that, but this might be something that might need to wait until the next breaking release.

nbsdx commented 5 years ago

Figured I'd drop by and leave my 2 cents here as well since I'm looking into siginfo_t for a project.

The kernel's siginfo_t is NOT packed, and should not be declared as such in Rust (re: vojtechkral's comment). The siginfo_t structure looks like this in memory (x86_64) when there's a process that was/needs to be waited on:

|11000000 00000000 01000000 00000000| ................ 00000000
|d2620000 e8030000 00000000 00000000| .b.............. 00000010
|00000000 00000000 00000000 00000000| ................ 00000020
|00000000 00000000 00000000 00000000| ................ 00000030
|00000000 00000000 00000000 00000000| ................ 00000040
|00000000 00000000 00000000 00000000| ................ 00000050
|00000000 00000000 00000000 00000000| ................ 00000060
|00000000 00000000 00000000 00000000| ................ 00000070

The first 3 4-byte values are the fields si_signo, si_errno, and si_code. The 4th 4-byte field (all zeros) is padding. This padding is caused by the _sifields union having a member that needs to be 8-byte aligned on 64-bit systems (_sigfault and _sigsys union fields are the obvious culprits here, as the first members in those structs are pointers).

It looks like rust's union implementation needs to be fixed to inherit alignment requirements from its members, which doesn't sounds particularly difficult - just caclulate the required alignment for each, and use an alignment that satisfies all of them (which will usually just be the max alignment present). The packed attribute would just over-rule that alignment decision and append the union directly after the previous field. Edit: leaving this for posterity, but it's not correct - I was having some alignment issues of my own. After fixing those, the union alignment was working correctly.

In the rust playground link, the forced 8-byte alignment throws off the padding calculation (which is actually garbage pre linux 4.20). It doesn't take into account that the compiler will add padding between the 3 integers and the union on 64 bit systems, which makes the padding 4 bytes too long. This isn't an issue in libc right now because those fields in the union are omitted.

Recent versions of the linux kernel (since 4.20) have changed the definition of siginfo_t to be less confusing. It's now effectively:

struct siginfo_t {
  union {
    int padding[128 / sizeof(int)]; // int padding[32]
    struct {
      int si_signo;
      int si_errno;
      int si_code;
      union __sifields _sifields;
    }; // anonymous union field
  };
};

Which is much clearer (despite that it could have been better described as a union at the top level...). This removes the bizarre pad: [libc::c_int; 29], and in general helps to restructure something that was rather bizarre.

So it would probably be better just to update the siginfo_t definition in rust-lang/libc to reflect the new definition in the kernel. This could be done in another module to prevent breaking existing code that relies on the current siginfo_t, so long as there's no issue with having two definitions present (fracturing generally isn't good, but if you want to not introduce a breaking change it may be worth it).

vojtechkral commented 5 years ago

@nbsdx Thanks for investigating. FWIW I have no memory of that comment I wrote and, more specifically, why I thought so :)

nbsdx commented 5 years ago

@vojtechkral no problem, packing it was my initial solution as well until I dug into it more once I wasn't able to access fields properly :)