rust-lang / rust-bindgen

Automatically generates Rust FFI bindings to C (and some C++) libraries.
https://rust-lang.github.io/rust-bindgen/
BSD 3-Clause "New" or "Revised" License
4.42k stars 693 forks source link

Generated tests use UB in their calculations #1651

Closed Lokathor closed 2 years ago

Lokathor commented 4 years ago

Test code for struct offsets run checks like this:

    assert_eq!(
        unsafe { &(*(::core::ptr::null::<timeval>())).tv_sec as *const _ as usize },
        0usize,
        concat!(
            "Offset of field: ",
            stringify!(timeval),
            "::",
            stringify!(tv_sec)
        )
    );

But it's UB to dereference a null pointer, and it's also UB to create a &T with numeric value 0.

There is actually trivial to fix. Instead of using a null pointer, just create a value on the stack (using core::mem::zeroed(), which is safe since all C structs can be created as zeroed-memory), and then use a reference to that stack value, the fields of that stack value, and so on.

emilio commented 4 years ago

This is known, but thanks for filing. I've always hoped that rust would have an offsetof macro, but... ;)

Creating structs on the stack with zeroed memory is ok for C, but not so sure for C++, where what bindgen may generate is not POD, or may have destructors or other side effects.

Lokathor commented 4 years ago

Can go by if the type is Copy or not. Also can use ManuallyDrop if the Rust target is new enough.

I'm not sure what most of the point of those tests is anyway, most of them can't ever fail so it seems kinda silly.

emilio commented 4 years ago

How not? Those tests are important, if bindgen generates something that has the wrong size or alignment or anything, those tests catch it.

Lokathor commented 4 years ago

Replied in a separate issue because it's not really related to this "offset calculation is UB" issue.

gnzlbg commented 4 years ago

@Lokathor whatever you come up with will be UB anyways, because rust-bindgen needs to compute the offset of #[repr(packed)] struct fields, and there is no way to do that in Rust without UB.

RalfJung commented 4 years ago

@gnzlbg that is true, but it has nothing to do with packed. You can't do it for non-packed struct either, and once we have fixed that (using &raw) the fix will work for packed structs, too.

https://github.com/Gilnaa/memoffset/ implements the currently best way we have to do this. The most important part is to avoid NULL references; you could use NonNull::dangling instead.

RalfJung commented 4 years ago

I've always hoped that rust would have an offsetof macro, but... ;)

Given that library solutions cannot (it seems) soundly support nested fields and arrays, I think indeed this should become a built-in macro. But I don't have a thick enough skin for the syntax bikeshed that will undoubtedly trigger. ;)

Until then, we have that macro as a library.

gnzlbg commented 4 years ago

You can't do it for non-packed struct either, and once we have fixed that (using &raw) the fix will work for packed structs, too.

Notice that rust-bindgen derives Default for its types, so at least when it does that it could actually construct a value of a type by using Default::default and then compute the field offsets on that value without any null dereferences. That computation still requires creating a reference to a field, and since those fields can be unaligned for #[repr(packed)] structs, that would still be UB.

Lokathor commented 4 years ago

@gnzlbg why did i get pinged?

gnzlbg commented 4 years ago

Because you proposed a solution:

There is actually trivial to fix. Instead of using a null pointer, just create a value on the stack (using core::mem::zeroed(), which is safe since all C structs can be created as zeroed-memory), and then use a reference to that stack value, the fields of that stack value, and so on.

so I thought you might work on it, but that solution doesn't work, because of repr(packed).

I think the mem::zeroed part happens to work because rust-bindgen never generates types (at least by default, or at least it shouldn't) for which it doesn't work, but that's not the only source of UB.

Lokathor commented 4 years ago

Oh, well (1) I'm not going to work on this myself I'm just reporting the problem, (2) bindgen knows when it will emit repr(packed) so just don't do the simplistic version in that case (3) right the main problem is dereferencing a null pointer.

gnzlbg commented 4 years ago

(2) bindgen knows when it will emit repr(packed) so just don't do the simplistic version in that case

What should it do in that case then?

(3) right the main problem is dereferencing a null pointer.

The main problem is undefined behavior, but there are no solutions without undefined behavior.


There is nothing for rust-bindgen to do here: whatever it does, it will be UB, and someone will open an issue to report that, and maybe change that, into something else that also has UB.

This should either be "blocked on" some rust-lang/rust tracking issue, or closed as "non-actionable", but right now there is no way to fix on rust-bindgen's side.

Lokathor commented 4 years ago

Structs that aren't repr(packed) can be fixed.

Further, structs that are repr(packed) can still at least be tested against a zeroed value on the stack instead of on a null pointer and at least eliminate one of the problems, even if it's not every problem being solved.

It's okay to solve part of a problem at a time.

gnzlbg commented 4 years ago

I don't see the point yet.

If this is a problem because it is UB, there are currently not solutions to this problem. But if "UB is ok as long as it works", then the current code, which works, is as good as anything else. Time would be better spent into trying to provide a way to do this without any UB, but that cannot happen here.

Lokathor commented 4 years ago

Well, most structs aren't repr(packed), so actually it can happen here.

gnzlbg commented 4 years ago

Well, most structs aren't repr(packed), so actually it can happen here.

Even for non repr(packed) structs, this would still be UB if the type has zero as a niche, which I don't think happens by default (except maybe for C++), but happens often when using the opaque-type feature, for example.

Lokathor commented 4 years ago

...okay but you get my point that this can be fixed for the vast majority of C struct bindings I hope.

gnzlbg commented 4 years ago

I disagree with vast majority (all C bindings I work on use repr(packed)), but yes, for some subset of C bindings, some of rust-bindgen tests could become UB free.

RalfJung commented 4 years ago

@gnzlbg there's still a big difference between "theoretical UB" (that works because it's spec-only UB but we currently -- as of the latest nightly rustc -- never exploit it) and UB that rustc/LLVM might actually exploit, but they happen to not do that in the cases you checked (like NULL references).

I don't understand why you are fighting back so hard against what is clearly a strict improvement in language compliance. Why this black-and-white attitude?

gnzlbg commented 4 years ago

This issue reports that the generic code that bindgen uses is broken, but such code can’t be fixed, and this issue can’t be closed. The code has currently UB in the spec and in LLVM, and the proposed “fix” does so as well. We all agree that this reliance on UB is brittle, and Id rather not change something that’s known to work for bindgen users to something that doesn’t really fixes the issue and isn’t known to work at the bindgen scale (many more crates use bindgen than offsetof, reliance on the behavior of UB on many different toolchain versions, etc.). This also only impacts UB in

[test]s, and that code is not even compiled when using bindgen libraries

as dependencies.

Once &raw is available, fixing this issue would be a no brainer, but right now there are IMO dozens of other open issues with a better value proposition than what putting any work on this one would deliver.

RalfJung commented 4 years ago

So, let me rephrase: "if it can't be perfect (doesn't really fix the issue) then we don't care how awful it is".

many more crates use bindgen than offsetof

offsetof is used in crossbeam, so if you really want to compare numbers... memoffset recent downloads: 1,592,597 bindgen recent downloads: 310,642

But also, this is clearly not leading anywhere, so I'll just stop here.

gnzlbg commented 4 years ago

"if it can't be perfect (doesn't really fix the issue) then we don't care how awful it is".

That's not what I said.

Lokathor commented 4 years ago

Update: there is a fully sound offset_of! macro available in the bytemuck crate: https://docs.rs/bytemuck/1.2.0/bytemuck/macro.offset_of.html

The "catch" is that you have to pass in an instance of the type to the macro. So either the type needs to have default or makeable with all 0s or something like that.

anp commented 3 years ago

Would the addition of addr_of make this easier to implement without UB?

FYI, the recent addition of a new lint for deref'ing a null ptr surfaces this:

error: dereferencing a null pointer
  --> ../../src/lib/usb_bulk/rust/src/usb.rs:59:19
   |
59 |         unsafe { &(*(::std::ptr::null::<usb_ifc_info>())).dev_vendor as *const _ as usize },
   |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this code causes undefined behavior when executed
   |
   = note: `-D deref-nullptr` implied by `-D warnings`
emilio commented 3 years ago

Perhaps? But for that we'd also need to create an instance of the type, which we might not know how to create, and which may have other side effects... Can we get an address into a MaybeUninit::<NonTrivial>::uninit() without it being UB somehow?

I don't see how off-hand but that doesn't necessarily mean it doesn't exist... In any case I think we'd still be working around the lack of a standard offset_of! macro...

Lokathor commented 3 years ago

any C type can be constructed of zeroed memory at least.

emilio commented 3 years ago

Not if you implement Drop on the rust side or what not, which might be a reasonable thing to do.

Lokathor commented 3 years ago

well, okay, then forget the value at the end.

jyn514 commented 3 years ago

memoffset::offset_of! does this with uninitialized():

macro_rules! offset_of {
    ($father:ty, $($field:tt)+) => ({
        let root: $father = unsafe { $crate::mem::uninitialized() };

        let base = &root as *const _ as usize;
        let member =  &root.$($field)* as *const _ as usize;

        $crate::mem::forget(root);

        member - base
    });
}

https://docs.rs/memoffset/0.2.0/src/memoffset/offset_of.rs.html#68-79

Apostoln commented 3 years ago
Please ignore this message I'm going to continue the discussion from the dup issue #2049 The issue is a line in the tests like this: `rust unsafe { &(*(::std::ptr::null::())).field as *const _ as usize } ` which causes some problems, especially if the T is `#[repr(packed)]`. Let's try to split these problems: 1. The Problem 1: There is an UB about unaligned reference to packed field. 2. The Problem 2: There is an UB about dereferencing of null pointer. 3. The Problem 3: There are two new warnings on nightly about the two problems described above - #[warn(unaligned_references)] and #[warn(deref_nullptr)]. The next stable release will have these warnings, and one of the future releases will treat them as errors. This is a new nuance that has appeared since the creation of the initial issue. Let's think about how `offset_of` implemented in `memoffset`. Currently the implementation is: ```rust #[macro_export(local_inner_macros)] macro_rules! offset_of { ($parent:path, $field:tt) => {{ // Get a base pointer (non-dangling if rustc supports `MaybeUninit`). _memoffset__let_base_ptr!(base_ptr, $parent); // Get field pointer. let field_ptr = raw_field!(base_ptr, $parent, $field); // Compute offset. _memoffset_offset_from_unsafe!(field_ptr, base_ptr) }}; } ``` Basically, the algorithm is following: ```rust let uninit = std::mem::MaybeUninit::::uninit(); let base_ptr: *const T = uninit.as_ptr(); let field_ptr = unsafe { std::ptr::addr_of!((*(base_ptr)).field) }; (field_ptr as usize) - (base_ptr as usize) ``` This code compiles and runs successfully without any warnings or errors. There no dereferencing of null ptr or dangling ptr so there no UB in this part. But there _is_ an UB in case of `#[repr(packed)]` according to https://doc.rust-lang.org/stable/std/ptr/macro.addr_of.html So, with this approach (or using memoffset itself) we can avoid the UB in case of aligned structs, i.e. `The Problem 2`, and warnings/errors, i.e. `The Problem 3`. We still can't resolve `The Problem 1` because it's fundamentally insoluble now, but in this case we lose nothing - the problem is exactly the same as before, but the code is clearer. So I suggest to use `memoffset` or to use approach with MaybeUninit+ptr::addr_of! Please correct me if I'm wrong.
Apostoln commented 3 years ago

Please ignore my previous comment. Dereferencing of unitialized memory is also an UB, of course. The worst is this kind of UB is unsound in terms of warnings.

I suggest another way. The example is a bit drafty but it works. There no dereferencing of any kind of bad pointers, there no problems with unaligned references (see docs of std::ptr::addr_of), and of course there no warnings.

There is only one potential UB - if the type Foo or any of its fields can't have a correct zero-bit-pattern, like NonZero or NonNull. However, we ase using bindgen only for generation the Rust structs from C structs, therefore it's impossible.

use std::mem;
use std::ptr;

#[repr(packed)]
#[derive(Debug, Copy, Clone)]
struct Foo {
    boolean: bool,
    num: u32,
}

fn main() {
    const SIZE: usize = mem::size_of::<Foo>();
    let buffer = [0u8; SIZE];
    let foo = unsafe {
        std::mem::transmute::<[u8; SIZE], Foo>(buffer)
    };
    let foo_ptr = &foo as *const Foo;

    let num_ptr = unsafe {
        ptr::addr_of!((*(foo_ptr)).num)
    };

    let offset = num_ptr as usize - foo_ptr as usize;

    dbg!(offset);
}
Lokathor commented 3 years ago

Yeah, for types that allow an all zeroed value, we can construct the value from zeroed memory. since we know the types and knows the fields as we're generating the test we can even just use Ty{a:0, b:0, } and so on to make the zeroed value, no unsafe needed.

Apostoln commented 3 years ago

@Lokathor If so, in case of Ty{a: 0, b: 0, } how we can know the default value of a and b if they don't implement Default?

Lokathor commented 3 years ago

nearly every struct that bindgen looks at will be fields made of integer or floating or a pointer, or an alias of one of those, so just put 0 or 0.0 or core::ptr::null().

and if that's not the case then pull out the unsafe, but avoid it if you don't need it.

aclysma commented 3 years ago

I just started seeing the same lint mentioned in https://github.com/rust-lang/rust-bindgen/issues/1651#issuecomment-820642478 in CI as I include beta builds with warnings treated as errors. Is there any guidance on the "correct" way to deal with this? I worked around this by allowing the deref_nullptr lint:

#[allow(unknown_lints)]
#[allow(deref_nullptr)]
pub mod bindings;
pub use bindings::*;

allow(unknown_lints) is necessary to avoid warnings in rustc <=1.52. I don't think this is an ideal solution, but it does let CI pass.

I think when 1.53 drops in a few weeks, this may become a hot issue as many more people may suddenly start seeing these warnings.

Lokathor commented 3 years ago

Last I saw, for C structs the layout tests were effectively useless anyway. For C++ they might do something useful, I've never used bindgen for C++ code.

emilio commented 3 years ago

So the approach in #2055 also doesn't work, because for stuff containing bitfields tests fail like:

bindgen_test_layout_rte_eth_vmdq_tx_conf' panicked at 'attempted to zero-initialize type `rte_eth_vmdq_tx_conf`, which is invalid
jyn514 commented 3 years ago

I recently found the https://docs.rs/offset/0.1.3/offset/macro.offset_of.html library, which I think is what you're looking for.

kornelski commented 3 years ago

How about avoiding zeroing in #2055 with MaybeUninit?

Taaitaaiger commented 3 years ago

This might have already been proposed, but if I replace ::std::ptr::null::<...>() with ::std::ptr::null::<...>().offset(0) all warnings go away in my bindings.

Lokathor commented 3 years ago

You have to use wrapping_offset or it's UB.

Bromeon commented 3 years ago

From the longer discussion here, my impression is that the UB is not as straightforward to fix in bindgen, or it's unclear which approach is the best.

But until then, would it be possible to silence the warnings in generated tests? What is effectively happening now is that hundreds of projects using bindgen have to implement workarounds to suppress the warnings, as you can see from all the reverse-links above. And all those patches will become dead code once this issue is solved, possibly staying around for a long time, in the worst case even hiding other legitimate problems.

Lokathor commented 3 years ago

Some of the UB is in fact trivial to fix. If your struct is zeroable, which basically all C structs are, you can just make a zeroed instance, and then forget it at the end if the type isn't Copy. That should fix a very large portion of cases.

Dushistov commented 2 years ago

Why not just use other address instead of 0? For example 80KB. This is runtime error instead of UB as I understand. But because of no access to this address in runtime, so we have no runtime error?

#[repr(C)]
struct timeval {
    tv_sec: u32,
}

fn main() {
    assert_eq!(
        unsafe { &(*(0x14000 as *const timeval)).tv_sec as *const _ as usize },
        0x14000 + 0usize,
        concat!(
            "Offset of field: ",
            stringify!(timeval),
            "::",
            stringify!(tv_sec)
        )
    );
}

references: The last discussion about offsetof that I found: https://internals.rust-lang.org/t/pre-rfc-add-a-new-offset-of-macro-to-core-mem

kornelski commented 2 years ago

@Dushistov There's dangling method for making fake non-null pointers. But making references from them is still UB, and Miri still detects it.

Dushistov commented 2 years ago

@kornelski

But making references from them is still UB, and Miri still detects it.

But isn't this just bug in miri? For example I can call mmap on Linux with MAP_FIXED to map something to fixed address, and than convert usize to &[u8]. Or on embedded system I can have SRAM at address 0x14000.

Why miri mark such perfectly valid programs as UB?

kornelski commented 2 years ago

(We may be getting of topic here, so if you want to discuss it more, we could chat on users.rust-lang.org)

The Rust semantic that Miri is trying to uphold is that there must be a valid, allocated, initialized memory behind any address that you convert to a reference. Pointer provenance is a tricky problem that's not solved yet afaik, so details of the rules aren't settled.

Memory mapping functions should work. If they aren't handled yet, it's probably a matter of adding annotations/special case to Miri to let it know they're like allocators (the same way it needs to know about malloc).

Dushistov commented 2 years ago

We may be getting of topic here Memory mapping functions should work let it know they're like allocators

I don't think that we getting "of topic", because of don't you describe solution of current problem?

Like

#[allocator_tag_for_miri]
fn my_fancy_allocator<T>() -> *const T { 0 as *const T }

 assert_eq!(
        unsafe { &(*my_fancy_allocator::<timeval>()).tv_sec as *const _ as usize },
        0usize,
        concat!(
            "Offset of field: ",
            stringify!(timeval),
            "::",
            stringify!(tv_sec)
        )
    );

This should make compiler happy and convert UB to potential runtime error?

purew commented 2 years ago

Are there any workarounds I can apply to the C-code to avoid triggering these asserts? This issue is becoming a blocker to upgrading my Rust-version above 1.51.0.

Bromeon commented 2 years ago

@PureW You mean actually triggering the asserts in C?

In case this is only about the warning, you could silence it from Rust, as we did in https://github.com/godot-rust/godot-rust/pull/776. In short, it's:

#![allow(deref_nullptr)]

in the module that contains the generated symbols. Keep in mind that this lint will be unknown for older compiler versions.


As mentioned above, I really think bindgen should fix this ASAP, even if it's just by silencing the warning for the time being. It's one thing to have potential warnings, but another to impose workarounds on every user of the library.