Closed Lokathor closed 2 years ago
This is known, but thanks for filing. I've always hoped that rust would have an offsetof macro, but... ;)
Creating structs on the stack with zeroed memory is ok for C, but not so sure for C++, where what bindgen may generate is not POD, or may have destructors or other side effects.
Can go by if the type is Copy or not. Also can use ManuallyDrop if the Rust target is new enough.
I'm not sure what most of the point of those tests is anyway, most of them can't ever fail so it seems kinda silly.
How not? Those tests are important, if bindgen generates something that has the wrong size or alignment or anything, those tests catch it.
Replied in a separate issue because it's not really related to this "offset calculation is UB" issue.
@Lokathor whatever you come up with will be UB anyways, because rust-bindgen
needs to compute the offset of #[repr(packed)]
struct fields, and there is no way to do that in Rust without UB.
@gnzlbg that is true, but it has nothing to do with packed
. You can't do it for non-packed struct either, and once we have fixed that (using &raw
) the fix will work for packed structs, too.
https://github.com/Gilnaa/memoffset/ implements the currently best way we have to do this. The most important part is to avoid NULL references; you could use NonNull::dangling
instead.
I've always hoped that rust would have an offsetof macro, but... ;)
Given that library solutions cannot (it seems) soundly support nested fields and arrays, I think indeed this should become a built-in macro. But I don't have a thick enough skin for the syntax bikeshed that will undoubtedly trigger. ;)
Until then, we have that macro as a library.
You can't do it for non-packed struct either, and once we have fixed that (using &raw) the fix will work for packed structs, too.
Notice that rust-bindgen
derives Default
for its types, so at least when it does that it could actually construct a value of a type by using Default::default
and then compute the field offsets on that value without any null dereferences. That computation still requires creating a reference to a field, and since those fields can be unaligned for #[repr(packed)]
structs, that would still be UB.
@gnzlbg why did i get pinged?
Because you proposed a solution:
There is actually trivial to fix. Instead of using a null pointer, just create a value on the stack (using core::mem::zeroed(), which is safe since all C structs can be created as zeroed-memory), and then use a reference to that stack value, the fields of that stack value, and so on.
so I thought you might work on it, but that solution doesn't work, because of repr(packed)
.
I think the mem::zeroed
part happens to work because rust-bindgen
never generates types (at least by default, or at least it shouldn't) for which it doesn't work, but that's not the only source of UB.
Oh, well (1) I'm not going to work on this myself I'm just reporting the problem, (2) bindgen knows when it will emit repr(packed)
so just don't do the simplistic version in that case (3) right the main problem is dereferencing a null pointer.
(2) bindgen knows when it will emit repr(packed) so just don't do the simplistic version in that case
What should it do in that case then?
(3) right the main problem is dereferencing a null pointer.
The main problem is undefined behavior, but there are no solutions without undefined behavior.
There is nothing for rust-bindgen
to do here: whatever it does, it will be UB, and someone will open an issue to report that, and maybe change that, into something else that also has UB.
This should either be "blocked on" some rust-lang/rust tracking issue, or closed as "non-actionable", but right now there is no way to fix on rust-bindgen
's side.
Structs that aren't repr(packed) can be fixed.
Further, structs that are repr(packed) can still at least be tested against a zeroed value on the stack instead of on a null pointer and at least eliminate one of the problems, even if it's not every problem being solved.
It's okay to solve part of a problem at a time.
I don't see the point yet.
If this is a problem because it is UB, there are currently not solutions to this problem. But if "UB is ok as long as it works", then the current code, which works, is as good as anything else. Time would be better spent into trying to provide a way to do this without any UB, but that cannot happen here.
Well, most structs aren't repr(packed), so actually it can happen here.
Well, most structs aren't repr(packed), so actually it can happen here.
Even for non repr(packed)
structs, this would still be UB if the type has zero as a niche, which I don't think happens by default (except maybe for C++), but happens often when using the opaque-type feature, for example.
...okay but you get my point that this can be fixed for the vast majority of C struct bindings I hope.
I disagree with vast majority (all C bindings I work on use repr(packed)
), but yes, for some subset of C bindings, some of rust-bindgen tests could become UB free.
@gnzlbg there's still a big difference between "theoretical UB" (that works because it's spec-only UB but we currently -- as of the latest nightly rustc -- never exploit it) and UB that rustc/LLVM might actually exploit, but they happen to not do that in the cases you checked (like NULL references).
I don't understand why you are fighting back so hard against what is clearly a strict improvement in language compliance. Why this black-and-white attitude?
This issue reports that the generic code that bindgen uses is broken, but such code can’t be fixed, and this issue can’t be closed. The code has currently UB in the spec and in LLVM, and the proposed “fix” does so as well. We all agree that this reliance on UB is brittle, and Id rather not change something that’s known to work for bindgen users to something that doesn’t really fixes the issue and isn’t known to work at the bindgen scale (many more crates use bindgen than offsetof, reliance on the behavior of UB on many different toolchain versions, etc.). This also only impacts UB in
as dependencies.
Once &raw is available, fixing this issue would be a no brainer, but right now there are IMO dozens of other open issues with a better value proposition than what putting any work on this one would deliver.
So, let me rephrase: "if it can't be perfect (doesn't really fix the issue) then we don't care how awful it is".
many more crates use bindgen than offsetof
offsetof is used in crossbeam, so if you really want to compare numbers... memoffset recent downloads: 1,592,597 bindgen recent downloads: 310,642
But also, this is clearly not leading anywhere, so I'll just stop here.
"if it can't be perfect (doesn't really fix the issue) then we don't care how awful it is".
That's not what I said.
Update: there is a fully sound offset_of!
macro available in the bytemuck
crate:
https://docs.rs/bytemuck/1.2.0/bytemuck/macro.offset_of.html
The "catch" is that you have to pass in an instance of the type to the macro. So either the type needs to have default or makeable with all 0s or something like that.
Would the addition of addr_of make this easier to implement without UB?
FYI, the recent addition of a new lint for deref'ing a null ptr surfaces this:
error: dereferencing a null pointer
--> ../../src/lib/usb_bulk/rust/src/usb.rs:59:19
|
59 | unsafe { &(*(::std::ptr::null::<usb_ifc_info>())).dev_vendor as *const _ as usize },
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this code causes undefined behavior when executed
|
= note: `-D deref-nullptr` implied by `-D warnings`
Perhaps? But for that we'd also need to create an instance of the type, which we might not know how to create, and which may have other side effects... Can we get an address into a MaybeUninit::<NonTrivial>::uninit()
without it being UB somehow?
I don't see how off-hand but that doesn't necessarily mean it doesn't exist... In any case I think we'd still be working around the lack of a standard offset_of!
macro...
any C type can be constructed of zeroed memory at least.
Not if you implement Drop
on the rust side or what not, which might be a reasonable thing to do.
well, okay, then forget the value at the end.
memoffset::offset_of!
does this with uninitialized()
:
macro_rules! offset_of {
($father:ty, $($field:tt)+) => ({
let root: $father = unsafe { $crate::mem::uninitialized() };
let base = &root as *const _ as usize;
let member = &root.$($field)* as *const _ as usize;
$crate::mem::forget(root);
member - base
});
}
https://docs.rs/memoffset/0.2.0/src/memoffset/offset_of.rs.html#68-79
Please ignore my previous comment. Dereferencing of unitialized memory is also an UB, of course. The worst is this kind of UB is unsound in terms of warnings.
I suggest another way. The example is a bit drafty but it works. There no dereferencing of any kind of bad pointers, there no problems with unaligned references (see docs of std::ptr::addr_of), and of course there no warnings.
There is only one potential UB - if the type Foo
or any of its fields can't have a correct zero-bit-pattern, like NonZero or NonNull. However, we ase using bindgen only for generation the Rust structs from C structs, therefore it's impossible.
use std::mem;
use std::ptr;
#[repr(packed)]
#[derive(Debug, Copy, Clone)]
struct Foo {
boolean: bool,
num: u32,
}
fn main() {
const SIZE: usize = mem::size_of::<Foo>();
let buffer = [0u8; SIZE];
let foo = unsafe {
std::mem::transmute::<[u8; SIZE], Foo>(buffer)
};
let foo_ptr = &foo as *const Foo;
let num_ptr = unsafe {
ptr::addr_of!((*(foo_ptr)).num)
};
let offset = num_ptr as usize - foo_ptr as usize;
dbg!(offset);
}
Yeah, for types that allow an all zeroed value, we can construct the value from zeroed memory. since we know the types and knows the fields as we're generating the test we can even just use Ty{a:0, b:0, } and so on to make the zeroed value, no unsafe needed.
@Lokathor If so, in case of Ty{a: 0, b: 0, }
how we can know the default value of a
and b
if they don't implement Default
?
nearly every struct that bindgen looks at will be fields made of integer or floating or a pointer, or an alias of one of those, so just put 0 or 0.0 or core::ptr::null().
and if that's not the case then pull out the unsafe, but avoid it if you don't need it.
I just started seeing the same lint mentioned in https://github.com/rust-lang/rust-bindgen/issues/1651#issuecomment-820642478 in CI as I include beta builds with warnings treated as errors. Is there any guidance on the "correct" way to deal with this? I worked around this by allowing the deref_nullptr lint:
#[allow(unknown_lints)]
#[allow(deref_nullptr)]
pub mod bindings;
pub use bindings::*;
allow(unknown_lints) is necessary to avoid warnings in rustc <=1.52. I don't think this is an ideal solution, but it does let CI pass.
I think when 1.53 drops in a few weeks, this may become a hot issue as many more people may suddenly start seeing these warnings.
Last I saw, for C structs the layout tests were effectively useless anyway. For C++ they might do something useful, I've never used bindgen for C++ code.
So the approach in #2055 also doesn't work, because for stuff containing bitfields tests fail like:
bindgen_test_layout_rte_eth_vmdq_tx_conf' panicked at 'attempted to zero-initialize type `rte_eth_vmdq_tx_conf`, which is invalid
I recently found the https://docs.rs/offset/0.1.3/offset/macro.offset_of.html library, which I think is what you're looking for.
How about avoiding zeroing in #2055 with MaybeUninit
?
This might have already been proposed, but if I replace ::std::ptr::null::<...>()
with ::std::ptr::null::<...>().offset(0)
all warnings go away in my bindings.
You have to use wrapping_offset
or it's UB.
From the longer discussion here, my impression is that the UB is not as straightforward to fix in bindgen
, or it's unclear which approach is the best.
But until then, would it be possible to silence the warnings in generated tests? What is effectively happening now is that hundreds of projects using bindgen
have to implement workarounds to suppress the warnings, as you can see from all the reverse-links above. And all those patches will become dead code once this issue is solved, possibly staying around for a long time, in the worst case even hiding other legitimate problems.
Some of the UB is in fact trivial to fix. If your struct is zeroable, which basically all C structs are, you can just make a zeroed
instance, and then forget
it at the end if the type isn't Copy. That should fix a very large portion of cases.
Why not just use other address instead of 0? For example 80KB. This is runtime error instead of UB as I understand. But because of no access to this address in runtime, so we have no runtime error?
#[repr(C)]
struct timeval {
tv_sec: u32,
}
fn main() {
assert_eq!(
unsafe { &(*(0x14000 as *const timeval)).tv_sec as *const _ as usize },
0x14000 + 0usize,
concat!(
"Offset of field: ",
stringify!(timeval),
"::",
stringify!(tv_sec)
)
);
}
references:
The last discussion about offsetof
that I found:
https://internals.rust-lang.org/t/pre-rfc-add-a-new-offset-of-macro-to-core-mem
@Dushistov There's dangling method for making fake non-null pointers. But making references from them is still UB, and Miri still detects it.
@kornelski
But making references from them is still UB, and Miri still detects it.
But isn't this just bug in miri?
For example I can call mmap
on Linux with MAP_FIXED
to map something to fixed address,
and than convert usize
to &[u8]
. Or on embedded system I can have SRAM at address 0x14000
.
Why miri mark such perfectly valid programs as UB?
(We may be getting of topic here, so if you want to discuss it more, we could chat on users.rust-lang.org)
The Rust semantic that Miri is trying to uphold is that there must be a valid, allocated, initialized memory behind any address that you convert to a reference. Pointer provenance is a tricky problem that's not solved yet afaik, so details of the rules aren't settled.
Memory mapping functions should work. If they aren't handled yet, it's probably a matter of adding annotations/special case to Miri to let it know they're like allocators (the same way it needs to know about malloc).
We may be getting of topic here Memory mapping functions should work let it know they're like allocators
I don't think that we getting "of topic", because of don't you describe solution of current problem?
Like
#[allocator_tag_for_miri]
fn my_fancy_allocator<T>() -> *const T { 0 as *const T }
assert_eq!(
unsafe { &(*my_fancy_allocator::<timeval>()).tv_sec as *const _ as usize },
0usize,
concat!(
"Offset of field: ",
stringify!(timeval),
"::",
stringify!(tv_sec)
)
);
This should make compiler happy and convert UB to potential runtime error?
Are there any workarounds I can apply to the C-code to avoid triggering these asserts? This issue is becoming a blocker to upgrading my Rust-version above 1.51.0
.
@PureW You mean actually triggering the asserts in C?
In case this is only about the warning, you could silence it from Rust, as we did in https://github.com/godot-rust/godot-rust/pull/776. In short, it's:
#![allow(deref_nullptr)]
in the module that contains the generated symbols. Keep in mind that this lint will be unknown for older compiler versions.
As mentioned above, I really think bindgen should fix this ASAP, even if it's just by silencing the warning for the time being. It's one thing to have potential warnings, but another to impose workarounds on every user of the library.
Test code for struct offsets run checks like this:
But it's UB to dereference a null pointer, and it's also UB to create a
&T
with numeric value 0.There is actually trivial to fix. Instead of using a null pointer, just create a value on the stack (using
core::mem::zeroed()
, which is safe since all C structs can be created as zeroed-memory), and then use a reference to that stack value, the fields of that stack value, and so on.