rust-lang / rust-bindgen

Automatically generates Rust FFI bindings to C (and some C++) libraries.
https://rust-lang.github.io/rust-bindgen/
BSD 3-Clause "New" or "Revised" License
4.42k stars 694 forks source link

Signedness mismatch #1594

Open npmccallum opened 5 years ago

npmccallum commented 5 years ago

Input C/C++ Header

#define FOO 253

Bindgen Invocation

$ bindgen input.h

Actual Results

/* automatically generated by rust-bindgen */

pub const FOO: u32 = 253;

Expected Results

/* automatically generated by rust-bindgen */

pub const FOO: ::std::os::raw::c_int = 253;

Summary

Bindgen outputs a u32 instead of c_int for defines. This makes them difficult to work with due to signedness mismatch.

npmccallum commented 5 years ago

See https://en.cppreference.com/w/cpp/language/integer_literal

Specifically the section: "The type of the literal"

Bindgen should not attempt to detect signedness. It should rely on the u or U suffix for that.

emilio commented 5 years ago

Well, in practice most people don't really use the u / U suffix, so we try to pick an appropriate size. In any case you can override this: https://docs.rs/bindgen/0.50.0/bindgen/callbacks/trait.ParseCallbacks.html#method.int_macro

npmccallum commented 5 years ago

There are two issues here: size and signedness. Size detection is fine. Signedness is not.

Trying to detect sign is wrong. Half the time you will get it wrong and people will have to fix it up in the rust code. The C specification is very clear what sign is assigned to literals. The only difference is that Rust doesn't have implicit conversion but C does.

Put simply, you're trying to work around a bug in the source file. This is absolutely a bad default. The default should be to follow the C specification. Anything else is unpredictable.

emilio commented 5 years ago

That is fair. I'm not opposed to changing behavior here in theory. The macro evaluator (https://github.com/jethrogb/rust-cexpr) doesn't provide any information other than the final i64 value, so fixing this requires changing that first, aiui.

npmccallum commented 5 years ago

@emilio I've been doing some thinking about this problem. Obviously, cexpr still needs to provide us the information. However, bindgen also discards information. Specifically, it converts an integer literal to a typed integer constant without any information of how it is used. Committing to a type ahead of time results in ergonomic problems later.

What if instead of making C macro definitions into Rust constants we turned them into Rust macros? We obviously trade some syntactic nicety for ergonomics. Code that uses the macros would get integer literals and Rust would automatically promote the type at the site of use.

Thoughts?

emilio commented 5 years ago

It may be ok to do that, but I'm not 100% sure ergonomics are better... Specially since macros in pre-2018 edition (I think?) are subject to module ordering, so if you have:

mod bindings;
mod other; // <- this uses the macros in `bindings`

It'd work, while:

mod other; // <- this uses the macros in `bindings`
mod bindings;

wouldn't.

npmccallum commented 5 years ago

@emilio The ergonomics of pre-2018 Rust are really a low priority to me. So much has happened since 2018 in the language that if you're worried about ergonomics you just update to stable or even nightly.

The bigger concern that I see is that bindgen would break its current API. But it solves this problem:

#define CONSTANT 1234

void foo(uint64_t value);
void bar(int value);

Currently, bindgen doesn't work without casts in either of the above function calls. You end up with the following output:

pub const CONSTANT: u32 = 1234;

extern "C" {
    pub fn foo(value: u64);
}

extern "C" {
    pub fn bar(value: ::std::os::raw::c_int);
}

The problem is, of course, that CONSTANT can't be passed to either function without a cast. This is because bindgen chooses the type for CONSTANT ahead of time.

My proposal is to make bindgen emit this instead:

#[macro_export]
macro_rules! CONSTANT {
    () => { 1234 }
}

extern "C" {
    pub fn foo(value: u64);
}

extern "C" {
    pub fn bar(value: ::std::os::raw::c_int);
}

In this case, both of these calls works as expected:

foo(CONSTANT!());
bar(CONSTANT!());
jethrogb commented 5 years ago

I like the idea in general, but I think it's worthwhile keeping the existing behavior around for libraries with a sane API. Maybe we can make this configurable using the ParseCallbacks?

npmccallum commented 5 years ago

@jethrogb Configurable seems sane to me.

moosterhof commented 2 years ago

Slightly related, size detection also fails: https://github.com/rust-lang/rust-bindgen/issues/923

ojeda commented 1 year ago

The macro idea is discussed at https://github.com/rust-lang/rust-bindgen/issues/1594, perhaps the discussion around that should be moved there.

However, bindgen also discards information. Specifically, it converts an integer literal to a typed integer constant without any information of how it is used.

The bigger concern that I see is that bindgen would break its current API. But it solves this problem:

#define CONSTANT 1234

void foo(uint64_t value);
void bar(int value);

Currently, bindgen doesn't work without casts in either of the above function calls.

Please see my reply at https://github.com/rust-lang/rust-bindgen/issues/2120#issuecomment-1739067906 -- the discussion is very similar.