immunant / c2rust

Migrate C code to Rust
https://c2rust.com/
Other
3.91k stars 229 forks source link

Improve specific literal transpilation #152

Open crlf0710 opened 5 years ago

crlf0710 commented 5 years ago

This code:

  int result = 0;
    if (result > 0x7fff)
      result -= 0x10000L;

got transpiled into this rust code on x64 linux:

    let mut result: libc::c_int = 0i32;
        if result > 0x7fffi32 {
            result = (result as libc::c_long - 0x10000i64) as libc::c_int
        }

While this is technical correct. The generated code is less portable than the c version. In fact, on x64 windows, this code will fail to compile with error:

     |
1201 |             result = (result as libc::c_long - 0x10000i64) as libc::c_int
     |                                              ^ no implementation for `i32 - i64`
     |
     = help: the trait `std::ops::Sub<i64>` is not implemented for `i32`

I think this can actually be improved somehow. specifically 0x10000L should be of type c_long instead of i64.

rinon commented 5 years ago

We do not (yet) support translating then compiling on a different platform. I agree that the integer literal should ideally be sized to c_long, however, we would have to rely on the rust compilers integer typing rules for that, which doesn't work the same as in C. What we get from clang is actually just that the integer literal has the type 64-bit signed integer, so we translate accordingly, since we don't know anything else at that point in the compiler.

Long term plan is to replace all conditional defines and platform-specific types with either libc crate types or cfg attributes in Rust, but that's not implemented yet.

crlf0710 commented 5 years ago

@rinon Hi, thanks for the reply. I think there's two separate concerns, One is the literal itself, the other is the sub operation. For the first part, I don't know clang enough, and it makes sense that clang convert the integer literals to platform neutral fixed width integers. For the second part, in the example above, the result variable is "promoted" to c_long for consistency with the literal to execute the sub operation. So it seems to me either clang or c2rust actually knows the sub operation should be executed with the type c_long. So maybe it still makes sense to generate code like result = (result as libc::c_long - 0x10000i64 as libc::c_long) as libc::c_int

TheDan64 commented 5 years ago

What about just dropping int literal suffixes and let rust type infer?

result = (result as libc::c_long - 0x10000) as libc::c_int

That should improve both portability and readability

crlf0710 commented 5 years ago

@TheDan64 this is actually exactly what i'm doing when remediating the generated code to make it cross-platform. I'm less sure about whether that will cause compilation issues in special cases.

rinon commented 5 years ago

In most cases we can drop the suffix. However there are occasionally expressions that are ambiguous and require the suffixes to match the semantics of C (Rust defaults to i32 everywhere, C makes it the smallest type that will fit from int up to long int for decimals and up to unsigned long int for other base literals). We might actually be able to get away with allowing unsuffixed literals for everything that will fit in i32, but I'll have to confirm that.

ahomescu commented 4 years ago

Leaving this here for future reference (from https://doc.rust-lang.org/reference/tokens.html#integer-literals):

The type of an unsuffixed integer literal is determined by type inference:

If an integer type can be uniquely determined from the surrounding program context, the unsuffixed integer literal has that type.

If the program context under-constrains the type, it defaults to the signed 32-bit integer i32.

If the program context over-constrains the type, it is considered a static type error.

Edit: we were talking offline about possible ways to fix this, and this list came up.