rust-lang / rust

Lint against treating `c_char` as `i8` or `u8`, for portability #79089

Open joshtriplett opened 3 years ago

joshtriplett commented 3 years ago

One common source of portability issues is making assumptions about whether c_char is i8 or u8. Code written for one platform may fail to build on another because of this.

I think we could detect this fairly easily: we could add an attribute that means "yes, this is a type alias, but treat it as a slightly distinct type, and lint whenever it is matched against the aliased type without going through this type alias".

This would help people avoid gratuitous portability issues.
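
A minimal sketch of what that might look like (the attribute below is hypothetical, purely to illustrate the idea; only the function itself compiles today):

// Hypothetical shape of the proposal (no such attribute exists today):
//
//     #[lint_distinct_alias]   // made-up name, for illustration only
//     pub type c_char = i8;    // still an ordinary alias for codegen and FFI
//
// With such an attribute, the lint could fire on code like this:
use std::os::raw::c_char;

fn example(c: c_char) {
    let _byte = c as i8;   // would warn: assumes c_char is i8
    let _copy: c_char = c; // would not warn: stays within the alias
}

fn main() {
    example(0);
}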

joshtriplett commented 3 years ago

I've tagged this T-lang (because it's a proposed new lint and new attribute), and I've also tagged it for the three common architectures where the signedness of C's char differs from the usual default.

jyn514 commented 3 years ago

Another lint I'd like: treating i32 or i64 as c_long.
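
For reference, a minimal sketch of that hazard (using the C standard library's labs): the same call typechecks where c_long is 64 bits (e.g. x86_64 Linux) but is a type error where c_long is 32 bits (e.g. 64-bit Windows).

use std::os::raw::c_long;

extern "C" {
    // long labs(long) from the C standard library.
    fn labs(n: c_long) -> c_long;
}

fn main() {
    let n: i64 = 1 << 40;
    // Assuming i64 == c_long is exactly the kind of thing such a lint could catch.
    let _ = unsafe { labs(n) };
}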

the8472 commented 3 years ago

Should such a lint still fire if the code is cfg'd to a specific platform?

est31 commented 3 years ago

cc #41619

jyn514 commented 3 years ago

> Should such a lint still fire if the code is cfg'd to a specific platform?

I would not expect it to, no.
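
For concreteness, this is the kind of cfg-gated code in question (a sketch; the target chosen is just one platform where C's char is known to be signed):

// Only compiled on a target where C's char is signed, so the signedness
// assumption below is locally sound -- ideally the proposed lint stays quiet here.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
fn is_negative(c: std::os::raw::c_char) -> bool {
    c < 0
}

fn main() {
    #[cfg(all(target_os = "linux", target_arch = "x86_64"))]
    assert!(!is_negative(0x41)); // b'A' is not negative on this target
}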

joshtriplett commented 3 years ago

As a first pass, it would be acceptable if the lint triggered even in that case. It could be allow by default, for instance.

In other words I don't think it should be gated on the portability lint, though once the portability lint is available this should integrate with it.

jacobbramley commented 3 years ago

Note, as well, that this isn't clear in the libc documentation (which seems to be generated from the x86_64 build).

est31 commented 3 years ago

@jacobbramley In the standard library this has recently been improved by PR #89114. The libc crate's repository is here: https://github.com/rust-lang/libc

workingjubilee commented 3 years ago

A motivating example for this problem:

use std::os::raw::c_char;
fn main() {
    let x: u8 = 0b10000000; // set the most significant bit: 128 in decimal
    let x = x as i8 as c_char;
    let x = x >> 1; // a non-portable operation
}

Neither Clippy nor the compiler lints on this right shift, yet if c_char is signed it is an arithmetic shift, and if c_char is unsigned it is a logical shift. This is a severe portability hazard: depending on the platform, x now has the bit pattern 0b11000000 or 0b01000000.
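
To see the divergence concretely on a single platform, the two possible interpretations can be simulated side by side (a small check, not part of the example above):

fn main() {
    let x: u8 = 0b1000_0000; // 128: only the most significant bit set

    // If c_char is signed, x becomes -128 and >> is an arithmetic shift:
    let signed = ((x as i8) >> 1) as u8;
    assert_eq!(signed, 0b1100_0000); // the sign bit is replicated

    // If c_char is unsigned, >> is a logical shift:
    let unsigned = x >> 1;
    assert_eq!(unsigned, 0b0100_0000); // a zero is shifted in
}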

Kmeakin commented 3 years ago

The issue is deeper than we first thought. The fundamental problem is that the Rust type system does not distinguish between a type alias and its aliased type: type aliases are eagerly resolved during type checking. This means that a lint which only has access to the HIR and the results of type checking cannot easily detect when an i8/u8 is being used as a c_char (I will use c_char throughout this post, but the same issue applies to types like c_long whose sign and/or width changes between platforms).
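
To illustrate why the HIR is not enough: once type checking has run, the alias has vanished entirely. The sketch below assumes a target where c_char is i8 (e.g. x86_64 Linux); on a target where it is u8 the same code fails to compile, which is itself the portability hazard:

use std::os::raw::c_char;

fn takes_c_char(_c: c_char) {}

fn main() {
    let x: i8 = 0x41;
    // On an i8 target this call involves no cast or coercion at all: i8 and
    // c_char are literally the same type to the type system, so a lint cannot
    // tell whether the programmer meant "a byte" or "a C character".
    takes_c_char(x);
}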

We have found 3 possible solutions, none of which is fully satisfactory:

  1. Write a lint to detect improper uses of c_char. This would necessarily be incomplete: for example, 0_u8 as c_char is trivial to detect, as c_char is explicitly mentioned in the cast. However, in general, detecting treatment of a u8/i8 as a c_char requires arbitrarily complex type inference:
use std::os::raw::c_char;

fn require_types_equal<T>(x: T, y: T) {}

fn main() {
    let x: c_char = 0;
    let y = 0_u8 as _;
    require_types_equal(x, y);
}

Here, inferring the target type of y's cast requires information from both the line before and the line after its definition. As we extend the lint to detect more and more cases, we end up effectively replicating Rust's type inference algorithm (inevitably not 100% faithfully).

  2. Change how type inference works to make this kind of use easier to detect in lints - perhaps add an attribute for type aliases that makes them stronger than a plain alias but weaker than a fully distinct struct. This would be the most general solution, and could provide utility to other projects in the future, but it also requires modifying the type checker, one of the most complex parts of the compiler.

  3. Deprecate the c_char type alias and instead create a repr(transparent) struct (the cfgs below are simplified for illustration; in practice the signedness of C's char depends on the target OS as well as the architecture):

#[cfg(target_arch = "x86_64")]
pub type CCharRepr = i8;

#[cfg(target_arch = "aarch64")]
pub type CCharRepr = u8;

#[repr(transparent)]
pub struct CChar {
    repr: CCharRepr
}

impl CChar {
    pub fn new(repr: CCharRepr) -> Self {
        Self { repr }
    }

    pub unsafe fn get_inner(&self) -> CCharRepr {
        self.repr
    }
}

The intention here is that CChar is an opaque type with no operations defined on it: it is only good for passing to extern "C" functions. There is still a getter to access the internal representation, but it is unsafe to emphasize that any operations on the internal representation may be platform-specific (for example, bitshifts behave differently depending on sign).
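
For illustration, here is how the sketched wrapper might be used at an FFI boundary (assuming the CChar/CCharRepr definitions above; strlen is the C standard library function):

extern "C" {
    // C's `const char *` maps to *const CChar thanks to #[repr(transparent)].
    fn strlen(s: *const CChar) -> usize;
}

fn main() {
    // A NUL-terminated "hi" built without ever exposing the signedness:
    let buf = [
        CChar::new(b'h' as CCharRepr),
        CChar::new(b'i' as CCharRepr),
        CChar::new(0),
    ];
    assert_eq!(unsafe { strlen(buf.as_ptr()) }, 2);
}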

This has the advantage of only requiring changes to the standard library, not the compiler. Deprecating c_char is not by itself backwards-incompatible: c_char would still be available for existing code that has not yet migrated. However, c_char is also exposed in the public APIs of parts of the standard library; for example, std::ffi::CString::into_raw returns a *mut c_char, and std::ffi::CStr::as_ptr returns a *const c_char. It would be unsatisfactory for these functions to return a deprecated type, but changing their return types would be a breaking change and would require a new edition.