rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0

Policy for assumptions about the size of `usize` #1748

Open durka opened 8 years ago

durka commented 8 years ago

When in the course of human rusty events, something in core or std depends on the actual width of usize/isize, there are currently (at least) two policies in place:

  1. Conservatively assume that usize may be as narrow as 8 bits.
    • example: usize: From<u8> + !From<u16>
  2. Liberally assume that usize is at least 32 bits wide (as it is on all current officially supported platforms).
    • example: Range<u32>: ExactSizeIterator
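
A minimal sketch of the two policies side by side, written against current stable Rust. The function names are made up for illustration and are not part of the standard library.

```rust
use std::convert::TryFrom;

/// Policy 1: `From<u8> for usize` exists on every target, so this is infallible.
fn widen_u8(x: u8) -> usize {
    usize::from(x)
}

/// There is no `From<u32> for usize`, so a portable conversion has to be fallible.
fn widen_u32(x: u32) -> Option<usize> {
    usize::try_from(x).ok()
}

/// Policy 2: `Range<u32>` implements `ExactSizeIterator`, so `len()` hands back
/// a `usize` even though a `u32` range can describe more than 65_535 items.
fn range_len(start: u32, end: u32) -> usize {
    (start..end).len()
}
```

The mismatch is that `range_len` silently relies on `usize` being at least 32 bits wide, while `widen_u32` is not allowed to.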

Let me know if I missed any other corners of the standard library which make assumptions (identical to one of these or not).

As these policies are in conflict, it seems like one or both of them should be changed. In principle, we can't remove trait implementations from Range<u32> and the like, so we could just declare target_pointer_width-liberalism to be the law of the land. However, this will make it difficult to port Rust to a 16-bit system. In doing such porting, trait implementations like From<u32> for usize and ExactSizeIterator for Range<u32> would need to be gated by a #[cfg]. But, this would make it difficult to port Rust code from, say, a 32-bit target to a 16-bit target, because some code would stop compiling (N.B. this is already potentially the case, because literals given for enum variants are interpreted as isize literals).

So, what should we do?

briansmith commented 8 years ago

Let's see if we can narrow the bounds just a little.

durka commented 8 years ago

Makes good sense to me. Those proposals still leave the question of what to do about impl ExactSizeIterator for Range<i32>. Options are:

petrochenkov commented 8 years ago

So, what should we do?

Gate impls on target_pointer_width for all currently supported values of target_pointer_width. When a target with a new value of target_pointer_width is added (16 bit, 128 bit, 8 bit, whatever), a new set of cfgs is added as well.
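
A rough sketch of what such gating could look like. `LosslessIntoUsize` is a hypothetical local trait standing in for `From<T> for usize`; the real impls would live in core and are not shown here.

```rust
/// Hypothetical stand-in for `From<T> for usize`; not a real std trait.
trait LosslessIntoUsize {
    fn lossless_into_usize(self) -> usize;
}

// Available everywhere: this sketch assumes usize is at least 16 bits wide.
impl LosslessIntoUsize for u16 {
    fn lossless_into_usize(self) -> usize {
        usize::from(self)
    }
}

// Only exists on targets where usize is at least 32 bits wide.
#[cfg(any(target_pointer_width = "32", target_pointer_width = "64"))]
impl LosslessIntoUsize for u32 {
    fn lossless_into_usize(self) -> usize {
        self as usize // lossless under the cfg above
    }
}

// Only exists on targets where usize is at least 64 bits wide.
#[cfg(target_pointer_width = "64")]
impl LosslessIntoUsize for u64 {
    fn lossless_into_usize(self) -> usize {
        self as usize // lossless under the cfg above
    }
}
```

Adding a new pointer width would mean extending the `any(...)` lists, which is the "new set of cfgs" mentioned above.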

But, this would make it difficult to port Rust code from, say, a 32-bit target to a 16-bit target, because some code would stop compiling

It would make porting simpler because incorrect range assumptions and overflows will be caught at compile time.

durka commented 8 years ago

Caught at compile time when you're porting. If we put in #[cfg(target_pointer_width = "64")] impl ExactSizeIterator for Range<u64> {} then people will be confused when they release a crate, someone downloads it on a 32-bit machine, and Iterator::rposition randomly stops working.
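
To make the rposition point concrete, here is a small sketch (the function name is invented for the example). Iterator::rposition requires ExactSizeIterator, so it only works on ranges that have that impl.

```rust
/// Compiles on every current target because `Range<u32>: ExactSizeIterator`.
/// Returns the index (counted from the front) of the last multiple of four.
fn last_multiple_of_four(start: u32, end: u32) -> Option<usize> {
    (start..end).rposition(|x| x % 4 == 0)
}

// The same body with `u64` bounds is rejected on all targets today, because
// `Range<u64>` does not implement `ExactSizeIterator`. With a cfg-gated impl
// it would compile on 64-bit targets only, which is the surprise described above.
```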

petrochenkov commented 8 years ago

@durka This is a real problem: 32-bit and 64-bit targets are equally common and code is often ported between them, unlike 16-bit targets, which are now used only by very specialized hardware. @aturon (IIRC) suggested adding a special lint to avoid these 32-bit <-> 64-bit portability problems.

Impls like From<u64> for usize still need to conditionally exist because a lot of software is supposed to run, for example, on very specific 64-bit server hardware under some enterprise Linux and is not going to be ported anywhere.

durka commented 8 years ago

I like the idea of having a lint if an impl is selected that's tagged with #[cfg(target_pointer_width)] (or other target attributes maybe).

oyvindln commented 8 years ago

I propose that we at least assume that usize/isize are no smaller than u16/i16. Note that this is true, in particular, for 8-bit AVR (Arduino). This implies that we should impl From<u16> for usize and impl From<i16> for isize.

I don't know about wider types, but From<u16> for usize sounds reasonable. C99 and newer recommend that the closest equivalent (size_t) be at least 16 bits wide (see the C99 Standard, page 259). I would think a system where usize would be less than 16 bits (as @briansmith noted, a processor being 8-bit doesn't imply usize being that small) would require rather specialised code anyhow.

comex commented 8 years ago

Maybe a set of special purpose lints?

#[allow(assume_usize_ge_32_bits)]
#[allow(assume_usize_le_64_bits)]

The standard library really should provide some way to safely cast under such assumptions, whether From or something else. If it doesn't, most people won't avoid making them; they'll just hide them in `as` casts, which are evil.
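
A small sketch of why `as` casts hide the assumption. Here `u16` stands in for a 16-bit usize so the behaviour is the same on every host; the function names are illustrative only.

```rust
use std::convert::TryFrom;

/// Silent truncation: an `as` cast never fails, it just drops the high bits.
fn truncate_to_u16(x: u32) -> u16 {
    x as u16
}

/// Checked conversion: reports the failure instead of truncating.
fn checked_to_u16(x: u32) -> Option<u16> {
    u16::try_from(x).ok()
}
```

With these definitions, `truncate_to_u16(70_000)` quietly yields 4464, whereas `checked_to_u16(70_000)` yields `None`.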

withoutboats commented 8 years ago

I propose that we at least assume that usize/isize are no larger than u64/i64. This implies that we should impl From<usize> for u64 and impl From<isize> for i64.

Are we actually confident this is a reasonable assumption over the next 50 years? I guess if it becomes untrue we can make a breaking change.

aturon commented 8 years ago

Nominated for lang team discussion.

nikomatsakis commented 8 years ago

I wrote up the @rust-lang/lang team discussion in this internals thread.

petrochenkov commented 7 years ago

cc https://github.com/rust-lang/rfcs/pull/1868

SimonSapin commented 7 years ago

CC https://github.com/rust-lang/rust/pull/43086#issuecomment-313872797

SimonSapin commented 7 years ago

  1. Conservatively assume that usize may be as narrow as 8 bits.

https://en.wikibooks.org/wiki/C_Programming/stdint.h#Integers_wide_enough_to_hold_pointers claims that uintptr_t is at least 16 bits.

eternaleye commented 7 years ago

@SimonSapin: I checked the C standards, because the linked page cites the manpage, which might have been overconstrained (both C and POSIX apply constraints to some types and constants).

So yes, C's uintptr_t is at least 16 bits, as is its intptr_t. (Though it is legal for it to be unable to represent -2¹⁵, this is presumably as a concession to one's-complement machines, which I don't think Rust supports anyway.)

SimonSapin commented 6 years ago

PR https://github.com/rust-lang/rust/pull/49305 includes:

scottjmaddox commented 5 years ago

Perhaps all From and TryFrom impls could be conditionally compiled with #[cfg(target_pointer_width=*)], and then some mechanism could be added to cargo check that verifies type checking for the desired supported pointer widths, as configured in Cargo.toml (and defaulting to 16, 32, and 64 bit)?

Making this work (or at least work efficiently) might require an extension to rustc, in order to override the target pointer width during a check pass.

briansmith commented 5 years ago

A possible way forward:

Define some new submodules, e.g. std::arch::at_least_32_bits and std::arch::at_most_64_bits. These modules would define the implementations of the u32 -> usize and usize -> u64 conversions. A program that needs these conversions must explicitly import those modules to get them. Those modules aren't available when the target platform doesn't meet the requirements for them. When compiling a crate that makes assumptions about conversions to/from usize on a target for which those assumptions are invalid, the build will fail, pointing directly to the `use std::arch::at_least_32_bits;` or `use std::arch::at_most_64_bits;` (or whatever) statements, which will make it obvious what the problem is.

No new language features would be required.

durka commented 5 years ago

Unfortunately, the idea doesn't work because impls don't respect module scope like that. A portability lint is the way to go.
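
A minimal sketch of the scoping problem. The trait `ToUsize` and the module name are hypothetical; the point is only that a trait impl is visible crate-wide no matter which module it is written in.

```rust
/// Hypothetical conversion trait, standing in for `From`.
trait ToUsize {
    fn to_usize(self) -> usize;
}

/// Hypothetical module that is supposed to "own" the conversion.
mod at_least_32_bits {
    // Assumes usize is at least 32 bits; a real version would be cfg-gated.
    impl super::ToUsize for u32 {
        fn to_usize(self) -> usize {
            self as usize
        }
    }
}

/// There is no `use at_least_32_bits::...;` anywhere, yet this compiles,
/// so the import cannot serve as the marker for the assumption.
fn convert(x: u32) -> usize {
    x.to_usize()
}
```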

briansmith commented 5 years ago

Unfortunately, the idea doesn't work because impls don't respect module scope like that. A portability lint is the way to go.

Keep in mind that those modules wouldn't exist for targets that don't meet the limits.

briansmith commented 5 years ago

Oh, I see, you're saying that the conversions would still be possible even if the program didn't have the use statements. That's right. :(

durka commented 5 years ago

But when they do exist, there's no way to enforce the requirement to import them. The impls are visible regardless. I can't think of a way to do this with imports, but maybe there is some hack with generics and specialization or something.

briansmith commented 5 years ago

I see that libc::size_t is defined as type size_t = usize; which allows implicit conversions between size_t and usize, which is an even bigger hazard than explicit conversions between usize and size_t. It's been argued that usize is defined to be equivalent to uintptr_t and not necessarily equivalent to size_t. I think we should have impl From<libc::size_t> for usize and impl From<usize> for libc::uintptr_t at least. However, I think we also need at least impl From<usize> for libc::size_t which, in the case where usize is larger than size_t, somehow knows how to truncate a usize that actually represents a size (vs one that represents a pointer) to a size_t losslessly.

Also note that there are attempts to define a "maximum object size" and so far many people have suggested that isize::max_value() or usize::max_value() are appropriate limits there. That would usually be incorrect in the case where uintptr_t is larger than size_t. Probably such limits need to be defined relative to ssize_t and size_t.

SimonSapin commented 5 years ago

type size_t = usize; which allows implicit conversions between size_t and usize

There is no conversion here, even implicit. A type item gives another name to a type. The two names refer to the same type. As far as I know there is no difference with a pub use reexport.

briansmith commented 5 years ago

There is no conversion here, even implicit. A type item gives another name to a type. The two names refer to the same type. As far as I know there is no difference with a pub use reexport.

You and I are saying the same thing in different ways. The point is that this works for most, but not all, platforms:

fn foo(n: usize) -> libc::size_t { n }

In https://github.com/rust-lang/unsafe-code-guidelines/issues/99 at least one person claimed that that code isn't guaranteed to work for all targets because sometimes size_t will not be an alias for usize. That we can use usize interchangeably with libc::size_t on some platforms but not on every platform is in conflict with the trend of the discussion in this issue above, where we don't even allow explicit conversions Into/From usize unless the conversion would work on every platform. It doesn't seem right that we allow some implicit conversions to/from usize while refusing to provide similar explicit conversions. We should find some way to resolve that inconsistency. My preferred way of removing the inconsistency is to drop the requirement that usize is the same as uintptr_t and instead require that usize is the same as size_t, which is a breaking change that's unlikely to happen. A more realistic change would be to replace type size_t = usize; with #[repr(transparent)] struct size_t(usize); in a new major version of libc.
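
A sketch contrasting the two definitions. The local alias `size_t` mimics how libc defines it today; `SizeT` is a hypothetical newtype, and its From impls assume a target where usize and size_t have the same width.

```rust
/// Stand-in for `libc::size_t` as currently defined: a plain alias.
#[allow(non_camel_case_types)]
type size_t = usize;

fn foo(n: usize) -> size_t {
    n // same type under two names, so there is nothing to convert
}

/// Hypothetical newtype alternative; not the real libc definition.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SizeT(usize);

// These impls assume usize and size_t are the same width on this target.
impl From<usize> for SizeT {
    fn from(n: usize) -> Self {
        SizeT(n)
    }
}

impl From<SizeT> for usize {
    fn from(n: SizeT) -> Self {
        n.0
    }
}

fn bar(n: usize) -> SizeT {
    SizeT::from(n) // returning `n` directly would be a type error
}
```

With the newtype, every crossing between the two types is spelled out, so the impls could be omitted or made fallible on targets where the widths differ.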

SimonSapin commented 5 years ago

sometimes size_t will not be an alias for usize

I agree that this is incompatible with the way the libc crate is currently defined.

(This is somewhat beside the point, but what are some platforms where size_t is not uintptr_t?)

briansmith commented 5 years ago

(This is somewhat beside the point, but what are some platforms where size_t is not uintptr_t?)

A 64-bit CHERI-based platform will have 256-bit or 128-bit pointers and 64-bit usize. Pointers are a composite of security information and the address. Similarly, any ABI that requires pointers to be represented as (&[T], size_t i) or equivalent would have uintptr_t different than usize.

(Also potentially the ordering of uintptr_t and usize is different for the same bit pattern even when they are the same size, because some new security technologies put authentication information in the high bits of pointers.)

I am particularly interested in Rust supporting these security-oriented ABIs in the future as they become practical.

gnzlbg commented 5 years ago

@briansmith

Note that we can only control the maximum allowed size of Rust objects (repr(Rust)). The maximum allowed size of C objects, which repr(C) types have to respect, is fixed by the C platform, and is outside our control.

That would usually be incorrect in the case where uintptr_t is larger than size_t.

AFAICT this would only mean that the maximum allowed size of repr(Rust) values can be greater or equal to the maximum allowed size of repr(C) values, which is perfectly fine. So what do you mean by "incorrect" ?

briansmith commented 5 years ago

So what do you mean by "incorrect" ?

Sure, in theory you could define the maximum object size to be 2**256 - 1 bytes if you want (if uintptr_t is 256 bits). But I doubt anybody wants that.

gnzlbg commented 5 years ago

Sure, in theory you could define the maximum object size to be 2**256 - 1 bytes if you want (if uintptr_t is 256 bits). But I doubt anybody wants that.

The exact same can be argued of 2**64 - 1, right? AFAICT these limits only matter if they are small enough for normal Rust code to run into them (e.g. on 8, 16, 32 bit platforms). Once the limits become high enough (e.g. 48-bit or larger), do they still matter ? For example, there is unsafe code in std that ensures that these limits aren't reached on 32-bit platforms, but for 64-bit targets it is essentially dead-code that will never be reached in practice (EDIT: not only essentially, libstd just assumes it does not happen: https://github.com/rust-lang/rust/blob/master/src/liballoc/raw_vec.rs#L735).
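
For reference, a sketch of the kind of guard being discussed. This is not the actual code in raw_vec.rs; `checked_alloc_size` is a made-up helper that rejects sizes which overflow usize or exceed isize::MAX.

```rust
use std::mem::size_of;

/// Returns the byte size of `count` values of `T`, or `None` if that size
/// overflows `usize` or is larger than `isize::MAX`.
fn checked_alloc_size<T>(count: usize) -> Option<usize> {
    count
        .checked_mul(size_of::<T>())
        .filter(|&bytes| bytes <= isize::MAX as usize)
}
```

On a 64-bit target the `None` branch can only be reached with absurd element counts, which is the sense in which the check is effectively dead code there.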

daira commented 3 years ago

I propose that Rust code that is targeting std (i.e. does not use #![no_std]) should be able to assume that usize is at least 32 bits.
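
Until something like that is guaranteed, a crate can state the assumption itself. A minimal sketch (the helper name is illustrative; the const assertion needs a reasonably recent stable compiler):

```rust
// Fails the build, rather than misbehaving at run time, on any target where
// the assumption does not hold.
const _: () = assert!(
    usize::BITS >= 32,
    "this crate assumes usize is at least 32 bits wide"
);

/// Lossless because of the compile-time assertion above.
fn u32_to_usize(x: u32) -> usize {
    x as usize
}
```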