Open durka opened 8 years ago
Let's see if we can narrow the bounds just a little.
- I propose that we at least assume that `usize`/`isize` are no larger than `u64`/`i64`. This implies that we should `impl From<usize> for u64` and `impl From<isize> for i64`.
- I propose that we at least assume that `usize`/`isize` are no smaller than `u16`/`i16`. Note that this is true, in particular, for 8-bit AVR (Arduino). This implies that we should `impl From<u16> for usize` and `impl From<i16> for isize`.
- As a goal, code should not need `as` for integer conversions, but instead must use only `From`, `Into`, `TryFrom`, and `TryInto`, etc. for such conversions. The achievement of this goal can then guide the rest of the decision making process.
Makes good sense to me. Those proposals still leave the question of what to do about `impl ExactSizeIterator for Range<i32>`. Options are:
- Gate it with `#[cfg(target_pointer_width >= 32)]` (pretend that syntax works)
- Allow `(0..u32::max_value()).len()` to panic on 16-bit systems
So, what should we do?
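To make the proposals concrete, here is a minimal sketch of what `as`-free conversions look like with the traits that exist in std today. The function names are hypothetical. Note that `From<usize> for u64` is only proposed in this thread, so the sketch falls back to `TryFrom` for that direction.

```rust
use std::convert::TryFrom;

/// Widening u16 -> usize is infallible: std provides `From<u16> for usize`.
fn index_from_u16(raw: u16) -> usize {
    usize::from(raw)
}

/// u32 -> usize may not fit on a 16-bit target, so it goes through `TryFrom`.
fn index_from_u32(raw: u32) -> Option<usize> {
    usize::try_from(raw).ok()
}

/// usize -> u64 is also fallible in std as it stands (assumption of this
/// sketch: no `From<usize> for u64` impl is available).
fn len_as_u64(len: usize) -> Option<u64> {
    u64::try_from(len).ok()
}

fn main() {
    println!("{}", index_from_u16(u16::MAX));
    println!("{:?}", index_from_u32(7));
    println!("{:?}", len_as_u64(42));
}
```

None of these functions can silently truncate; the fallible ones force the caller to decide what to do when the value does not fit.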
Gate impls on `target_pointer_width` for all currently supported values of `target_pointer_width`.
When a target with a new value of `target_pointer_width` is added (16 bit, 128 bit, 8 bit, whatever), then a new set of `cfg`s is added as well.
> But, this would make it difficult to port Rust code from, say, a 32-bit target to a 16-bit target, because some code would stop compiling
It would make porting simpler because incorrect range assumptions and overflows will be caught at compile time.
Caught at compile time when you're porting. If we put in `#[cfg(target_pointer_width = "64")] impl ExactSizeIterator for Range<u64> {}` then people will be confused when they release a crate, someone downloads it on a 32-bit machine, and `Iterator::rposition` randomly stops working.
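The gating approach being debated can be sketched with a local trait, since downstream code cannot add impls to std's own traits for primitive types. `IntoIndex` is a hypothetical name used only for illustration.

```rust
/// Hypothetical trait standing in for a lossless conversion into usize.
trait IntoIndex {
    fn into_index(self) -> usize;
}

// Always available: usize is assumed to be at least 16 bits wide.
impl IntoIndex for u16 {
    fn into_index(self) -> usize {
        usize::from(self)
    }
}

// Exists only on targets where every u32 fits in usize. On a 16-bit target
// this impl is absent, so callers fail to compile instead of truncating.
#[cfg(any(target_pointer_width = "32", target_pointer_width = "64"))]
impl IntoIndex for u32 {
    fn into_index(self) -> usize {
        // Lossless under the cfg condition above.
        self as usize
    }
}

fn main() {
    println!("{}", 500u16.into_index());
    #[cfg(any(target_pointer_width = "32", target_pointer_width = "64"))]
    println!("{}", 70_000u32.into_index());
}
```

This is exactly the trade-off raised above: the code is correct wherever it compiles, but a crate author on a 64-bit machine gets no warning that the `u32` call will not compile elsewhere.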
@durka This is a real problem: 32-bit and 64-bit targets are equally common and code is often ported between them, unlike 16-bit, which is now used only by very specialized hardware. @aturon (IIRC) suggested adding a special lint to avoid these 32-bit <-> 64-bit portability problems.
Impls like `From<u64> for usize` still need to conditionally exist because a lot of software is supposed to run, for example, on very specific 64-bit server hardware under some enterprise Linux and is not going to be ported anywhere.
I like the idea of having a lint if an impl is selected that's tagged with `#[cfg(target_pointer_width)]` (or other target attributes maybe).
> I propose that we at least assume that usize/isize are no smaller than u16/i16. Note that this is true, in particular, for 8-bit AVR (Arduino). This implies that we should impl From<u16> for usize and impl From<i16> for isize.
I don't know about wider types, but `From<u16> for usize` sounds reasonable. C99 and newer recommend that the closest equivalent (`size_t`) be at least 16 bits (C99 Standard, see page 259). I would think a system where usize would be less than 16 bits (as @briansmith noted, a processor being 8-bit doesn't imply usize being that small) would require rather specialised code anyhow.
Maybe a set of special purpose lints?
#[allow(assume_usize_ge_32_bits)]
#[allow(assume_usize_le_64_bits)]
The standard library really should provide some way to safely cast under such assumptions, whether From or something else. If it doesn't, most people won't avoid making them; they'll just hide them in `as` casts, which are evil.
> I propose that we at least assume that usize/isize are no larger than u64/i64. This implies that we should impl From<usize> for u64 and impl From<isize> for i64.
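A small illustration of why `as` hides the assumption: the cast below compiles everywhere and silently wraps, while `TryFrom` surfaces the failure. Here `u16` stands in for a 16-bit `usize`, and the function names are hypothetical.

```rust
use std::convert::TryFrom;

/// Narrowing with `as`: always compiles, silently discards the high bits.
fn narrow_with_as(value: u32) -> u16 {
    value as u16
}

/// Narrowing with `TryFrom`: out-of-range values are reported, not wrapped.
fn narrow_checked(value: u32) -> Option<u16> {
    u16::try_from(value).ok()
}

fn main() {
    // 70_000 does not fit in 16 bits.
    println!("as:       {}", narrow_with_as(70_000));
    println!("try_from: {:?}", narrow_checked(70_000));
}
```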
Are we actually confident this is a reasonable assumption over the next 50 years? I guess if it becomes untrue we can make a breaking change.
Nominated for lang team discussion.
I wrote up the @rust-lang/lang team discussion in this internals thread.
- Conservatively assume that usize may be as narrow as 8 bits.
https://en.wikibooks.org/wiki/C_Programming/stdint.h#Integers_wide_enough_to_hold_pointers claims that `uintptr_t` is at least 16 bits.
@SimonSapin: I checked the C standards, because the linked page cites the manpage, which might have been overconstrained (both C and POSIX apply constraints to some types and constants).
The types are optional, so an implementation may omit `intptr_t` entirely, but where they are defined the limits must have at least these magnitudes:
- INTPTR_MIN: -(2¹⁵ - 1)
- INTPTR_MAX: 2¹⁵ - 1
- UINTPTR_MAX: 2¹⁶ - 1
So yes, C's `uintptr_t` is at least 16 bits, as is its `intptr_t`. (Though it is legal for it to be unable to represent -2¹⁵; this is presumably a concession to one's-complement machines, which I don't think Rust supports anyway.)
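If a crate wants to rely on the same 16-bit floor, one way to state the assumption explicitly is a compile-time assertion. This is only a sketch; `pointer_width_bits` is a hypothetical helper.

```rust
// Fails the build on any target whose usize is narrower than 16 bits.
const _: () = assert!(usize::BITS >= 16, "usize is assumed to be at least 16 bits");

/// Reports the width of usize on the current target.
fn pointer_width_bits() -> u32 {
    usize::BITS
}

fn main() {
    println!("usize is {} bits wide", pointer_width_bits());
    // Relies on the 16-bit floor: std provides `From<u16> for usize`.
    let widened: usize = usize::from(u16::MAX);
    println!("u16::MAX widened to usize: {}", widened);
}
```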
PR https://github.com/rust-lang/rust/pull/49305 includes:
- Addition of a couple `From` impls that assume that `usize` and `isize` are always at least 16 bits, on the basis that Rust doesn’t need to be more portable than C99.
- Removal of fallible `TryFrom` impls that could be infallible `From` impls on only some platforms, with a portability lint. Adding these impls back (one way or another) is tracked at https://github.com/rust-lang/rust/issues/49415
Perhaps all `From` and `TryFrom` impls could be conditionally compiled with `#[cfg(target_pointer_width=*)]`, and then some mechanism could be added to `cargo check` that verifies type checking for the desired supported pointer widths, as configured in `Cargo.toml` (and defaulting to 16, 32, and 64 bit)?
Making this work (or at least work efficiently) might require an extension to `rustc`, in order to override the target pointer width during a `check` pass.
A possible way forward:
Define some new submodules, e.g. `std::arch::at_least_32_bits`, `std::arch::at_most_64_bits`. These modules would define the implementations of the `u32 -> usize` and `usize <- u64` conversions. A program that needs these conversions must explicitly import those modules to get them. Those modules aren't available when the target platform doesn't meet the requirements for them. When compiling a crate that makes assumptions about conversions to/from usize, on a target for which those assumptions are invalid, the build will fail pointing directly to the `use std::arch::at_least_32_bits;` or `use std::arch::at_most_64_bits;` (or whatever) statements, which will make it obvious what the problem is.
No new language features would be required.
Unfortunately, the idea doesn't work because `impl`s don't respect module scope like that. A portability lint is the way to go.
> Unfortunately, the idea doesn't work because `impl`s don't respect module scope like that. A portability lint is the way to go.
Keep in mind that those modules wouldn't exist for targets that don't meet the limits.
Oh, I see, you're saying that the conversions would still be possible even if the program didn't have the `use` statements. That's right. :(
But when they do there's no way to enforce the requirement to import them. The impls are visible regardless. I can't think of a way to do this with imports, but maybe there is some hack with generics and specialization or something.
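The objection can be reproduced in a few lines. In this sketch `Index` is a hypothetical stand-in for `usize` and the module name mirrors the proposal; the impl is written inside the module, yet code that never imports the module can still use it.

```rust
/// Hypothetical stand-in for usize, so the sketch can define its own impl.
#[derive(Debug, PartialEq)]
struct Index(u64);

mod at_least_32_bits {
    use super::Index;

    // The impl is declared inside this module...
    impl From<u32> for Index {
        fn from(value: u32) -> Self {
            Index(u64::from(value))
        }
    }
}

/// ...but it applies here without any `use at_least_32_bits;` statement,
/// because trait impls are not scoped to the module that declares them.
fn convert(value: u32) -> Index {
    Index::from(value)
}

fn main() {
    println!("{:?}", convert(7));
}
```

So removing the module on unsupported targets would break the build, but on supported targets nothing forces a crate to acknowledge the assumption with an import.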
I see that `libc::size_t` is defined as `type size_t = usize;` which allows implicit conversions between `size_t` and `usize`, which is an even bigger hazard than explicit conversions between `usize` and `size_t`. It's been argued that `usize` is defined to be equivalent to `uintptr_t` and not necessarily equivalent to `size_t`. I think we should have `impl From<libc::size_t> for usize` and `impl From<usize> for libc::uintptr_t` at least. However, I think we also need at least `impl From<usize> for libc::size_t` which, in the case where `usize` is larger than `size_t`, somehow knows how to truncate a `usize` that actually represents a size (vs one that represents a pointer) to a `size_t` losslessly.
Also note that there are attempts to define a "maximum object size" and so far many people have suggested that `isize::max_value()` or `usize::max_value()` are appropriate limits there. That would usually be incorrect in the case where `uintptr_t` is larger than `size_t`. Probably such limits need to be defined relative to `ssize_t` and `size_t`.
> `type size_t = usize;` which allows implicit conversions between `size_t` and `usize`
There is no conversion here, even implicit. A `type` item gives another name to a type. The two names refer to the same type. As far as I know there is no difference with a `pub use` reexport.
> There is no conversion here, even implicit. A `type` item gives another name to a type. The two names refer to the same type. As far as I know there is no difference with a `pub use` reexport.
You and I are saying the same thing in different ways. The point is that this works for most, but not all, platforms:
fn foo(n: usize) -> libc::size_t { n }
In https://github.com/rust-lang/unsafe-code-guidelines/issues/99 at least one person claimed that that code isn't guaranteed to work for all targets because sometimes `size_t` will not be an alias for `usize`. That we can use `usize` interchangeably with `libc::size_t` on some platforms but not every platform is in conflict with the trend of the discussion in this issue above, where we don't even allow explicit conversions `Into`/`From` `usize` unless the conversion would work on every platform. It doesn't seem right that we are rejecting some explicit conversions to/from `usize` while refusing to provide similar explicit conversions. We should find some way to resolve that inconsistency. My preferred way of removing the inconsistency is to drop the requirement that `usize` is the same as `uintptr_t` and instead require `usize` is the same as `size_t`, which is a breaking change that's unlikely to happen. A more realistic change would be to replace `type size_t = usize;` with `#[repr(transparent)] struct size_t(usize);` in a new major version of libc.
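The difference between the alias and the suggested newtype can be seen in a short sketch. The two type names below are local, hypothetical stand-ins; they are not the `libc` crate's definitions.

```rust
/// Stand-in for an alias-style definition: just another name for usize.
#[allow(non_camel_case_types)]
type size_t_alias = usize;

/// Stand-in for the suggested newtype: same layout as usize, distinct type.
#[allow(non_camel_case_types)]
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct size_t_newtype(usize);

/// Compiles with no conversion at all, because both names are one type.
fn through_alias(n: usize) -> size_t_alias {
    n
}

/// With the newtype, the wrapping step has to be written out.
fn through_newtype(n: usize) -> size_t_newtype {
    size_t_newtype(n)
}

// fn rejected(n: usize) -> size_t_newtype { n } // mismatched types

fn main() {
    println!("{}", through_alias(5));
    println!("{:?}", through_newtype(5));
}
```

With the newtype, code that quietly assumes the two types are identical stops compiling on every target, not only on the unusual ones.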
> sometimes `size_t` will not be an alias for `usize`
I agree that this is incompatible with the way the `libc` crate is currently defined.
(This is somewhat beside the point, but what are some platforms where `size_t` is not `uintptr_t`?)
> (This is somewhat beside the point, but what are some platforms where `size_t` is not `uintptr_t`?)
A 64-bit CHERI-based platform will have 256-bit or 128-bit pointers and 64-bit `usize`. Pointers are a composite of security information and the address. Similarly, any ABI that requires pointers to be represented as `(&[T], size_t i)` or equivalent would have `uintptr_t` different than `usize`.
(Also potentially the ordering of `uintptr_t` and `usize` is different for the same bit pattern even when they are the same size, because some new security technologies put authentication information in the high bits of pointers.)
I am particularly interested in Rust supporting these security-oriented ABIs in the future as they become practical.
@briansmith
Note that we can only control the maximum allowed size of Rust objects (`repr(Rust)`). The maximum allowed size of C objects, which `repr(C)` types have to respect, is fixed by the C platform, and is outside our control.
> That would usually be incorrect in the case where uintptr_t is larger than size_t.
AFAICT this would only mean that the maximum allowed size of `repr(Rust)` values can be greater or equal to the maximum allowed size of `repr(C)` values, which is perfectly fine. So what do you mean by "incorrect"?
> So what do you mean by "incorrect"?
Sure, in theory you could define the maximum object size to be `2**256 - 1` bytes if you want (if `uintptr_t` is 256 bits). But I doubt anybody wants that.
> Sure, in theory you could define the maximum object size to be 2**256 - 1 bytes if you want (if uintptr_t is 256 bits). But I doubt anybody wants that.
The exact same can be argued of `2**64 - 1`, right? AFAICT these limits only matter if they are small enough for normal Rust code to run into them (e.g. on 8, 16, 32 bit platforms). Once the limits become high enough (e.g. 48-bit or larger), do they still matter? For example, there is `unsafe` code in `std` that ensures that these limits aren't reached on 32-bit platforms, but for 64-bit targets it is essentially dead code that will never be reached in practice (EDIT: not only essentially, libstd just assumes it does not happen: https://github.com/rust-lang/rust/blob/master/src/liballoc/raw_vec.rs#L735).
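For readers unfamiliar with the kind of guard being referred to, here is a rough sketch of a size check against `isize::MAX`. It is written for illustration only and is not the libstd code behind the link; `checked_array_bytes` is a hypothetical name.

```rust
use std::mem::size_of;

/// Returns the byte size of `len` elements of `T`, or `None` if the total
/// overflows usize or exceeds `isize::MAX` bytes.
fn checked_array_bytes<T>(len: usize) -> Option<usize> {
    let bytes = len.checked_mul(size_of::<T>())?;
    // `isize::MAX as usize` is lossless: isize::MAX always fits in usize.
    if bytes > isize::MAX as usize {
        None
    } else {
        Some(bytes)
    }
}

fn main() {
    println!("{:?}", checked_array_bytes::<u64>(4));
    println!("{:?}", checked_array_bytes::<u64>(usize::MAX));
}
```

On a 32-bit target the second branch is reachable with realistic lengths; on a 64-bit target it takes a request of roughly 2⁶³ bytes, which is the point made above about the check being dead code in practice.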
I propose that Rust code that is targeting `std` (i.e. does not use `#![no_std]`) should be able to assume that `usize` is at least 32 bits.
When in the course of rusty events, something in `core` or `std` depends on the actual width of `usize`/`isize`, there are currently (at least) two policies in place:
- `usize` may be as narrow as 8 bits.
  - `usize: From<u8> + !From<u16>`
- `usize` is at least 32 bits wide (as it is on all current officially supported platforms).
  - `Range<u32>: ExactSizeIterator`
Let me know if I missed any other corners of the standard library which make assumptions (identical to one of these or not).
As these policies are in conflict, it seems like one or both of them should be changed. In principle, we can't remove trait implementations from `Range<u32>` and the like, so we could just declare `target_pointer_width`-liberalism to be the law of the land. However, this will make it difficult to port Rust to a 16-bit system. In doing such porting, trait implementations like `From<u32> for usize` and `ExactSizeIterator for Range<u32>` would need to be gated by a `#[cfg]`. But, this would make it difficult to port Rust code from, say, a 32-bit target to a 16-bit target, because some code would stop compiling (N.B. this is already potentially the case, because literals given for enum variants are interpreted as `isize` literals).
So, what should we do?
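The second policy is easy to observe from user code. In this sketch `exact_len` is a hypothetical helper that accepts only iterators promising an exact length that fits in `usize`.

```rust
/// Accepts only iterators whose exact length is known and fits in usize.
fn exact_len<I: ExactSizeIterator>(iter: I) -> usize {
    iter.len()
}

fn main() {
    // Range<u16> is safe under either policy.
    println!("{}", exact_len(0u16..1_000));

    // Range<u32> implements ExactSizeIterator, which bakes in the
    // assumption that usize can hold any u32 length.
    println!("{}", exact_len(0u32..100_000));

    // Range<u64> does not implement ExactSizeIterator, so this is rejected:
    // println!("{}", exact_len(0u64..10));
}
```

On a target with a 16-bit `usize`, the `u32` call is the problematic one: either the impl is gated away and this stops compiling, or `len()` has to panic or misreport for long ranges.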