Open thomcc opened 2 years ago
Should this instead live in libcore somehow, given that probably libstd may never support platforms where this is not true.
At some point, I'd like to copy all the types from std::os::raw
into core::ffi
. core is compiled for the target system.
This particular proposal doesn't need to block on that, though; these should go into std::os::raw
for now, and we can separately work on populating core::ffi
.
Note that some types may not be the same on all targets for the same Architecture. For example (though I don't believe c_long exists), on x86_64-*-linux-gnu
(or any Sys-V target for x86_64) c_long would be i64, but on x86_64-*-windows-msvc, it would be i32. I believe that, currently, everything in core, with the exception of atomic types, is gated on the architecture only. That may be something to consider.
Even atomic types are effectively gated on target cpu only and not the OS AFAICT. It just so happens that different target triples on the same architecture sometimes use a different target cpu with different levels of atomic support.
Don't atomic types on older arm targets depend on the os as well (since they require OS support to implement properly)? On Thu, Aug 26, 2021 at 11:30 bjorn3 @.***> wrote:
Even atomic types are effectively gated on target cpu only and not the OS AFAICT. It just so happens that different target triples on the same architecture sometimes use a different target cpu.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rust-lang/rust/issues/88345#issuecomment-906515593, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGLD2ZELKNHU4MLYSPXWGTT6ZM25ANCNFSM5C2A7XKQ .
Modulo bugs, I believe platforms where OS support is required for atomic types are considered not to have atomic types by Rust.
(This seems to be getting off topic, perhaps)
ssize_t
is actually not from stddef.h
but from sys/types.h
which is specific to Unix. On Windows, SSIZE_T
is defined in BaseTsd.h
.
Nice catch. Do you want to submit a PR fixing the docs?
Yes, I'd love to. But I'm not sure if just fixing the doc for ssize_t
is okay. Since ssize_t
is platform-specific as I said, I'd rather suggest removing it from std::os::raw
and possibly placing it under std::os::*::raw
. If so:
std::os::unix::raw
has been deprecated in favor of the libc crate. Maybe we should not add std::os::unix::raw::ssize_t
now.std::os::windows::raw::SSIZE_T
would be okay, but adding only SSIZE_T
among many types defined in BaseTsd.h
might be kind of weird.ssize_t
is available in C at all.What do you think?
Officially, isn't ptrdiff_t
the signed counterpart to size_t
?
Yes. ssize_t is only defined by POSIX, not ISO 9899. It also has a far different minimum required range (though for rust this is really a purely-pedantic argument), with ptrdiff_t's range being more usual for a signed integral type.(+/-(SIZE_MAX/2 -1)).
On Sat, Oct 30, 2021 at 10:07 Chris Denton @.***> wrote:
Officially, isn't ptrdiff_t the signed counterpart to size_t?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rust-lang/rust/issues/88345#issuecomment-955239890, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGLD236743EXNWWUTLY3XDUJQC27ANCNFSM5C2A7XKQ .
This probably should be changed to either expose ptrdiff_t
instead of ssize_t
, or to only expose size_t
.
Either way, the future of this API depends a lot on https://internals.rust-lang.org/t/pre-rfc-usize-is-not-size-t/15369.
This is hardly scientific, but to quantify my hunch that size_t
is more common than ssize_t
and ptrdiff_t
in my installed headers, I did a check[^1] to see if this is true. The numbers I got were:
Linux:
Mac:
Which probably means that size_t is the important thing here, and the signed version doesn't matter so much.
[^1]: In the spirit of reproducibility, I ran the following command in fish-shell[^2] to get these counts:
```
rg -ce "$pattern" $include_dirs | string replace -r '^.*:' '' | paste -s -d+ - | bc
```
Where:
- `$pattern` is: `\bsize_t\b`, `\bssize_t\b`, or `\bptrdiff_t\b`
- `$include_dirs` on linux was `/usr/local/include` and `/usr/include`.
- `$include_dirs` on mac was `/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include` and `/usr/local/include`. These both might be system dependent in the following ways:
- The first you can find it with something like `xcrun --show-sdk-path --sdk macosx`
- And the second is where brew installs things (which seems to be different on new installs? Or maybe just on M1? IDK).
[^2]: fish-shell
is required because string replace
is easier for me than remembering how to use sed
, but mostly becasue I had something very similar in my history.
Sorry. If you tell me the equivalent `sed` incantation to delete everything prior to the last `:` in a line, I'll update this comment
These are now in core::ffi
too since https://github.com/rust-lang/rust/pull/94503 for #![feature(core_ffi_c)]
.
Just noting that these types are currently missing a reëxport into std. Even if presence of std means c_size_t === usize
, the type should be available to show intent and keep std
's public API a strict superset of core
's (ignoring internal-only perma-unstable items that are only public as an impl detail ofc).
Would we be able to stabilize these or a subset? I know we have the size_t ?= usize
discussion somewhat ongoing, but I don't think this blocks us from providing the C representation, right?
Assuming they are always the same for our current supported targets, we just need to resolve that before adding targets where uintptr_t
and size_t
do differ... I think
I don't believe the current implementation of pub type c_size_t = usize;
is correct if the long-term plans to allow usize != size_t because the current implementation guarantees implicit conversion between usize and c_size_t. I believe that we would instead need to use #[repr(transparent)] struct c_size_t(usize);
and then implement explicitly the operations that we are guaranteeing, and then also implement TryFrom
conversions for the rest. This would require us to decide what guarantees we are making; e.g. usize is always the same size as size_t or larger than size_t, but never smaller?
Or otherwise, we need to document that on targets where usize and size_t are the same, implicit conversions are supported, but on other targets they are not. Or something else to that effect. However, in the past, we have not allowed even EXPLICIT conversions to be supported or not on a per-target basis. For example, there is no impl From<u32> for usize
on targets where usize is the same size (or larger than) u32.
let x: usize = 1;
let y: core::ffi::size_t = x; // Works when a type alias, doesn't work otherwise.
This is true for the other c_*
types. c_int
is a type alias of i32
on most platforms, but it's documented to be platform specific. I think it's fine if we make it clear that pub type c_size_t = usize
; is a "True on this platform only" thing.
This is true for the other c_* types. c_int is a type alias of i32 on most platforms, but it's documented to be platform specific. I think it's fine if we make it clear that pub type c_size_t = usize; is a "True on this platform only" thing.
Was this a historical accident or is this intentional though? IIRC it was intentional to not support the explicit conversions with From
because they wouldn't work across all targets. It would be good to have an explicit decision about these things, e.g. so we could eventually resolve what to do about From<u32>
and whatnot consistently as well.
I think the following table roughly captures current state:
C type | Standard | Rust type | Definition |
---|---|---|---|
size_t |
C89 | core::ffi::c_size_t |
C89: "the unsigned integral type of the result of the sizeof operator" |
ptrdiff_t |
C89 | core::ffi::c_ptrdiff_t |
C89: "the signed integral type of the result of subtracting two pointers" |
intptr_t |
C99 | isize |
C99: "a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer" |
uintptr_t |
C99 | usize |
C99: "an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer" |
ssize_t |
POSIX | core::ffi::c_ssize_t / libc:ssize_t |
POSIX.1-2001: "Used for a count of bytes or an error indication" |
I agree that changing ffi::c_size_t
and ffi::c_ptrdiff_t
to #[repr(transparent)]
structs would be correct, because aside from tradition there's not really anything requiring them to be large enough to store a pointer.
ptr = (u64 caps, u64 addr)
these types would both be u64
rather than u128
.impl From<u16> for c_size_t
and impl From<i16> for c_ptrdiff_t
, but conversion from integer sizes > 16 bits (including usize
and isize
) should be TryFrom
.impl From<c_size_t> for usize
, based on the assumption that Rust doesn't target architectures where value sizes can exceed the address range of pointers.impl From<c_ptrdiff_t> for isize
, because the difference between two pointers can't exceed the size of the pointers themselves.It does seem that Rust is missing type core:ffi::c_intptr_t = isize;
and type core::ffi::c_uintptr_t = usize;
, which aren't strictly necessary but would be great for documentation (e.g. c_size_t
could link them to explain why it's not the same as usize
).
IMO core::ffi::c_ssize_t
should simply be removed. It's specific to the Unix idiom of inline error signaling by returning -1
, and is inherently non-portable to non-hosted environments that core
must support.
Changing them to structs would mean you can't initialize them from integer literals.
I just want to make sure folks in this thread are aware of the Rust for Morello/CHERI Pre-RFC usize
semantics, which would imply size_t
= usize
and ptrdiff_t
= isize
. This would be consistent with strict_provenance
. These want to move away from conflating intptr_t
/uintptr_t
with isize
/usize
, and it seems the main dilemma is between compatibility (keep working without warnings on non-CHERI) versus strictness (warnings or errors for bad conversions even on non-CHERI architectures where they are harmless).
Changing them to structs would mean you can't initialize them from integer literals.
That doesn't seem like a big deal -- the size_t
values are going to be either things passed back from the FFI, or TryFrom
-converted from a native Rust type.
For (hopefully rare?) cases when the C API expects things like buffer sizes to be provided directly, the From<u16>
lets some_c_fn(123u16.into())
work (despite being slightly ugly).
I just want to make sure folks in this thread are aware of the Rust for Morello/CHERI Pre-RFC
usize
semantics, which would implysize_t
=usize
andptrdiff_t
=isize
. This would be consistent withstrict_provenance
. These want to move away from conflatingintptr_t
/uintptr_t
withisize
/usize
, and it seems the main dilemma is between compatibility (keep working without warnings on non-CHERI) versus strictness (warnings or errors for bad conversions even on non-CHERI architectures where they are harmless).
I've seen a lot of "Pre-RFC" threads in the forum that don't go anywhere. Is that a Pre-RFC in the sense that the Rust core language team is onboard with potentially changing the definition of isize
/ usize
and just wants to solicit early feedback? Or in the sense of "[Pre-RFC] wishlist for things that won't happen"?
If the definition of isize
/ usize
changes to match the C semantics of ptrdiff_t
/ size_t
then that would mean that table should be flipped around (c_size_t
and c_ptrdiff_t
as aliases, c_intptr_t
and c_uintptr_t
as transparent structs).
IMO
core::ffi::c_ssize_t
should simply be removed. It's specific to the Unix idiom of inline error signaling by returning-1
, and is inherently non-portable to non-hosted environments thatcore
must support.
I don't really understand this point. In my experience it is a common idiom in C in general, not just Unix. An how is that related to core
? ssize_t
is just the signed version of size_t
, so it will be available on any platform just with a different corresponding Rust type.
I don't really understand this point. In my experience it is a common idiom in C in general, not just Unix. An how is that related to core? ssize_t is just the signed version of size_t, so it will be available on any platform just with a different corresponding Rust type.
ssize_t
is an operating-system-defined thing. I think it may make sense in std::os::raw
but IDK if it makes sense in core::ffi
which (IME) is oriented to completely-OS-agnostic code. Note that we already have core::ffi::c_ptrdiff_t
.
Extremely pedantically speaking, it's somewhat possible to argue that c_size_t
should be a type alias for u64
(or whichever size is appropriate), not usize
. Arithmetic types in C are fairly fungible due to the usual arithmetic conversions[^1], but on an LP64 system, it's typically the case that size_t
C, uint64_t
C, and unsigned long int
C are all the same type. And even in the face of arithmetic conversions, this is observable by anything that asks about compatible types, e.g. _Generic
.
[^1]: At one point I thought _BitInt(N)
was supposed to be exempt from integer conversions, but no, it's only exempt from integer promotion. Promotion is what eagerly turns everything smaller into int
; conversion handles arithmetic between types unequal after promotion and (as if) by assignment to a smaller type.
On the other hand, C differentiates between short
, int
, long
, and long long
, even though at least two of those will probably[^2] have the same width, but Rust says (LP64) that c_long
and c_longlong
are both aliases of i64
, so we already don't really care about which types C differentiates, just about integer width, which is presumably all that the ABI cares about.
[^2]: I believe CHERI/Morello still uses 64 bit long long
for ease of porting and due to ABI constraints, even though theoretically 128 bit long long
would be legal.
Dialing back the pedantry.
I don't think it's possible to stabilize c_size_t
without answering whether usize
is size_t
C, uintptr_t
C, both, or neither. The chart of implication, as I see it, is roughly
c_size_t |
c_uintptr_t |
usize is... |
---|---|---|
unavailable | unavailable | indeterminate |
usize |
unavailable | size_t |
u64 |
unavailable | uintptr_t |
usize |
usize |
both |
u64 |
u64 |
neither |
Without looking specifically at CHERI or W65, I think the most viable options are:
c_size_t
and c_uintptr_t
are both aliases for usize
. Make it clear that these are still target dependent aliases and don't necessarily match, but they're both usize
on most targets.c_size_t
and c_uintptr_t
are both aliases for uNN
, as target appropriate. Deal with the required conversions, like is already done when using e.g. c_int
.c_uintptr_t
is an alias for a new type which acts like an integer for API/ABI but maintains provenance (similar to sptr::uptr
). Probably then define usize
as always being size_t
C.On CHERI, a "uptr
" type must be used to match the behavior of intptr_t
C carrying provenance and having the alignment/size of a pointer (128+1 bits) but only the integer range of vaddr_t
C (64 bits).
Having it be uN unconditionally will make my life easier, since I was planning to just have it expand to a macro called cfg_int!(target_size_width)
(which is a little bit of unstable, compiler-specific shenanigans).
Feature gate:
#![feature(c_size_t)]
.This is a tracking issue for
std::os::raw::{c_size_t, c_ssize_t}
, which are guaranteed to be the same size as the underlying Csize_t
andssize_t
types fromstddef.h
.Currently, on all targets, this is equivalent to
usize
andisize
, however Rust has historically gone somewhat out of its way to avoid promising this. There are some targets with vaguely-planned support where this is not true (W65, used for SNES homebrew, for example)Further reading here is available:
And probably more.
Public API
Steps / History
Unresolved Questions
usize
andsize_t
are the same? See https://internals.rust-lang.org/t/pre-rfc-usize-is-not-size-t/15369libcore
somehow, given that probablylibstd
may never support platforms where this is not true. (This may not be true, since I imagine CHERI will support libstd)size_t
is more common in function signatures.