rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
94.87k stars 12.23k forks source link

Tracking Issue for `#![feature(c_size_t)]` (`std::os::raw::c_size_t`/`std::os::raw::c_ssize_t`) #88345

Open thomcc opened 2 years ago

thomcc commented 2 years ago

Feature gate: #![feature(c_size_t)].

This is a tracking issue for std::os::raw::{c_size_t, c_ssize_t}, which are guaranteed to be the same size as the underlying C size_t and ssize_t types from stddef.h.

Currently, on all targets, this is equivalent to usize and isize, however Rust has historically gone somewhat out of its way to avoid promising this. There are some targets with vaguely-planned support where this is not true (W65, used for SNES homebrew, for example)

Further reading here is available:

And probably more.

Public API

// std::os::raw

pub type c_size_t = usize;
pub type c_ssize_t = isize;

Steps / History

Unresolved Questions

joshtriplett commented 2 years ago

Should this instead live in libcore somehow, given that probably libstd may never support platforms where this is not true.

At some point, I'd like to copy all the types from std::os::raw into core::ffi. core is compiled for the target system.

This particular proposal doesn't need to block on that, though; these should go into std::os::raw for now, and we can separately work on populating core::ffi.

chorman0773 commented 2 years ago

Note that some types may not be the same on all targets for the same Architecture. For example (though I don't believe c_long exists), on x86_64-*-linux-gnu (or any Sys-V target for x86_64) c_long would be i64, but on x86_64-*-windows-msvc, it would be i32. I believe that, currently, everything in core, with the exception of atomic types, is gated on the architecture only. That may be something to consider.

bjorn3 commented 2 years ago

Even atomic types are effectively gated on target cpu only and not the OS AFAICT. It just so happens that different target triples on the same architecture sometimes use a different target cpu with different levels of atomic support.

chorman0773 commented 2 years ago

Don't atomic types on older arm targets depend on the os as well (since they require OS support to implement properly)? On Thu, Aug 26, 2021 at 11:30 bjorn3 @.***> wrote:

Even atomic types are effectively gated on target cpu only and not the OS AFAICT. It just so happens that different target triples on the same architecture sometimes use a different target cpu.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rust-lang/rust/issues/88345#issuecomment-906515593, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGLD2ZELKNHU4MLYSPXWGTT6ZM25ANCNFSM5C2A7XKQ .

thomcc commented 2 years ago

Modulo bugs, I believe platforms where OS support is required for atomic types are considered not to have atomic types by Rust.

(This seems to be getting off topic, perhaps)

magicant commented 2 years ago

ssize_t is actually not from stddef.h but from sys/types.h which is specific to Unix. On Windows, SSIZE_T is defined in BaseTsd.h.

thomcc commented 2 years ago

Nice catch. Do you want to submit a PR fixing the docs?

magicant commented 2 years ago

Yes, I'd love to. But I'm not sure if just fixing the doc for ssize_t is okay. Since ssize_t is platform-specific as I said, I'd rather suggest removing it from std::os::raw and possibly placing it under std::os::*::raw. If so:

What do you think?

ChrisDenton commented 2 years ago

Officially, isn't ptrdiff_t the signed counterpart to size_t?

chorman0773 commented 2 years ago

Yes. ssize_t is only defined by POSIX, not ISO 9899. It also has a far different minimum required range (though for rust this is really a purely-pedantic argument), with ptrdiff_t's range being more usual for a signed integral type.(+/-(SIZE_MAX/2 -1)).

On Sat, Oct 30, 2021 at 10:07 Chris Denton @.***> wrote:

Officially, isn't ptrdiff_t the signed counterpart to size_t?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rust-lang/rust/issues/88345#issuecomment-955239890, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGLD236743EXNWWUTLY3XDUJQC27ANCNFSM5C2A7XKQ .

thomcc commented 2 years ago

This probably should be changed to either expose ptrdiff_t instead of ssize_t, or to only expose size_t.

Either way, the future of this API depends a lot on https://internals.rust-lang.org/t/pre-rfc-usize-is-not-size-t/15369.

thomcc commented 2 years ago

This is hardly scientific, but to quantify my hunch that size_t is more common than ssize_t and ptrdiff_t in my installed headers, I did a check[^1] to see if this is true. The numbers I got were:

Linux:

Mac:

Which probably means that size_t is the important thing here, and the signed version doesn't matter so much.

[^1]: In the spirit of reproducibility, I ran the following command in fish-shell[^2] to get these counts:

```
rg -ce "$pattern" $include_dirs | string replace -r '^.*:' '' | paste -s -d+ - | bc
```
Where:
- `$pattern` is: `\bsize_t\b`, `\bssize_t\b`, or `\bptrdiff_t\b`
- `$include_dirs` on linux was `/usr/local/include` and `/usr/include`.
- `$include_dirs` on mac was `/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include` and `/usr/local/include`. These both might be system dependent in the following ways:
    - The first you can find it with something like `xcrun --show-sdk-path --sdk macosx`
    - And the second is where brew installs things (which seems to be different on new installs? Or maybe just on M1? IDK).

[^2]: fish-shell is required because string replace is easier for me than remembering how to use sed, but mostly becasue I had something very similar in my history.

Sorry. If you tell me the equivalent `sed` incantation to delete everything prior to the last `:` in a line, I'll update this comment
ojeda commented 2 years ago

These are now in core::ffi too since https://github.com/rust-lang/rust/pull/94503 for #![feature(core_ffi_c)].

CAD97 commented 1 year ago

Just noting that these types are currently missing a reëxport into std. Even if presence of std means c_size_t === usize, the type should be available to show intent and keep std's public API a strict superset of core's (ignoring internal-only perma-unstable items that are only public as an impl detail ofc).

tgross35 commented 7 months ago

Would we be able to stabilize these or a subset? I know we have the size_t ?= usize discussion somewhat ongoing, but I don't think this blocks us from providing the C representation, right?

Assuming they are always the same for our current supported targets, we just need to resolve that before adding targets where uintptr_t and size_t do differ... I think

briansmith commented 7 months ago

I don't believe the current implementation of pub type c_size_t = usize; is correct if the long-term plans to allow usize != size_t because the current implementation guarantees implicit conversion between usize and c_size_t. I believe that we would instead need to use #[repr(transparent)] struct c_size_t(usize); and then implement explicitly the operations that we are guaranteeing, and then also implement TryFrom conversions for the rest. This would require us to decide what guarantees we are making; e.g. usize is always the same size as size_t or larger than size_t, but never smaller?

Or otherwise, we need to document that on targets where usize and size_t are the same, implicit conversions are supported, but on other targets they are not. Or something else to that effect. However, in the past, we have not allowed even EXPLICIT conversions to be supported or not on a per-target basis. For example, there is no impl From<u32> for usize on targets where usize is the same size (or larger than) u32.

let x: usize = 1;
let y: core::ffi::size_t = x;  // Works when a type alias, doesn't work otherwise.
chorman0773 commented 7 months ago

This is true for the other c_* types. c_int is a type alias of i32 on most platforms, but it's documented to be platform specific. I think it's fine if we make it clear that pub type c_size_t = usize; is a "True on this platform only" thing.

briansmith commented 7 months ago

This is true for the other c_* types. c_int is a type alias of i32 on most platforms, but it's documented to be platform specific. I think it's fine if we make it clear that pub type c_size_t = usize; is a "True on this platform only" thing.

Was this a historical accident or is this intentional though? IIRC it was intentional to not support the explicit conversions with From because they wouldn't work across all targets. It would be good to have an explicit decision about these things, e.g. so we could eventually resolve what to do about From<u32> and whatnot consistently as well.

jmillikin commented 7 months ago

I think the following table roughly captures current state:

C type Standard Rust type Definition
size_t C89 core::ffi::c_size_t C89: "the unsigned integral type of the result of the sizeof operator"
ptrdiff_t C89 core::ffi::c_ptrdiff_t C89: "the signed integral type of the result of subtracting two pointers"
intptr_t C99 isize C99: "a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer"
uintptr_t C99 usize C99: "an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer"
ssize_t POSIX core::ffi::c_ssize_t / libc:ssize_t POSIX.1-2001: "Used for a count of bytes or an error indication"

I agree that changing ffi::c_size_t and ffi::c_ptrdiff_t to #[repr(transparent)] structs would be correct, because aside from tradition there's not really anything requiring them to be large enough to store a pointer.

It does seem that Rust is missing type core:ffi::c_intptr_t = isize; and type core::ffi::c_uintptr_t = usize;, which aren't strictly necessary but would be great for documentation (e.g. c_size_t could link them to explain why it's not the same as usize).

IMO core::ffi::c_ssize_t should simply be removed. It's specific to the Unix idiom of inline error signaling by returning -1, and is inherently non-portable to non-hosted environments that core must support.

chorman0773 commented 7 months ago

Changing them to structs would mean you can't initialize them from integer literals.

jedbrown commented 7 months ago

I just want to make sure folks in this thread are aware of the Rust for Morello/CHERI Pre-RFC usize semantics, which would imply size_t = usize and ptrdiff_t = isize. This would be consistent with strict_provenance. These want to move away from conflating intptr_t/uintptr_t with isize/usize, and it seems the main dilemma is between compatibility (keep working without warnings on non-CHERI) versus strictness (warnings or errors for bad conversions even on non-CHERI architectures where they are harmless).

jmillikin commented 7 months ago

Changing them to structs would mean you can't initialize them from integer literals.

That doesn't seem like a big deal -- the size_t values are going to be either things passed back from the FFI, or TryFrom-converted from a native Rust type.

For (hopefully rare?) cases when the C API expects things like buffer sizes to be provided directly, the From<u16> lets some_c_fn(123u16.into()) work (despite being slightly ugly).

I just want to make sure folks in this thread are aware of the Rust for Morello/CHERI Pre-RFC usize semantics, which would imply size_t = usize and ptrdiff_t = isize. This would be consistent with strict_provenance. These want to move away from conflating intptr_t/uintptr_t with isize/usize, and it seems the main dilemma is between compatibility (keep working without warnings on non-CHERI) versus strictness (warnings or errors for bad conversions even on non-CHERI architectures where they are harmless).

I've seen a lot of "Pre-RFC" threads in the forum that don't go anywhere. Is that a Pre-RFC in the sense that the Rust core language team is onboard with potentially changing the definition of isize / usize and just wants to solicit early feedback? Or in the sense of "[Pre-RFC] wishlist for things that won't happen"?

If the definition of isize / usize changes to match the C semantics of ptrdiff_t / size_t then that would mean that table should be flipped around (c_size_t and c_ptrdiff_t as aliases, c_intptr_t and c_uintptr_t as transparent structs).

Sympatron commented 7 months ago

IMO core::ffi::c_ssize_t should simply be removed. It's specific to the Unix idiom of inline error signaling by returning -1, and is inherently non-portable to non-hosted environments that core must support.

I don't really understand this point. In my experience it is a common idiom in C in general, not just Unix. An how is that related to core? ssize_t is just the signed version of size_t, so it will be available on any platform just with a different corresponding Rust type.

briansmith commented 7 months ago

I don't really understand this point. In my experience it is a common idiom in C in general, not just Unix. An how is that related to core? ssize_t is just the signed version of size_t, so it will be available on any platform just with a different corresponding Rust type.

ssize_t is an operating-system-defined thing. I think it may make sense in std::os::raw but IDK if it makes sense in core::ffi which (IME) is oriented to completely-OS-agnostic code. Note that we already have core::ffi::c_ptrdiff_t.

CAD97 commented 7 months ago

Extremely pedantically speaking, it's somewhat possible to argue that c_size_t should be a type alias for u64 (or whichever size is appropriate), not usize. Arithmetic types in C are fairly fungible due to the usual arithmetic conversions[^1], but on an LP64 system, it's typically the case that size_tC, uint64_tC, and unsigned long intC are all the same type. And even in the face of arithmetic conversions, this is observable by anything that asks about compatible types, e.g. _Generic.

[^1]: At one point I thought _BitInt(N) was supposed to be exempt from integer conversions, but no, it's only exempt from integer promotion. Promotion is what eagerly turns everything smaller into int; conversion handles arithmetic between types unequal after promotion and (as if) by assignment to a smaller type.

On the other hand, C differentiates between short, int, long, and long long, even though at least two of those will probably[^2] have the same width, but Rust says (LP64) that c_long and c_longlong are both aliases of i64, so we already don't really care about which types C differentiates, just about integer width, which is presumably all that the ABI cares about.

[^2]: I believe CHERI/Morello still uses 64 bit long long for ease of porting and due to ABI constraints, even though theoretically 128 bit long long would be legal.


Dialing back the pedantry.

I don't think it's possible to stabilize c_size_t without answering whether usize is size_tC, uintptr_tC, both, or neither. The chart of implication, as I see it, is roughly

c_size_t c_uintptr_t usize is...
unavailable unavailable indeterminate
usize unavailable size_t
u64 unavailable uintptr_t
usize usize both
u64 u64 neither

Without looking specifically at CHERI or W65, I think the most viable options are:

On CHERI, a "uptr" type must be used to match the behavior of intptr_tC carrying provenance and having the alignment/size of a pointer (128+1 bits) but only the integer range of vaddr_tC (64 bits).

chorman0773 commented 7 months ago

Having it be uN unconditionally will make my life easier, since I was planning to just have it expand to a macro called cfg_int!(target_size_width) (which is a little bit of unstable, compiler-specific shenanigans).