rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do
https://rust-lang.github.io/unsafe-code-guidelines
Apache License 2.0

What about: "odd" pointer sizes #255

Open RalfJung opened 3 years ago

RalfJung commented 3 years ago

@chorman0773 brings up the issue of platforms where pointers have "strange" sizes, such as 3 bytes.

If Rust wants to support such platforms, interesting questions arise:

I am probably also missing some aspects of this due to a lack of familiarity with such platforms. :) This seems somewhat related to https://github.com/rust-lang/unsafe-code-guidelines/issues/29 in that it is about "strange platforms".

Lokathor commented 3 years ago

size is probably 4 for alignment reasons

chorman0773 commented 3 years ago

On the 65816 at least, according to the definition I have written and use, size_of::<*const ()>() (and likewise sizeof(void*) in C) is 4. This is because the alignment, which needs to be a power of two, is 4, so that a pointer can never be allocated across a bank boundary. Crossing one can break things entirely, since the CPU can access the entire pointer at once and the access may wrap at bank boundaries. Pointers are zero-extended by the ABI (and the compiler is recommended to do the same) for various reasons. One is that it allows pointer arithmetic to be done with the 32-bit functions when the compiler doesn't inline the software addition, which would be trivial but requires two 16-bit additions.
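To make the bank-crossing argument concrete, here is a minimal sketch. The 64 KiB bank size is taken from this thread; the names `BANK_SIZE`, `crosses_bank`, and `clear_top_byte` are hypothetical and do not come from any ABI document.

```rust
/// Size of one 65816 bank in bytes (64 KiB), as described in this thread.
const BANK_SIZE: u32 = 0x1_0000;

/// Hypothetical helper: does an object of `size` bytes starting at `addr`
/// straddle a bank boundary? Assumes `size >= 1` and that `addr + size`
/// does not overflow `u32`.
fn crosses_bank(addr: u32, size: u32) -> bool {
    let last = addr + size - 1;
    addr / BANK_SIZE != last / BANK_SIZE
}

/// Hypothetical helper: produce the 4-byte pointer representation whose
/// top byte is zero, keeping only the low 24 address bits.
fn clear_top_byte(bits: u32) -> u32 {
    bits & 0x00FF_FFFF
}
```

A 3-byte pointer stored at 0x00FFFE would occupy bytes in two banks, while a 4-byte slot at a 4-aligned address never can, because the bank size is itself a multiple of 4.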

As for usize, ideally it would be a u24 zero-extended to u32, and likewise for isize (but sign-extended instead). However, I remember it being mentioned that integer types have no padding and no invalid values (except for uninit, but see #71). I interpreted this to mean that every possible bit pattern represents a distinct valid value of the type. If this extends to usize and isize, then it would be incompatible with the requirements on pointers because, as mentioned in #76, usize::MAX as *const () is safe (and can be dereferenced soundly*).

(*On lccc, I would prefer that a pointer needs to point to an object even for accesses of size 0. However, because lccc has rules that allow it to invent objects, this can be trivially worked around.)
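The tension can be sketched as a validity check. Assuming a 24-bit value stored in a 32-bit slot, only 2^24 of the 2^32 bit patterns would be valid, which is at odds with "every bit pattern is a distinct valid value". Both function names below are hypothetical.

```rust
/// Hypothetical validity check for a `usize` represented as a 24-bit value
/// zero-extended to 32 bits: the top byte must be zero.
fn is_valid_zero_extended_u24(bits: u32) -> bool {
    bits >> 24 == 0
}

/// Hypothetical validity check for an `isize` represented as a 24-bit value
/// sign-extended to 32 bits: the top byte must replicate bit 23.
fn is_valid_sign_extended_i24(bits: u32) -> bool {
    let v = bits as i32;
    // Sign-extend the low 24 bits and compare with the full 32-bit value.
    (v << 8) >> 8 == v
}
```

Under this model the all-ones pattern (what usize::MAX looks like on a 32-bit target) would not be a valid zero-extended value.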

sollyucko commented 3 years ago

What about e.g. eZ80, which AFAIK has 24-bit addresses and no alignment requirements?

RalfJung commented 3 years ago

I guess we could imagine the last byte of pointers and usize to be padding?

> However, I remember it being mentioned that integer types have no padding

They don't on normal platforms, but for weird things like 3-byte integers the best option might be to make an exception...

chorman0773 commented 3 years ago

> They don't on normal platforms

One possibility would be to specify that the normal integer types uN (N ∈ {8, 16, 32, 64, 128}) have a size equal to N/8 and that every initialized bit pattern represents a distinct valid value, and to leave this unspecified for usize (and likewise for iN and isize). It would also leave Rust open to adding extended integer types (i.e. uN with N ∈ Z+ ∖ {8, 16, 32, 64, 128}), which would have the same issue as usize here. In the case of usize, whether or not any padding bits exist, the value of those padding bits, and whether or not particular padding bits are valid, would be unspecified (i.e. they can be sign- or zero-extended, or left indeterminate/uninitialized).
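For the fixed-width types, the size half of this proposal matches what can already be observed today. A small sketch (the function name `fixed_width_sizes` is hypothetical):

```rust
use std::mem::size_of;

/// Returns (bit width N, size in bytes) for each fixed-width unsigned type.
fn fixed_width_sizes() -> [(u32, usize); 5] {
    [
        (u8::BITS, size_of::<u8>()),
        (u16::BITS, size_of::<u16>()),
        (u32::BITS, size_of::<u32>()),
        (u64::BITS, size_of::<u64>()),
        (u128::BITS, size_of::<u128>()),
    ]
}
```

Each pair is expected to satisfy size == N / 8. The proposal would simply stop short of promising the same for usize and isize.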

> What about e.g. eZ80, which AFAIK has 24-bit addresses and no alignment requirements?

On the 65816 the alignment requirement is imposed by the ABI (that is, there is no hardware-level alignment requirement on it either). It was done because pointers are a "scalar unit" which can be accessed as one value by the CPU, and sometimes those accesses can wrap at the bank boundaries (every 64 KiB), so I have to ensure no scalar unit can be allocated across a bank boundary. If the same problem arises there, the eZ80 target could act similarly to how I've defined it for the 65816.

Diggsey commented 3 years ago

One problem with {u,i}size being 24 bits with the top 8 bits being zero is that arithmetic will be slower on those types.

chorman0773 commented 3 years ago

I mean, for usize it's simple masked arithmetic. And for isize it's the same followed by a sign extension.
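As a rough sketch of what that could look like, assuming 24-bit values carried in 32-bit registers (the names `MASK_24`, `add_u24`, and `add_i24` are hypothetical, and this says nothing about how a real backend would lower it):

```rust
/// Mask selecting the low 24 bits of a 32-bit value.
const MASK_24: u32 = 0x00FF_FFFF;

/// Wrapping 24-bit unsigned addition: add in 32 bits, then clear the top byte.
fn add_u24(a: u32, b: u32) -> u32 {
    a.wrapping_add(b) & MASK_24
}

/// Wrapping 24-bit signed addition: mask as above, then sign-extend.
fn add_i24(a: i32, b: i32) -> i32 {
    let masked = (a.wrapping_add(b) as u32) & MASK_24;
    // Move bit 23 into the sign position, then shift back arithmetically.
    ((masked << 8) as i32) >> 8
}
```

The extra mask (and the shift pair for the signed case) after each operation is the cost being discussed here.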

RalfJung commented 3 years ago

> In the case of usize, whether or not any padding bits exist, the value of those padding bits, and whether or not particular padding bits are valid, would be unspecified (i.e. they can be sign- or zero-extended, or left indeterminate/uninitialized).

The disadvantage of this idea is that now all code has to carry the cost of supporting exotic platforms. All unsafe code working with isize/usize has to be carefully audited to take into account the possibility of padding. In the past, some lang team members have expressed a preference to avoid such situations. (IIRC that was when people brought up DSPs where the smallest addressable unit is 16 bits in size.) At some point, platforms are so niche that they simply do not warrant the cost this imposes.

This could be implementation-defined though, i.e. rustc could guarantee absence of padding for those targets where the pointer size is a power of two. That makes life easier for unsafe code authors on the majority of platforms, at the cost of people working on exotic platforms (they need to have special audits for unsafe code). Frankly, this is likely what would happen anyway, even if we officially left things unspecified -- I doubt most unsafe code authors are even aware that such platforms exist.
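Unsafe code that wants to rely on such a guarantee could state the assumption explicitly, so that it fails to compile elsewhere. A minimal sketch follows; the helper names are hypothetical, and what usize::BITS would report on a target with 24-bit pointers is an assumption, since no such target exists in rustc today.

```rust
use std::mem::size_of;

/// Hypothetical helper: true if every bit of `usize`'s storage is a value
/// bit on the current target, i.e. there is no room for padding.
const fn usize_has_no_padding() -> bool {
    usize::BITS as usize == 8 * size_of::<usize>()
}

// Compile-time guards: refuse to build on targets where the assumptions
// made by the code below might not hold.
const _: () = assert!(usize_has_no_padding());
const _: () = assert!(size_of::<*const ()>().is_power_of_two());

/// Example of code that leans on the guards above: it treats every byte
/// of a `usize` as meaningful.
fn usize_bytes(x: usize) -> [u8; size_of::<usize>()] {
    x.to_ne_bytes()
}
```

This keeps the audit burden on the exotic-platform side: on mainstream targets the guards are no-ops, and on anything else the build stops at the assertion instead of silently misbehaving.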

chorman0773 commented 3 years ago

I wouldn't be opposed to the increased difficulty, provided it's not disproportionate to the cost of shifting the burden to unsafe developers, and Rust can be written on these platforms. In particular, I would really benefit from being able to use Rust as a frontend language for SNES-Dev, a toolchain I am writing for compiling SNES homebrew, which is a 65816 target (and the origin of the ABI in question, though it can be generalized to any 65816 platform).