Closed RalfJung closed 1 year ago
As usual with bool, the remaining thing that can be bikeshed indefinitely is the interaction with FFI. Can we really assume that any function calling us from C only passes one of two possible bit patterns, on any platform? On which platforms can we be more specific and specify the actual bit patterns? AFAIK false == 0x00 is given by the fact that C allows 0-initialization of _Bool. Can we say anything about the bit pattern of true?
Note that how _Bools are actually passed across C functions is specified in the platform's ABI, and this is technically orthogonal to how _Bools are defined in C. For example, the SysV AMD64 ABI states:
When a value of type _Bool is returned or passed in a register or on the stack, bit 0 contains the truth value and bits 1 to 7 shall be zero (Other bits are left unspecified, hence the consumer side of those values can rely on it being 0 or 1 when truncated to 8 bit).
So saying that "bool is equivalent to C's _Bool" (in general) and saying that "bool can be passed / returned on C FFI functions using _Bool" are not technically the same thing. From #53, it appears to me that T-compiler and T-lang signed off on the latter, since that is the only definition that removes the need for a c_bool type.
If that's the case, we have to make the values of true and false implementation-defined, and document their values on the platforms that Rust supports.
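To make the second reading concrete, here is a minimal sketch of a Rust function that returns bool across a C ABI boundary. The function name `is_even` and the C prototype in the comment are hypothetical, chosen only for illustration; how the returned value is encoded in a register is left to the platform ABI, as discussed above.

```rust
/// Hypothetical exported function; the name `is_even` is illustrative only.
/// On the C side this would be declared as `_Bool is_even(uint32_t n);`.
/// The compiler, not this code, decides how the result is placed in a
/// register according to the target's C calling convention.
pub extern "C" fn is_even(n: u32) -> bool {
    n % 2 == 0
}
```

No c_bool type appears anywhere in the signature, which is the practical consequence of the decision referenced from #53.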
cc @mjbshaw
Can we really assume that any function calling us from C only passes one of two possible bit patterns, on any platform?
No, we can't assume that. The C standard guarantees that 0x0 is a valid bit-pattern of _Bool, and that there has to exist another bit-pattern representing true, but that's about it. An ABI could state:
When a value of type _Bool is returned or passed in a register or on the stack, bit 0 contains the truth value and all other bits are unspecified
That would mean that an implementation has to ignore all other bits, such that 0 and 2 are valid representations of false, and 1 and 3 are valid representations of true. From #53 we have to support using bool on C FFI with those platforms, so we can't really say that there are only two valid bit-patterns for bool 😞
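As a sketch of what such a hypothetical ABI would imply, the decoder below looks only at bit 0 of a raw byte and ignores the rest. The helper name `decode_bit0` is an assumption made for this example, and the ABI rule it models is the imagined one quoted above, not that of any real platform.

```rust
/// Hypothetical helper: decode a C `_Bool` received as a raw byte under an
/// imagined ABI in which only bit 0 carries the truth value and every other
/// bit is unspecified. This does not describe any real platform.
pub fn decode_bit0(raw: u8) -> bool {
    // Mask off everything except bit 0 before testing it.
    (raw & 1) != 0
}
```

Under this rule 0 and 2 both decode to false while 1 and 3 both decode to true, so more than two byte values would have to be accepted.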
While it's true that calling convention handling of bools needs to be compatible as well for FFI to work, I don't really see how that interacts with the memory representation in the way you suggest. Functions with C ABI can make any necessary adjustments between the memory representation and "argument passing representation" (for example, truncating a bool that's passed in to 8 bits) as part of the call/return handling. This already happens all over the place, e.g. a struct of three chars is passed in a (32b?) register but that doesn't mean that struct isn't three bytes without any padding.
So the only choice/distinction I see here is whether calling convention compatibility is provided in addition to memory layout compatibility (similar to repr(C) newtypes vs repr(transparent) newtypes) without affecting memory layout, and as you say it's pretty clear that's indeed desired.
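The point about the three-byte struct can be checked from the memory-layout side. This is a small sketch; the type name `ThreeChars` and the helper `three_chars_layout` are made up for the example. It says nothing about which register the struct travels in, only that its in-memory size is unaffected by that.

```rust
use std::mem::{align_of, size_of};
use std::os::raw::c_char;

/// Hypothetical C-compatible struct holding three `char`s.
#[repr(C)]
pub struct ThreeChars {
    pub a: c_char,
    pub b: c_char,
    pub c: c_char,
}

/// Returns (size, alignment) of `ThreeChars` in bytes.
pub fn three_chars_layout() -> (usize, usize) {
    (size_of::<ThreeChars>(), align_of::<ThreeChars>())
}
```

The struct occupies three bytes with alignment one, even if a calling convention widens it to a full register when passing it by value.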
Strictly speaking, there isn't anything Rust can say about the bit pattern of C's true (or potentially other possible representations of false). Pragmatically, the bit pattern 0x01 will always be a valid value representing true (though not necessarily the only valid value representing true). Personally, I wouldn't mind seeing Rust take a pragmatic approach to this. A reasonable starting point is:
- 0x00 is false. No other bit patterns represent false.
- 0x01 is true.
Now the only question is whether there are other representations for true. The next step in the pragmatic approach would be to check if there are any LLVM/GCC targets that support alternative representations for true. If there are, and if Rust intends to support these targets, then Rust must accept (implementation-defined) alternative representations for true (or create a c_bool type for FFI). Conversely, if there are no such targets (that Rust intends to support), then Rust can safely forbid non-zero padding bits for bool.
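A minimal sketch of the strict end of this pragmatic approach, assuming no supported target has alternative representations for true: a raw byte is accepted only if it is exactly 0x00 or 0x01. The function name `bool_from_c_byte` is hypothetical.

```rust
/// Hypothetical validation helper: treat 0x00 as false, 0x01 as true, and
/// reject every other byte. This assumes a target where `_Bool` has no
/// alternative representations of true.
pub fn bool_from_c_byte(raw: u8) -> Option<bool> {
    match raw {
        0x00 => Some(false),
        0x01 => Some(true),
        // Any set bit above bit 0 is treated as an invalid representation.
        _ => None,
    }
}
```

If a target with extra representations of true did have to be supported, the last arm is where the implementation-defined cases would need to go.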
For the unsafe code guidelines (UCG), I think it is enough to state that:
Rust's bool validity invariant is the same as that of C's _Bool.
Note: that means that 0x0 is always a valid bit-pattern for false.
Note: on all platforms that Rust currently supports bool has only two valid bit-patterns: 0x0 for false and 0x1 for true.
We don't really have to handle alternative representations for true and false here in any more depth. A formal spec / operational semantics for Rust for platforms in which bool has more than two valid bit-patterns would need to do that, but this document does not.
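The second note can be observed from safe code on a currently supported platform. The sketch below reports the size of bool and the integer values that false and true convert to; the helper name `bool_bit_patterns` is made up for this example, and the check only reflects the target it is compiled for.

```rust
use std::mem::size_of;

/// Returns (size of `bool` in bytes, `false` as a byte, `true` as a byte)
/// for the target this code is compiled for.
pub fn bool_bit_patterns() -> (usize, u8, u8) {
    (size_of::<bool>(), false as u8, true as u8)
}
```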
Rust's bool validity invariant is the same as that of C's _Bool.
The issue with this is that C has no notion of a "validity invariant".
The issue with this is that C has no notion of a "validity invariant".
A better way to phrase this might be:
Rust's bool valid representations are the same as the value representations of C's _Bool type.
C11 6.2.6.1p5 states:
"Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation."
That is, in C it is ok to materialize a _Bool representation that does not represent a value of _Bool, but if that representation participates in an lvalue expression (e.g. read as a _Bool) then the behavior is undefined. This is not a validity invariant (invalid values of _Bool are ok as long as the program does not read them as _Bools), but is as close as it gets.
Rust's bool valid representations are the same as the value representations of C's _Bool type.
Yeah that sounds more like the kind of terminology I expected.
Presuming we continue to tie Rust bool to C bool as we have done, then what @gnzlbg wrote makes sense to me.
UPDATE: And, to be clear, I'm not eager to revisit that topic. The lang team decision at the moment seems pretty clear.
Closing as answered
Discussing the validity invariant of booleans.
The obvious invariant is: true or false.
Is there any reason to allow any other value? In particular, this invariant means that no bit may be uninitialized.
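One way to read the "no bit may be uninitialized" consequence in code: storage that has not been written yet has to be typed as MaybeUninit<bool> rather than bool, and may only be treated as a bool once a valid value is in place. This is a sketch with a hypothetical helper name `init_bool`, not a statement of the final rules.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper showing the initialize-then-read order.
pub fn init_bool(value: bool) -> bool {
    // The slot starts out uninitialized, so it must not be typed as `bool`.
    let mut slot = MaybeUninit::<bool>::uninit();
    // Writing a valid `bool` initializes every bit of the slot.
    slot.write(value);
    // SAFETY: the slot was fully initialized by the `write` call above.
    unsafe { slot.assume_init() }
}
```

Calling `assume_init` before the write would be undefined behavior under the proposed invariant.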
As usual with bool, the remaining thing that can be bikeshed indefinitely is the interaction with FFI. Can we really assume that any function calling us from C only passes one of two possible bit patterns, on any platform? On which platforms can we be more specific and specify the actual bit patterns? AFAIK false == 0x00 is given by the fact that C allows 0-initialization of _Bool. Can we say anything about the bit pattern of true?