Closed gnzlbg closed 5 years ago
as
will always make true as INTEGRAL_TYPE == 1
, to be clear, no matter what the representation of true
is (and the same for false)
@ubsan yes, I should have used transmute
, updated.
Those are stated to be "proposals". I believe the C++ proposal might have been accepted by the C++ committee, but I don't know if the C committee accepted the proposal yet.
The author of those proposals tweeted (really, this is the only source I have) about this proposal: https://twitter.com/jfbastien/status/989242576598327296?lang=en
Not only is the C++ standards committee happy to make signed integers two's complement in C++20 with http://wg21.link/p0907 , the C standards committee voted to do the same thing for the next version of C! 🤯 I just have to write the wording. 🎉
I don't know which parts of p0907 and N2218 will make it into C and C++, but at least some of it is on track for C++20 and the next revision of the C standard.
What is the problem with just saying "bool
has the same size as C's _Bool
, here is what this means for popular platforms"? "bool
== _Bool
" and "bool
has size 1" are only contradicting if they are both interpreted as normative for all platforms; I do not think the latter is normative.
TBH I am a bit surprised to see so much fuzz about this, I feel there is an aspect to this discussion that I am missing.
The one possible problem with this definition is that it means we can only ever port Rust to platforms that support C; however, I think that is an entirely theoretical concern at this point.
Defining bool == _Bool
has no implications on c-less platforms, as it is implicitly "if we have a C platform to interoperate with".
I don't have much respect for the boogeyman of weird bools, since aiui that's mostly CHAR_BIT != 8
or wildly ancient platforms (win 95). We should blindly assert bool has the proposed c/cpp-20 definition, and also that we only interoperate with c platforms that support it.
I might be willing to accept a weaker statement of "we don't interoperate with c's bool on such a weird platform" but that pushes people back to spreading the folklore that c_bool is useful and good.
What is the problem with just saying "bool has the same size as C's _Bool, here is what this means for popular platforms"?
All options have problems. The problem with that approach is that it doesn't guarantee that Rust bool
is 1 byte, so those who might care about that might be left wondering: "Should I use u8
to store my booleans instead of bool
just in case someone uses my code in some hypothetical platform in which size_of::<bool>() != 1
? " Note that this something that affects all Rust code, instead of just Rust's C FFI.
If we would say that size_of::<bool>() == 1
then some people might decide to use c_bool
in C FFI to be portable with hypothetical C platforms in which that does not hold. That looks like a lesser evil to me than leaving all Rust users, including those writing Rust code that does not interface with C, wondering about the size of bool
.
just in case someone uses my code in some hypothetical platform in which
size_of::<bool>() != 1
fn _static_assert_size_of() {
let _ = std::mem::transmute<bool, u8>;
}
@SimonSapin preventing code from compiling is not the same producing portable code that compiles and works as expected.
An equivalent solution (that prevents some code from compiling) would be to not allow bool
in C FFI in those platforms in which bool != _Bool
. That would only affect Rust code doing C FFI, instead of affecting all Rust code, and bool
would still continue to work properly on C FFI as long as the code only targets platforms in which bool == _Bool
.
The problem with that approach is that it doesn't guarantee that Rust bool is 1 byte,
The lang and compiler teams decided that this is not the case (i.e., they decided that bool
might have size different from 1)! It is not our place to change that decision. We are mostly documenting behavior here.
This was actually called out in https://github.com/rust-lang/rust/pull/46176#issuecomment-359593446:
We document this, and also document that on every platform we currently support, this means that the size of bool is 1.
(emphasis mine)
I see no reason for us to overthrow their decision as part of this documentation process. If you want to change that decision, feel free to write an RFC, but that IMO has no place in the current UCG discussion phase.
@SimonSapin Here is a version of that code that actually compiles: (from your very brief message, I had no idea whether you wanted to express that this is accepted or rejected by the compiler^^)
fn static_assert_size_of() { unsafe {
let _ = std::mem::transmute::<bool, u8>(true);
} }
Looks like a bug to me. We also don't allow transmuting between u64
and usize
on 64bit platforms. So we should reject this transmute
for the same reason, or else decide that we will never support a platform where _Bool
has a size different from 1, or else backtrack on https://github.com/rust-lang/rust/pull/46176#issuecomment-359593446. "We", however, is the community as part of an RFC, or maybe T-lang + T-compiler, but not the UCG group.
We also don't allow transmuting between u64 and usize on 64bit platforms.
Huh? Of course we do. It compiles, and it's perfectly legitimate if you only target 64 bit platforms or properly cfg
-gate the transmute.
Huh? Of course we do. It compiles, and it's perfectly legitimate if you only target 64 bit platforms or properly cfg-gate the transmute.
You are right. Seems I misremembered. Even better, things are consistent then.
A long discussion with @gnzlbg on Zulip uncovered why I am so confused here (in particular, I maintain that it makes no logical sense to bring up platforms with CHAR_BITS != 8
in this discussion if we declared such platforms unsupported): @gnzlbg thinks that even if we say "for FFI we assume that CHAR_SIZE == 8
", we still want to say some things about FFI on platforms where CHAR_SIZE != 8
.
I think we should close this issue, and just document the T-lang + T-compiler decision that bool
shares size and alignment with _Bool
. And then some people that care should have a discussion about which platforms we support, where "no support" means "no support". And then, if there is agreement that it is worth supporting CHAR_BITS != 8
at least a bit, we can see how that affects bool
.
(Code in my previous comment did not compile because I forgot ::
in the turbofish syntax. But you don’t need to call transmute
to get the magic size equality check, only to "instantiate" it with concrete input and output types.)
I am still interested in knowing what concrete platforms people want to actually support with more flexible bools.
I am still interested in concrete platforms that people want to actually support with more flexible bools.
Nobody came up with any relevant ones in the last discussions - I mentioned that MSVC <= 4.2 had sizeof(bool) == 4
but that is not a platform that I'd want to support.
Historically, 32-bit powerpc-darwin
was another one with sizeof(bool) == 4
.
Cray didn't have byte-level addressing, but I don't think anyone needs to target the Cray architecture anymore.
I think we should mandate that bool
is always the platform's _Bool
. And to the extent we need to, let's always assume 8-bit bytes, rather than spending time and resources future-proofing against a platform that doesn't yet exist.
I think we should mandate that bool is always the platform's _Bool.
What does that buy us?
Pros of bool == _Bool
:
c_bool
on C FFI out of compatibility concerns with weird targetsCons of bool == _Bool
:
unsafe
code cannot rely on size_of::<bool>() == 1
if it wants to be "truly portable" unsafe
code might not be able to rely on transmute::<_, i32>(true) == 1
and transmute::<_, i32>(false) == 0
if it wants to be truly portable (portable to C99 platforms for example)struct Bool(u8)
for use in Rust out of portability paranoiaOn one hand, I think that we should not make it impossible to write Rust that targets weird C platforms - there should not be room or excuses for people to use C. On the other hand, targeting weird C platforms is going to be hard anyways, most of the ecosystem might not be reused at least "as is" on those, etc. Is it worth it to leave most Rust users wondering about bool
to make targeting these weird C platforms a slightly bit easier? I don't know. Can, e.g., -sys
Rust C FFI wrappers be reused at all on those weird targets ? I don't know either.
As @rkruppe argued, just because we leave those things as "implementation defined" does not mean that people won't assume that they hold everywhere if that holds on all platforms they care about. I don't care about platforms where _Bool
doesn't have the properties above, so I just can assume that they hold. The bool == _Bool
definition allows that, so as long as we word everything correctly, it's a good choice IMO.
To clarify, c_bool
would become a legitimately useful type if we ever did add support for a "weird C platform"/platform with large bools, right?
So as long as there's a fundamental conflict between guaranteeing the size on weird future platforms and guaranteeing C interop on weird future platforms, and "targeting weird C platforms is going to be hard anyways", it seems we gain a lot more from making bool
predictably small for all the ordinary pure-Rust use cases and then keeping in mind that we'll have to introduce an official c_bool
type if we ever add a platform where it would be useful (and we probably shouldn't do that before the "portability lint" becomes a real thing).
@Ixrec the team has decided that it's better to be compatible with C, than predictably 1 byte, with bit patterns exactly 0b0
and 0b1
(although there's a close to zero percent chance that this decision ever gets tested)
What should miri do here? It currently implements an abstract machine where bool is one Byte wide and 0x1 and 0x0 are used for true and false but should it panic on any sort of “bool type punning” ? The “host” and the “target” can vary in behavior so this could impact const fn.
We already have similar problems with floats, but I don’t know if they are the same.
regardless of our decision, bool's sizing is no different from usize vs u32/u64, which I don't believe is considered an "interesting" problem in miri
What should miri do here? It currently implements an abstract machine where bool is one Byte wide
No, it implements an abstract machine where bool
is as large as it is on the target platform. Sizes and layout of types in miri is entirely determined by rustc, using the same code as what is used for codegen.
and 0x1 and 0x0 are used for true and false
That's correct. But it seems that'll soon be fixed in C/C++ as well so it seems fine.
We discussed this briefly in today's meeting, so I think we can close this. We agreed on writing down something along these lines (the exact wording will be in a PR, we can discuss the nits there - cc @avadacatavra ):
Rust's bool
has the same layout as C17's _Bool
, that is, its size and alignment are implementation-defined.
note: on all platforms that Rust's currently supports, the size and alignment of bool
are 1, and its ABI class is INTEGER
.
Rust's bool
has the same valid representations as C17's _Bool
, that is, two valid implementation-defined bit-patterns corresponding to true
and false
- all other bit-patterns are invalid.
note: on all platforms that Rust's currently supports, 0x0
is the bit-pattern of false
, and 0x1
is the bit-pattern of true
- all other bit-patterns are invalid.
note: there are two proposals, N2218 for C and P0907 for C++, which propose defining0x0
as the bit-pattern for false
and 0x1
as the bit-pattern for true
in the C and C++ standards.
We didn't say anything about the safety invariant of bool
, but I don't think there is much to say about that yet. It was also suggested that we should warn on unsafe
code that makes assumptions about, at least, the size and alignment of bool
. We might need to add a note about this, or maybe open a rust-lang/rust issue about doing that ?
There was interest in documenting the "controversy" here but there was some agreement that there isn't really much controversy about this, just different goals that are at tension.
EDIT: briefly about the goals at tension: the unsafe code guidelines answer the question "What is unsafe
code allowed to assume about bool
? ". From the point-of-view of answering this question, the goal would be a simple and clear answer, like: bool
has size 1, 0x1
is true
and 0x0
is false. Otherwise, unsafe
code that assumes any of this is "wrong" in some sense, and people might end up writing more complicated unsafe
code to avoid relying on these assumptions.
The T-lang and T-compiler team considered many goals, like "people should use bool
in C FFI instead of c_bool
", "people should use bool
in Rust instead of u8
", etc. The bool == _Bool
decision balances all these goals. In practice, for users that write code (safe or unsafe
) that aims to target "mainstream" platforms, the distinction does not really matter.
The T-compiler and T-lang teams signed off here, that
where on every platform that Rust currently supports this implies that:
bool
has a size and an alignment of1
,true as i32 == 1
andfalse as i32 == 0
, EDIT: that is always true, the valid bit patterns ofbool
is what matters here, e.g., on a platform wherebool
has the same size asi8
, whethertransmute::<_, i8>(true) == 1
andtransmute::<_, i8>(false) == 0
These two properties are not guaranteed by Rust, and
unsafe
code cannot rely on these. In the last UCG WG meeting it was unclear whether we want to guarantee these two properties or not. As @rkruppe pointed out, this would be guaranteeing something different and incompatible with what T-lang and T-compiler guaranteed.The following arguments have been raised (hope I did not miss any):
T-lang and T-compiler did not specify which version of the C standard
_Bool
conforms to. In C++20 and C20, P0907r4 (C++) and N2218 (C) specify that:bool
and_Bool
contain no padding bits (only value bits),1 == (int)true
and0 == (int)false
.In some of the merged PRs of the UCG we have already specified that the platform's C implementation needs to comply with some, e.g., C17 or "latest" C standard properties (e.g. for
repr(C) struct
layout). If we end up requiring C20 for representation / validity ofrepr(C)
, we end up guaranteeing these properties. AFAICT the only property aboutbool
that would remain as implementation-defined is its size and alignment.In https://github.com/rust-rfcs/unsafe-code-guidelines/issues/9 / https://github.com/rust-rfcs/unsafe-code-guidelines/pull/49 , we ended up requiring that C's
CHAR_BITS == 8
. This implies that ifCHAR_BITS != 8
thenbool
cannot have the same representation as_Bool
. Some stakeholders still wanted to be able to do C FFI with these platforms, e.g. @BrianSmith suggested that Rust should diagnose, e.g., usingbool
on FFI on these platforms (https://github.com/rust-rfcs/unsafe-code-guidelines/issues/9#issuecomment-428018003), but that interfacing with those platforms via C FFI (e.g. against assembly) should still be possible (e.g. in a DSP whereCHAR_BITS == 16
passing au16
to C / assembly / ... expecting achar
or 16 bit integer should be doable).What exactly T-lang and T-compiler actually ended up guaranteeing isn't 100% clear. In https://github.com/rust-lang/rust/pull/46176 the decision seems to be that
bool == _Bool
, but the PR that was actually merged only mentions thatbool
has a size of1
: https://github.com/rust-lang/rust/pull/46156. This might be an oversight in the docs, and some have mention that the reference is "non-normative". @BrianSmith pointed out (here and here) thatbool
ABI (e.g. integer class or something else?), alignment, bit patterns denotingtrue
andfalse
, etc. don't appear to be properly documented. @Gankro summarized the status quo in Rust Layout and ABIs document and mentioned that projects like Firefox rely on these extra guarantees for correctness (e.g. forbindgen
, etc. to work properly, see here.There are a couple of comments by @withoutboats that I think show both T-lang and T-compiler's rationale and the spirit behind their decision, here:
and here:
So even if the docs say that
bool
has a size of1
, and that's it, I believe that this last comment shows that the spirit of T-lang and T-compiler decision was to spare people from creating ac_bool
type to be forward compatible on C FFI with platforms that we might never properly support.I think that the open questions that have to be clarified, are:
_Bool
, or do we want to stay backwards compatible with C99/11/17 ? (in this case, people can only rely on, e.g.,true == 1
on targets where the platform implementation is C20 "conforming" at least w.r.t._Bool
)bool
has a size and an alignment of1
? (in the hypothetical case that we ever support a platform where this is not the case, we could raise animproper_ctype
warning / error on the platform, or some other form of diagnostic, as @BrianSmith suggested). This would be a change incompatible withbool == _Bool
, might lead people to create and use ac_bool
type, etc.cc @rkruppe @withoutboats @briansmith @gankro @joshtriplett @cuviper @whitequark @est31 @SimonSapin