rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do
https://rust-lang.github.io/unsafe-code-guidelines
Apache License 2.0
666 stars 58 forks source link

bool == _Bool ? #53

Closed gnzlbg closed 5 years ago

gnzlbg commented 5 years ago

The T-compiler and T-lang teams signed off here, that

bool has the same representation as _Bool

where on every platform that Rust currently supports this implies that:

These two properties are not guaranteed by Rust, and unsafe code cannot rely on these. In the last UCG WG meeting it was unclear whether we want to guarantee these two properties or not. As @rkruppe pointed out, this would be guaranteeing something different and incompatible with what T-lang and T-compiler guaranteed.

Note: any change that the UCG WG proposes will have to go through the RFC process anyways, were it might be rejected. This issue is being raised with stakeholders to evaluate whether there is something that needs changing or not, and if so, whether the change is possible, has chances to achieve consensus, etc.

The following arguments have been raised (hope I did not miss any):

There are a couple of comments by @withoutboats that I think show both T-lang and T-compiler's rationale and the spirit behind their decision, here:

  • I worry that if we don't specify bool as equivalent to C and C++ booleans, people will need to use c_bool in FFI to be cross platform compatible.

  • I worry that if we don't specify bool as byte sized, people will create a struct Bool(u8) to get that guarantee & keep their structs small.

and here:

People could come to the conclusion that they need a c_bool type for their FFI to be forward compatible with platforms we don't yet support. I think defining it as the same representation as _Bool / C++ bool makes it the least likely someone does something painful to avoid entirely hypothetical problems.

So even if the docs say that bool has a size of 1, and that's it, I believe that this last comment shows that the spirit of T-lang and T-compiler decision was to spare people from creating a c_bool type to be forward compatible on C FFI with platforms that we might never properly support.


I think that the open questions that have to be clarified, are:

cc @rkruppe @withoutboats @briansmith @gankro @joshtriplett @cuviper @whitequark @est31 @SimonSapin

briansmith commented 5 years ago
  • n C++20 and C20, P0907r4 (C++) and N2218 (C) specify that:

Those are stated to be "proposals". I believe the C++ proposal might have been accepted by the C++ committee, but I don't know if the C committee accepted the proposal yet.

strega-nil commented 5 years ago

as will always make true as INTEGRAL_TYPE == 1, to be clear, no matter what the representation of true is (and the same for false)

gnzlbg commented 5 years ago

@ubsan yes, I should have used transmute, updated.

gnzlbg commented 5 years ago

Those are stated to be "proposals". I believe the C++ proposal might have been accepted by the C++ committee, but I don't know if the C committee accepted the proposal yet.

The author of those proposals tweeted (really, this is the only source I have) about this proposal: https://twitter.com/jfbastien/status/989242576598327296?lang=en

Not only is the C++ standards committee happy to make signed integers two's complement in C++20 with http://wg21.link/p0907 , the C standards committee voted to do the same thing for the next version of C! 🤯 I just have to write the wording. 🎉

I don't know which parts of p0907 and N2218 will make it into C and C++, but at least some of it is on track for C++20 and the next revision of the C standard.

RalfJung commented 5 years ago

What is the problem with just saying "bool has the same size as C's _Bool, here is what this means for popular platforms"? "bool == _Bool" and "bool has size 1" are only contradicting if they are both interpreted as normative for all platforms; I do not think the latter is normative.

TBH I am a bit surprised to see so much fuzz about this, I feel there is an aspect to this discussion that I am missing.

The one possible problem with this definition is that it means we can only ever port Rust to platforms that support C; however, I think that is an entirely theoretical concern at this point.

Gankra commented 5 years ago

Defining bool == _Bool has no implications on c-less platforms, as it is implicitly "if we have a C platform to interoperate with".

I don't have much respect for the boogeyman of weird bools, since aiui that's mostly CHAR_BIT != 8 or wildly ancient platforms (win 95). We should blindly assert bool has the proposed c/cpp-20 definition, and also that we only interoperate with c platforms that support it.

I might be willing to accept a weaker statement of "we don't interoperate with c's bool on such a weird platform" but that pushes people back to spreading the folklore that c_bool is useful and good.

gnzlbg commented 5 years ago

What is the problem with just saying "bool has the same size as C's _Bool, here is what this means for popular platforms"?

All options have problems. The problem with that approach is that it doesn't guarantee that Rust bool is 1 byte, so those who might care about that might be left wondering: "Should I use u8 to store my booleans instead of bool just in case someone uses my code in some hypothetical platform in which size_of::<bool>() != 1 ? " Note that this something that affects all Rust code, instead of just Rust's C FFI.

If we would say that size_of::<bool>() == 1 then some people might decide to use c_bool in C FFI to be portable with hypothetical C platforms in which that does not hold. That looks like a lesser evil to me than leaving all Rust users, including those writing Rust code that does not interface with C, wondering about the size of bool.

SimonSapin commented 5 years ago

just in case someone uses my code in some hypothetical platform in which size_of::<bool>() != 1


fn _static_assert_size_of() {
    let _ = std::mem::transmute<bool, u8>;
}
gnzlbg commented 5 years ago

@SimonSapin preventing code from compiling is not the same producing portable code that compiles and works as expected.

An equivalent solution (that prevents some code from compiling) would be to not allow bool in C FFI in those platforms in which bool != _Bool. That would only affect Rust code doing C FFI, instead of affecting all Rust code, and bool would still continue to work properly on C FFI as long as the code only targets platforms in which bool == _Bool.

RalfJung commented 5 years ago

The problem with that approach is that it doesn't guarantee that Rust bool is 1 byte,

The lang and compiler teams decided that this is not the case (i.e., they decided that bool might have size different from 1)! It is not our place to change that decision. We are mostly documenting behavior here.

This was actually called out in https://github.com/rust-lang/rust/pull/46176#issuecomment-359593446:

We document this, and also document that on every platform we currently support, this means that the size of bool is 1.

(emphasis mine)

I see no reason for us to overthrow their decision as part of this documentation process. If you want to change that decision, feel free to write an RFC, but that IMO has no place in the current UCG discussion phase.

RalfJung commented 5 years ago

@SimonSapin Here is a version of that code that actually compiles: (from your very brief message, I had no idea whether you wanted to express that this is accepted or rejected by the compiler^^)

fn static_assert_size_of() { unsafe {
    let _ = std::mem::transmute::<bool, u8>(true);
} }

Looks like a bug to me. We also don't allow transmuting between u64 and usize on 64bit platforms. So we should reject this transmute for the same reason, or else decide that we will never support a platform where _Bool has a size different from 1, or else backtrack on https://github.com/rust-lang/rust/pull/46176#issuecomment-359593446. "We", however, is the community as part of an RFC, or maybe T-lang + T-compiler, but not the UCG group.

hanna-kruppe commented 5 years ago

We also don't allow transmuting between u64 and usize on 64bit platforms.

Huh? Of course we do. It compiles, and it's perfectly legitimate if you only target 64 bit platforms or properly cfg-gate the transmute.

RalfJung commented 5 years ago

Huh? Of course we do. It compiles, and it's perfectly legitimate if you only target 64 bit platforms or properly cfg-gate the transmute.

You are right. Seems I misremembered. Even better, things are consistent then.

RalfJung commented 5 years ago

A long discussion with @gnzlbg on Zulip uncovered why I am so confused here (in particular, I maintain that it makes no logical sense to bring up platforms with CHAR_BITS != 8 in this discussion if we declared such platforms unsupported): @gnzlbg thinks that even if we say "for FFI we assume that CHAR_SIZE == 8", we still want to say some things about FFI on platforms where CHAR_SIZE != 8.

I think we should close this issue, and just document the T-lang + T-compiler decision that bool shares size and alignment with _Bool. And then some people that care should have a discussion about which platforms we support, where "no support" means "no support". And then, if there is agreement that it is worth supporting CHAR_BITS != 8 at least a bit, we can see how that affects bool.

SimonSapin commented 5 years ago

(Code in my previous comment did not compile because I forgot :: in the turbofish syntax. But you don’t need to call transmute to get the magic size equality check, only to "instantiate" it with concrete input and output types.)

Gankra commented 5 years ago

I am still interested in knowing what concrete platforms people want to actually support with more flexible bools.

gnzlbg commented 5 years ago

I am still interested in concrete platforms that people want to actually support with more flexible bools.

Nobody came up with any relevant ones in the last discussions - I mentioned that MSVC <= 4.2 had sizeof(bool) == 4 but that is not a platform that I'd want to support.

cuviper commented 5 years ago

Historically, 32-bit powerpc-darwin was another one with sizeof(bool) == 4.

joshtriplett commented 5 years ago

Cray didn't have byte-level addressing, but I don't think anyone needs to target the Cray architecture anymore.

I think we should mandate that bool is always the platform's _Bool. And to the extent we need to, let's always assume 8-bit bytes, rather than spending time and resources future-proofing against a platform that doesn't yet exist.

gnzlbg commented 5 years ago

I think we should mandate that bool is always the platform's _Bool.

What does that buy us?

Pros of bool == _Bool:

Cons of bool == _Bool:

On one hand, I think that we should not make it impossible to write Rust that targets weird C platforms - there should not be room or excuses for people to use C. On the other hand, targeting weird C platforms is going to be hard anyways, most of the ecosystem might not be reused at least "as is" on those, etc. Is it worth it to leave most Rust users wondering about bool to make targeting these weird C platforms a slightly bit easier? I don't know. Can, e.g., -sys Rust C FFI wrappers be reused at all on those weird targets ? I don't know either.

As @rkruppe argued, just because we leave those things as "implementation defined" does not mean that people won't assume that they hold everywhere if that holds on all platforms they care about. I don't care about platforms where _Bool doesn't have the properties above, so I just can assume that they hold. The bool == _Bool definition allows that, so as long as we word everything correctly, it's a good choice IMO.

Ixrec commented 5 years ago

To clarify, c_bool would become a legitimately useful type if we ever did add support for a "weird C platform"/platform with large bools, right?

So as long as there's a fundamental conflict between guaranteeing the size on weird future platforms and guaranteeing C interop on weird future platforms, and "targeting weird C platforms is going to be hard anyways", it seems we gain a lot more from making bool predictably small for all the ordinary pure-Rust use cases and then keeping in mind that we'll have to introduce an official c_bool type if we ever add a platform where it would be useful (and we probably shouldn't do that before the "portability lint" becomes a real thing).

strega-nil commented 5 years ago

@Ixrec the team has decided that it's better to be compatible with C, than predictably 1 byte, with bit patterns exactly 0b0 and 0b1 (although there's a close to zero percent chance that this decision ever gets tested)

gnzlbg commented 5 years ago

What should miri do here? It currently implements an abstract machine where bool is one Byte wide and 0x1 and 0x0 are used for true and false but should it panic on any sort of “bool type punning” ? The “host” and the “target” can vary in behavior so this could impact const fn.

We already have similar problems with floats, but I don’t know if they are the same.

Gankra commented 5 years ago

regardless of our decision, bool's sizing is no different from usize vs u32/u64, which I don't believe is considered an "interesting" problem in miri

RalfJung commented 5 years ago

What should miri do here? It currently implements an abstract machine where bool is one Byte wide

No, it implements an abstract machine where bool is as large as it is on the target platform. Sizes and layout of types in miri is entirely determined by rustc, using the same code as what is used for codegen.

and 0x1 and 0x0 are used for true and false

That's correct. But it seems that'll soon be fixed in C/C++ as well so it seems fine.

gnzlbg commented 5 years ago

We discussed this briefly in today's meeting, so I think we can close this. We agreed on writing down something along these lines (the exact wording will be in a PR, we can discuss the nits there - cc @avadacatavra ):


Representation invariant of bool

Rust's bool has the same layout as C17's _Bool, that is, its size and alignment are implementation-defined.

note: on all platforms that Rust's currently supports, the size and alignment of bool are 1, and its ABI class is INTEGER.


Validity invariant of bool

Rust's bool has the same valid representations as C17's _Bool, that is, two valid implementation-defined bit-patterns corresponding to true and false- all other bit-patterns are invalid.

note: on all platforms that Rust's currently supports, 0x0 is the bit-pattern of false, and 0x1 is the bit-pattern of true - all other bit-patterns are invalid.

note: there are two proposals, N2218 for C and P0907 for C++, which propose defining0x0 as the bit-pattern for false and 0x1 as the bit-pattern for true in the C and C++ standards.


We didn't say anything about the safety invariant of bool, but I don't think there is much to say about that yet. It was also suggested that we should warn on unsafe code that makes assumptions about, at least, the size and alignment of bool. We might need to add a note about this, or maybe open a rust-lang/rust issue about doing that ?


There was interest in documenting the "controversy" here but there was some agreement that there isn't really much controversy about this, just different goals that are at tension.

EDIT: briefly about the goals at tension: the unsafe code guidelines answer the question "What is unsafe code allowed to assume about bool? ". From the point-of-view of answering this question, the goal would be a simple and clear answer, like: bool has size 1, 0x1 is true and 0x0 is false. Otherwise, unsafe code that assumes any of this is "wrong" in some sense, and people might end up writing more complicated unsafe code to avoid relying on these assumptions.

The T-lang and T-compiler team considered many goals, like "people should use bool in C FFI instead of c_bool", "people should use bool in Rust instead of u8", etc. The bool == _Bool decision balances all these goals. In practice, for users that write code (safe or unsafe) that aims to target "mainstream" platforms, the distinction does not really matter.