ghost opened 5 years ago
There are already rules discouraging casts: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#es48-avoid-casts, https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#es49-if-you-must-use-a-cast-use-a-named-cast. I believe your suggestion would fit well into the type-safety profile.
Editors call: Even though it's a C++20 feature and the Guidelines don't yet support/require C++20, enough people are interested in this that we should say something. Assigning to @GabrielDosReis. Thanks!
Just for my understanding: do I understand correctly that this pull request favors bit_cast over reinterpret_cast in every possible case, including the cases depicted by the three code examples below?
#include <bit>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <vector>

void f1(std::uintptr_t address) {
  auto ptr1 = reinterpret_cast<void*>(address);
  auto ptr2 = std::bit_cast<void*>(address); // Favorable?
}

void f2(std::istream& input, std::vector<std::byte>& bytes) {
  input.read(reinterpret_cast<char*>(bytes.data()), bytes.size());
  input.read(std::bit_cast<char*>(bytes.data()), bytes.size()); // Favorable?
}

void f3(int i) {
  auto byte_ptr1 = reinterpret_cast<std::byte*>(&i);
  auto byte_ptr2 = std::bit_cast<std::byte*>(&i); // Favorable?
}
I'm very interested to hear whether that is indeed going to be the official guideline with respect to reinterpret_cast versus bit_cast.
auto ptr2 = std::bit_cast<void*>(address); // Favorable?
This might be ill-formed, because sizeof(void*) == sizeof(uintptr_t) is not necessarily true, and even if it compiles, it might give a different answer. The reinterpret_cast does an implementation-defined conversion and does not necessarily produce a pointer value that is bitwise identical to the integer input.
input.read(std::bit_cast<char*>(bytes.data()), bytes.size()); // Favorable?
It's not guaranteed that byte* and char* have the same representation, so this might be ill-formed too.
auto byte_ptr2 = std::bit_cast<std::byte*>(&i); // Favorable?
Ditto for int* and byte*.
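For reference, a minimal sketch of the constraint behind all three points: std::bit_cast only compiles when the source and destination sizes match, and even then it just copies bits (the names and the static_assert here are illustrative, not from the discussion):

#include <bit>
#include <cstdint>

// std::bit_cast<To>(from) is only well-formed when sizeof(To) == sizeof(From)
// and both types are trivially copyable; for uintptr_t -> void* that size
// equality is not guaranteed by the standard.
void* int_to_ptr(std::uintptr_t address) {
    static_assert(sizeof(void*) == sizeof(std::uintptr_t),
                  "bit_cast<void*> from uintptr_t needs equal sizes");
    // Even where this compiles, it copies bits; it does not perform the
    // implementation-defined integer-to-pointer conversion that reinterpret_cast does.
    return std::bit_cast<void*>(address);
}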
auto ptr2 = std::bit_cast<void*>(address); // Favorable?
This might be ill-formed, because sizeof(void*) == sizeof(uintptr_t) is not necessarily true, and even if it compiles, it might give a different answer. The reinterpret_cast does an implementation-defined conversion and does not necessarily produce a pointer value that is bitwise identical to the integer input.
Thanks @jwakely! This particular void* example is actually based on an attempt of mine to replace some C-style casts, like the one from https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/beginthread-beginthreadex, doing:
hThread = (HANDLE)_beginthreadex( NULL, 0, &SecondThreadFunc, NULL, 0, &threadID );
_beginthreadex returns a uintptr_t, and HANDLE is just a type alias of void*. Would you use reinterpret_cast<HANDLE>, rather than std::bit_cast<HANDLE>, in such a case?
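For reference, the attempted replacement would look roughly like this (a sketch assuming the usual Windows declarations from <process.h> and <windows.h>; SecondThreadFunc and StartSecondThread are just placeholder names):

#include <process.h>   // _beginthreadex
#include <windows.h>   // HANDLE
#include <cstdint>

unsigned __stdcall SecondThreadFunc(void*);  // defined elsewhere, as in the Microsoft example

HANDLE StartSecondThread(unsigned& threadID) {
    std::uintptr_t result =
        _beginthreadex(nullptr, 0, &SecondThreadFunc, nullptr, 0, &threadID);
    // The question: reinterpret_cast<HANDLE> (implementation-defined
    // integer-to-pointer conversion) or std::bit_cast<HANDLE> (bitwise copy)?
    return reinterpret_cast<HANDLE>(result);
}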
input.read(std::bit_cast<char*>(bytes.data()), bytes.size()); // Favorable?
It's not guaranteed that byte* and char* have the same representation, so this might be ill-formed too.
Would it be any better when doing reinterpret_cast<char*>(bytes.data()) instead?
auto byte_ptr2 = std::bit_cast<std::byte*>(&i); // Favorable?
Ditto for int* and byte*.
So again, would it be better to use reinterpret_cast here?
Your reinterpret_cast examples are ok; I would not replace any of them with std::bit_cast.
@jwakely Thanks again, Jonathan. So do you have a suggestion for what the core guideline on reinterpret_cast versus bit_cast should look like? Apparently it isn't simply "Favor bit_cast over reinterpret_cast"!
Correct me if I'm wrong, but I thought using bit_cast as a 1:1 replacement for reinterpret_cast is almost always wrong.
AFAIK the idea is not to use bit_cast to cast the pointer as you would with reinterpret_cast (which is often UB, but usually works as long as alignment is correct), but to cast the value the pointer points to. E.g.:
#include <bit>
#include <cstdint>
#include <iostream>

struct Foo {
    std::uint32_t low;
    std::uint32_t high;
};

int main()
{
    alignas(alignof(std::uint64_t)) Foo f{ 0x1, 0x2 };

    // pretty sure this is UB, but seems to be common practice
    auto* p = reinterpret_cast<std::uint64_t*>(&f);
    std::cout << *p << std::endl;

    auto v = std::bit_cast<std::uint64_t>(f);
    std::cout << v << std::endl;
}
Right. But that doesn't make much sense for Niels' examples, where he wants to read the char values making up the object representation of some other objects. You could read each byte as a char using bit_cast, but that's not very useful when you want to process N bytes at a time.
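A small sketch of what that per-byte approach would look like (the function name is made up), just to show why it doesn't help with input.read:

#include <bit>
#include <cstddef>
#include <vector>

// You can bit_cast an individual std::byte to char, but that converts one
// value at a time; istream::read still needs a char* to the whole buffer.
char first_byte_as_char(const std::vector<std::byte>& bytes) {
    return std::bit_cast<char>(bytes.front());
}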
Right. But that doesn't make much sense for Niels' examples,
Absolutely. My point is that bit_cast isn't a replacement for reinterpret_cast any more than std::unique_ptr is a replacement for raw pointers. It can be used to avoid a specific subset of use cases for reinterpret_cast. And any guideline should explain that. Otherwise people get confused, just like we see here, or like the people who initially thought they should replace all their raw pointers with either unique_ptr or shared_ptr.
On another, but related, topic: is my understanding correct that recent changes to the standard (I don't remember the name, unfortunately, or whether it was C++20, C++23, or a DR) now officially permit casting a pointer to a range of raw bytes (e.g. received over the network or via mmap) to a pointer to a C++ object and dereferencing this pointer?
@MikeGitb This might be the paper you are thinking of: https://github.com/cplusplus/papers/issues/592
@sweemer: Thanks, but I don't think that is it.
From cross-reading, that paper talks about viewing the storage of a Foo object as a sequence of char/unsigned char/std::byte, which did work and was supposed to work for ages, but the actual wording in the standard didn't permit it.
What I'm talking about is the reverse case: I'm getting a pointer to a char/std::byte buffer that has been filled with bytes that contain a valid representation of a Foo object, and I'd like to access that buffer through a Foo* pointer. However, as no actual C++ object has ever been created in that buffer, this is UB. While this too works in practice (and has to work in order for C code to be compiled as C++ code), as long as alignment requirements are observed, I think there was a much bigger debate around whether that should be officially blessed by the standard.
What I'm talking about is the reverse case: I'm getting a pointer to a char/std::byte buffer that has been filled with bytes that contain a valid representation of a Foo object, and I'd like to access that buffer through a Foo* pointer. However, as no actual C++ object has ever been created in that buffer, this is UB. While this too works in practice (and has to work in order for C code to be compiled as C++ code), as long as alignment requirements are observed, I think there was a much bigger debate around whether that should be officially blessed by the standard.
Given the Foo (an implicit-lifetime type) shown in your example (https://github.com/isocpp/CppCoreGuidelines/issues/1517#issuecomment-1104282681), there can be a Foo object created within that buffer when the buffer is created, due to implicit object creation (introduced as DR by P0593R6).
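A sketch of that situation, assuming I read P0593R6 correctly; receive_bytes is a hypothetical stand-in for filling the buffer over the network or via mmap:

#include <cstddef>
#include <cstdint>
#include <cstdlib>

struct Foo {                 // trivially copyable aggregate: an implicit-lifetime type
    std::uint32_t low;
    std::uint32_t high;
};

void receive_bytes(void* dst, std::size_t n);  // hypothetical: fills dst with n bytes

Foo read_foo() {
    // std::malloc is one of the operations that implicitly create objects (P0593R6),
    // so once the storage holds a valid Foo representation, a Foo can be considered
    // to exist there and the static_cast'ed pointer can be dereferenced.
    void* buffer = std::malloc(sizeof(Foo));
    receive_bytes(buffer, sizeof(Foo));
    Foo foo = *static_cast<Foo*>(buffer);
    std::free(buffer);
    return foo;
}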
On another, but related, topic: is my understanding correct that recent changes to the standard (I don't remember the name, unfortunately, or whether it was C++20, C++23, or a DR) now officially permit casting a pointer to a range of raw bytes (e.g. received over the network or via mmap) to a pointer to a C++ object and dereferencing this pointer?
I guess this usage sometimes requires std::launder after reinterpret_cast (see P0137R1) currently, although AFAIK no compiler thinks std::launder is in effect in this case.
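Roughly the pattern being referred to, I think (the function name is mine, and the buffer is assumed to already contain a suitably aligned, valid Foo object):

#include <cstddef>
#include <cstdint>
#include <new>       // std::launder

struct Foo {
    std::uint32_t low;
    std::uint32_t high;
};

// Assuming a Foo object already lives within the buffer (e.g. via implicit
// object creation), reinterpret_cast yields a pointer into the storage and
// std::launder is used to obtain a pointer to that Foo (see P0137R1).
Foo* view_as_foo(std::byte* buffer) {
    return std::launder(reinterpret_cast<Foo*>(buffer));
}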
Given the Foo (an implicit-lifetime type) shown in your example (https://github.com/isocpp/CppCoreGuidelines/issues/1517#issuecomment-1104282681), there can be a Foo object created within that buffer when the buffer is created, due to implicit object creation (introduced as DR by P0593R6).
That's the paper I had in mind. Thank you very much.
@jwakely Can you please confirm that the line of code that does a reinterpret_cast in the following example has undefined behavior?
#include <bit>
#include <cstdint>

void f( uint32_t u ) {
  int i1 = *reinterpret_cast< int32_t * >(&u); // Undefined behavior, right?
  int i2 = std::bit_cast< int32_t >(u); // Proper bit_cast use case?
}
Assuming that a bitwise conversion is intended, I guess that would be a proper use case for bit_cast. Right? I just proposed this to the ITK library, pull request https://github.com/InsightSoftwareConsortium/ITK/pull/3402
The reinterpret_cast itself is ok, but dereferencing the result of the cast is undefined.
The bit_cast version is ok, but redundant. For two's complement integers the implicit conversion from uint32_t to int32_t preserves the original bit pattern, and C++ now requires two's complement integers (and all known implementations already use them anyway).
For two's complement integers the implicit conversion from uint32_t to int32_t preserves the original bit pattern, and C++ now requires two's complement integers
@jwakely Ah, thanks Jonathan. I see now that with P0907R2 (Signed Integers are Two's Complement), the result of an out-of-range conversion is no longer implementation-defined. That's also new to me.
So in this particular case (from uint32_t to int32_t), one might say "Favor bit_cast over reinterpret_cast", but then both implicit conversion and static_cast<int32_t> would still be more favorable than bit_cast. 🤔
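For completeness, a small sketch of the three variants being compared (names are illustrative only):

#include <bit>
#include <cstdint>

void convert(std::uint32_t u) {
    // With two's complement guaranteed (P0907R2), all three preserve the bit pattern.
    [[maybe_unused]] std::int32_t a = u;                               // implicit conversion
    [[maybe_unused]] std::int32_t b = static_cast<std::int32_t>(u);    // named cast
    [[maybe_unused]] std::int32_t c = std::bit_cast<std::int32_t>(u);  // bitwise copy, redundant here
}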
With respect to my ITK pull request, I'm not entirely sure if all compilers supported by ITK already implement integer conversion exactly like that (without any disagreeing implementation-defined behavior), as ITK still supports C++14 compilation by GCC 5.1, LLVM Clang 3.4, and VS2017 (v141). So I think ITK might still use its own hand-written bit_cast replica for a while. But of course, that's beyond this CppCoreGuidelines issue.
GCC, Clang, and MSVC always implemented that rule on all supported architectures. That's why changing the standard was completely uncontroversial. It used to be implementation-defined to allow for other representations, but no C++ compiler ever used anything except two's complement.
@N-Dekker I think *reinterpret_cast< int32_t * >(&u) is well-defined according to [basic.lval]/(11.2):
a type that is the signed or unsigned type corresponding to the dynamic type of the object, or
But I also suggest you use static_cast in this case.
@N-Dekker @jwakely I agree with @frederick-vs-ja that the behavior of int32_t i1 = *reinterpret_cast< int32_t * >(&u); should be well-defined. (I changed the int to int32_t to eliminate the undefined behavior of the assignment, caused by integer overflow, which is irrelevant to this topic.) However, if it were float i1 = *reinterpret_cast< float * >(&u), then the behavior would be undefined.
The ability of std::bit_cast<>() to silently discard const-ness seems excessively dangerous to make this recommendation.
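A small sketch of the concern (the commented-out line shows what the compiler would reject):

#include <bit>

void drop_const(const int* p) {
    // Compiles: bit_cast between pointer types silently loses the const qualifier.
    [[maybe_unused]] int* q = std::bit_cast<int*>(p);

    // Does not compile: reinterpret_cast cannot cast away constness on its own.
    // int* r = reinterpret_cast<int*>(p);
}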
Surely the suggestion is to use bit_cast to access the object itself and copy the bytes into a new object of a different type. I don't think anybody is suggesting using bit_cast to convert T to U and then cause undefined behaviour by accessing a T through a U*. That's just as undefined as doing the same with reinterpret_cast.
The point is to stop doing that kind of thing, and use bit_cast to do something better. Not just use bit_cast as a different spelling for the same old undefined badness.
Surely the suggestion is to use bit_cast to access the object itself and copy the bytes into a new object of a different type.
Yes, for that use case I completely agree with the approach. I just do not want to see reinterpret_cast<T*> mechanically replaced with std::bit_cast<T*>, as that can compile but is dangerous.
Thought: should a separate guideline discourage use of std::bit_cast<> between two pointer types?
Now that std::bit_cast is coming in C++20, the valid use cases for reinterpret_cast are slim to none in the majority of applications. I believe there should be a rule heavily discouraging reinterpret_cast in favor of std::bit_cast or another named cast. Oftentimes, switching from reinterpret_cast to std::bit_cast would be the difference between invoking undefined behavior or not, due to type aliasing rules.
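As an illustration of the kind of switch meant here, a sketch assuming a platform where float is 32 bits wide:

#include <bit>
#include <cstdint>

std::uint32_t bits_of(float value) {
    // return *reinterpret_cast<std::uint32_t*>(&value);  // violates the aliasing rules: UB
    return std::bit_cast<std::uint32_t>(value);           // well-defined bitwise copy
}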