isocpp / CppCoreGuidelines

The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines
Other
42.92k stars 5.44k forks source link

Favor bit_cast over reinterpret_cast #1517

Open ghost opened 5 years ago

ghost commented 5 years ago

Now that std::bit_cast is coming in C++20, the valid use cases for reinterpret_cast are slim to none in the majority of applications. I believe there should be a rule heavily discouraging reinterpret_cast in favor of std::bit_cast or another named cast.

Often times switching from reinterpret_cast to std::bit_cast would be the difference between invoking undefined behavior or not, due to type aliasing rules.

beinhaerter commented 5 years ago

There are at least rules discouraging casts: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#es48-avoid-casts, https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#es49-if-you-must-use-a-cast-use-a-named-cast. I believe your suggestion would fit well into the type-safety profile.

hsutter commented 4 years ago

Editors call: Even though it's a C++20 features and the Guidelines don't yet support/require C++20, enough people are interested in this that we should say something. Assigning to @GabrielDosReis. Thanks!

N-Dekker commented 2 years ago

Just for my understanding, do I understand correctly that this pull request favors bit_cast over reinterpret_cast in any possible case? Including the cases depicted by three code examples below here?

void f1(std::uintptr_t address) {
  auto ptr1 = reinterpret_cast<void*>(address);
  auto ptr2 = std::bit_cast<void*>(address); // Favorable?
}

void f2(std::istream& input, std::vector<std::byte>& bytes) {
  input.read(reinterpret_cast<char*>(bytes.data()), bytes.size());
  input.read(std::bit_cast<char*>(bytes.data()), bytes.size()); // Favorable?
}

void f3(int i) {
  auto byte_ptr1 = reinterpret_cast<std::byte*>(&i);
  auto byte_ptr2 = std::bit_cast<std::byte*>(&i); // Favorable?
}

I'm very interested to hear if that is indeed going to be the official guideline, with respect to reinterpret_cast versus bit_cast

jwakely commented 2 years ago

auto ptr2 = std::bit_cast<void*>(address); // Favorable?

This might be il-formed because sizeof(void*) == sizeof(uintptr_t) is not necessarily true, and even if it compiles, it might give a different answer. The reinterpret_cast does an implementation-defined conversion and does not necessarily produce a pointer value that is bitwise identical to the integer input.

input.read(std::bit_cast<char*>(bytes.data()), bytes.size()); // Favorable?

It's not guaranteed that byte* and char* have the same representation, so this might be ill-formed too.

auto byte_ptr2 = std::bit_cast<std::byte*>(&i); // Favorable?

Ditto for int* and byte*.

N-Dekker commented 2 years ago

auto ptr2 = std::bit_cast<void*>(address); // Favorable?

This might be il-formed because sizeof(void*) == sizeof(uintptr_t) is not necessarily true, and even if it compiles, it might give a different answer. The reinterpret_cast does an implementation-defined conversion and does not necessarily produce a pointer value that is bitwise identical to the integer input.

Thanks @jwakely This particular void* example is actually based on an attempt of mine to replace some C-style casts like the one from https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/beginthread-beginthreadex doing:

hThread = (HANDLE)_beginthreadex( NULL, 0, &SecondThreadFunc, NULL, 0, &threadID );

_beginthreadex returns a uintptr_t, and HANDLE is just a type alias of void*. Would you use reinterpret_cast<HANDLE>, rather than std::bit_cast<HANDLE>, in such a case?

input.read(std::bit_cast<char*>(bytes.data()), bytes.size()); // Favorable?

It's not guaranteed that byte* and char* have the same representation, so this might be ill-formed too.

Would it be any better when doing reinterpret_cast<char*>(bytes.data()) instead?

auto byte_ptr2 = std::bit_cast<std::byte*>(&i); // Favorable?

Ditto for int* and byte*.

So again, would it be better to use reinterpret_cast here?

jwakely commented 2 years ago

Your reinterpret cast examples are ok, I would not replace any of them with std::bit_cast.

N-Dekker commented 2 years ago

@jwakely Thanks again, Jonathan. So do you have a suggestion how the core guideline on reinterpret_cast versus bit_cast should be like? Apparently it isn't simply "Favor bit_cast over reinterpret_cast"!

MikeGitb commented 2 years ago

Correct me if I'm wrong, but I thought using bit_cast as a 1:1 replacement for reinterpret cast ist almost always wrong. Afaik the idea is not use bit_cast to cast the pointer as you would with `reinterpret_cast (which is often UB, but usually works as long as alignment is correct), but to cast the value the pointer points to. E.g.

#include <iostream>
#include <bit>

struct Foo {
    std::uint32_t low;
    std::uint32_t high;       
};

int main()
{
    alginas(alignof(std::uint64_t)) Foo f{ 0x1, 0x2};

    //pretty sure this is UB, but seems to be common practice
    auto* p = reinterpret_cast<uint64_t*>(&f);
    std::cout << *p << std::endl;  

    auto v = std::bit_cast<uint64_t>(f);
    std::cout << v << std::endl;
}
jwakely commented 2 years ago

Right. But that doesn't make much sense for Niels' examples, where he wants to read the char values making up the object representation of some other objects. You could read each byte as a char using bit_cast but that's not very useful when you want to process N bytes at a time.

MikeGitb commented 2 years ago

Right. But that doesn't make much sense for Niels' examples,

Absolutely. My point is that bit_cast isn't a replacement for reinterpret_cast any more than std::unique_ptr is a replacement for raw pointers. It can be used to avoid a specific subset of usecases for reinterpret_cast. And any Guideline should explain that Otherwise people get confused just like we see here or the people that initially thought they should replace all their raw pointers with either unique or shared pointer.

MikeGitb commented 2 years ago

On another, but related topic: Is my understanding correct that recent changes to the standard (don't remember the name unfortunately and whether it was c++20, c++23 or a DR) now officially permit to cast a pointer to a range of raw bytes (e.g. received over the network or mmap) to a pointer to a c++ object and dereference this pointer?

sweemer commented 2 years ago

@MikeGitb This might be the paper you are thinking of: https://github.com/cplusplus/papers/issues/592

MikeGitb commented 2 years ago

@sweemer : Thanks, but I don't think that is it.

From crossreading that paper talks about viewing the storage of an Foo object as a sequence of char/unsigned char/std::byte, which did and was supposed to work for ages, but the actual wording in the standard didn't permit it.

What I'm talking aobut is the reverse case: I'm getting a pointer to a char/std::byte buffer that has been filled with bytes that contain a valid representation of a Foo object and I'd like to access that buffer through a Foo* pointer. However, as no actual c++ object has ever been created in that buffer this is UB. While this too works in practice (and has to work in order vor c- code to be compiled as c++ code), as long as alignment requirements are observed, I think there was a much bigger debate around whether that should be officielly blessed by the standard.

frederick-vs-ja commented 2 years ago

What I'm talking aobut is the reverse case: I'm getting a pointer to a char/std::byte buffer that has been filled with bytes that contain a valid representation of a Foo object and I'd like to access that buffer through a Foo* pointer. However, as no actual c++ object has ever been created in that buffer this is UB. While this too works in practice (and has to work in order vor c- code to be compiled as c++ code), as long as alignment requirements are observed, I think there was a much bigger debate around whether that should be officielly blessed by the standard.

Given the Foo (an implicit-lifetime type) shown in your example (https://github.com/isocpp/CppCoreGuidelines/issues/1517#issuecomment-1104282681), there can be a Foo object created within that buffer when the buffer is created, due to implicit object creation (introduced as DR by P0593R6).

frederick-vs-ja commented 2 years ago

On another, but related topic: Is my understanding correct that recent changes to the standard (don't remember the name unfortunately and whether it was c++20, c++23 or a DR) now officially permit to cast a pointer to a range of raw bytes (e.g. received over the network or mmap) to a pointer to a c++ object and dereference this pointer?

I guess this usage sometimes requires std::launder after reinterpret_cast (see P0137R1) currently, although AFAIK no compiler thinks std::launder is in effect in this case.

MikeGitb commented 2 years ago

Given the Foo (an implicit-lifetime type) shown in your example (https://github.com/isocpp/CppCoreGuidelines/issues/1517#issuecomment-1104282681), there can be a Foo object created within that buffer when the buffer is created, due to implicit object creation (introduced as DR by P0593R6).

Thats the paper I had in mind. Thank you very much.

N-Dekker commented 2 years ago

@jwakely Can you please confirm that the line of code that does a reinterpret_cast in the following example has undefined behavior?

void f( uint32_t u ) {
  int i1 = *reinterpret_cast< int32_t * >(&u); // Undefined behavior, right?
  int i2 = std::bit_cast< int32_t>(u); // Proper bit_cast use case?
}

Assuming that a bitwise conversion is intended, I guess that would be a proper use case for bit_cast. Right? I just proposed this to the ITK library, pull request https://github.com/InsightSoftwareConsortium/ITK/pull/3402

jwakely commented 2 years ago

The reinterpret_cast itself is ok, but dereferencing the result of the cast is undefined.

The bit_cast version is ok, but redundant. For two's complement integers the implicit conversion from uint32_t to int32_t preserves the original bit pattern, and C++ now requires two's complement integers (and all known implementations already use them anyway).

N-Dekker commented 2 years ago

For two's complement integers the implicit conversion from uint32_t to int32_t preserves the original bit pattern, and C++ now requires two's complement integers

@jwakely Ah, thanks Jonathan. I see now that with p0907R2 - Signed Integers are Two’s Complement, in case of out of range, the result is no longer implementation-defined. That's also new to me.

So in this particular case (from uint32_t to int32_t), one might say "Favor bit_cast over reinterpret_cast", but then both implicit conversion and static_cast<int32_t> would still be more favorable than bit_cast. 🤔

With respect to my ITK pull request , I'm not entirely sure if all compilers supported by ITK already implement integer conversion exactly like that (without any disagreeing implementation-defined behavior), as ITK still supports C++14 compilation by GCC 5.1, LLVM Clang 3.4, and VS2017 (v141). So I think ITK might still use its own hand-written bit_cast replica for a while. But of course, that's beyond this CppCoreGuidelines issue.

jwakely commented 2 years ago

GCC, clang and msvc always implemented that rule on all supported architectures. That's why changing the standard was completely uncontroversial. It used to be implementation defined to allow for other representations, but no C++ compiler ever used anything except two's complement.

frederick-vs-ja commented 2 years ago

@N-Dekker I think *reinterpret_cast< int32_t * >(&u) is well-defined according to [basic.lval]/(11.2):

a type that is the signed or unsigned type corresponding to the dynamic type of the object, or

But I also suggest you to use static_cast in this case.

leimao commented 1 year ago

@N-Dekker @jwakely I agree with @frederick-vs-ja that the int32_t i1 = *reinterpret_cast< int32_t * >(&u); (I changed the int to int32_t to eliminate the assignment undefined behavior (caused by integer overflow) which is irrelevant to this topic.) behavior should be well-defined. However, if it were float i1 = *reinterpret_cast< float * >(&u);, then the behavior is undefined.

bencodestx commented 3 weeks ago

std::bit_cast<>() ability to silently discard const-ness seems excessively dangerous to make this recommendation.

https://godbolt.org/z/exhvcr17o

jwakely commented 3 weeks ago

Surely the suggestion is to use bit_cast to access the object itself and copy the bytes into a new object of a different type. I don't think anybody is suggesting using bit_cast to convert T to U and then cause undefined behaviour by accessing a T through a U*. That's just as undefined as doing the same with reinterpret_cast.

The point is to stop doing that kind of thing, and use bit_cast to do something better. Not just use bit_cast as a different spelling for the same old undefined badness.

bencodestx commented 3 weeks ago

Surely the suggestion is to use bit_cast to access the object itself and copy the bytes into a new object of a different type.

Yes, for that use case I completely agree in the approach. I just do not want to see reinterpret_cast<T*> mechanically replaced with std::bit_cast<T*> as that can compile, but is dangerous.

bencodestx commented 3 weeks ago

thought Should a separate guideline discourage use of std::bit_cast<> between two pointer types?