tdlib / td

Cross-platform library for building Telegram clients
https://core.telegram.org/tdlib
Boost Software License 1.0
7.11k stars 1.44k forks source link

Would this piece of code cause alignment problem? #2694

Closed Yxue-1906 closed 11 months ago

Yxue-1906 commented 11 months ago

https://github.com/tdlib/td/blob/07c1d53a6d3cb1fad58d2822e55eef6d57363581/tdutils/td/utils/buffer.cpp#L105-L115

cppreference' page about reinterpret_cast conversion note that

Any object pointer type T1 can be converted to another object pointer type cv T2. This is exactly equivalent to static_cast<cv T2>(static_cast<cv void>(expression)) (which implies that if T2's alignment requirement is not stricter than T1's, the value of the pointer does not change and conversion of the resulting pointer back to its original type yields the original value). In any case, the resulting pointer may only be dereferenced safely if allowed by the type aliasing rules.

about type aliasing rules,

Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:

  • AliasedType and DynamicType are similar
  • AliasedType is the (possibly cv-qualified) signed or unsigned variant of DynamicType.
  • AliasedType is std::byte,(since C++17) char, or unsigned char: this permits examination of the object representation of any object as an array of bytes.

Informally, two types are similar if, ignoring top-level cv-qualification:

  • they are the same type; or
  • they are both pointers, and the pointed-to types are similar; or
  • they are both pointers to member of the same class, and the types of the pointed-to members are similar; or
  • they are both arrays of the same size or both arrays of unknown bound, and the array element types are similar. (until C++20)
  • they are both arrays of the same size or at least one of them is array of unknown bound, and the array element types are similar. (since C++20)

So, if I do not misunderstand correspond contents, the code I quoted should be some kind of UB?

edit: on my platform, alignof(BufferRaw) == 8

levlam commented 11 months ago

The first quote is relevant to the question. reinterpret_cast for the pointer itself is valid "if T2's alignment requirement is not stricter than T1's". The operator new "is required to return a pointer suitably aligned to point to an object of the requested size". The requested size is at least sizeof(BufferRaw) and BufferRaw has standard alignment requirements, so the cast is valid.

After the pointer is casted placement new is called to create BufferRaw object in the allocated buffer

new (buffer_raw) BufferRaw(size);

This creates and initializes a BufferRaw object with dynamic storage duration, which can be used thereafter, and returns a pointer to it. There is no type aliasing involved.

Yxue-1906 commented 11 months ago

thx for the explanation, but I still have some questions about your words.

though standard guarantee operator new return a pointer pointing to memory block which is suitably aligned and huge enough, is the alignment requirement be alignof(BufferRaw) ?

I mean, given this example:

constexpr const size_t size = sizeof(BufferRaw);

// code 1
auto char_array_ptr = new char[size];
auto * buffer_ptr1 = reinterpret_cast<BufferRaw *>(char_array_ptr);

// code 2
auto * buffer_ptr2 = reinterpret_cast<BufferRaw *>(new char[size]);

in code 1, the alignment requirement for char_array_ptr should be alignof(char[size]), which is 1 on my platform; and, in my opinion, code 1 is identical to code 2, so still can not guarantee buffer_ptr1/buffer_ptr2 pointing to memory suitably aligned for BufferRaw but for char[size].

levlam commented 11 months ago

in code 1, the alignment requirement for char_array_ptr should be alignof(char[size])

No, this is not true. The code is new expression, which just calls the function operator new[], passing to it just the size of the requested object. Alignment of char is irrelevant.

See https://en.cppreference.com/w/cpp/language/new for mode details. For example,

Array allocation may supply unspecified overhead, which may vary from one call to new to the next, unless the allocation function selected is the standard non-allocating form. The pointer returned by the new-expression will be offset by that value from the pointer returned by the allocation function. Many implementations use the array overhead to store the number of objects in the array which is used by the delete[] expression to call the correct number of destructors. In addition, if the new-expression is used to allocate an array of char, unsigned char, or std::byte(since C++17), it may request additional memory from the allocation function if necessary to guarantee correct alignment of objects of all types no larger than the requested array size, if one is later placed into the allocated array.

Yxue-1906 commented 11 months ago

if the new-expression is used to allocate an array of char, unsigned char, or std::byte (since C++17), it may request additional memory from the allocation function if necessary to guarantee correct alignment of objects of all types no larger than the requested array size, if one is later placed into the allocated array.

Wow, I understand it completely now. Thx for your excellent explanation and kind help!

Yxue-1906 commented 11 months ago

https://github.com/tdlib/td/blob/07c1d53a6d3cb1fad58d2822e55eef6d57363581/tdutils/td/utils/buffer.h#L37 I have another question about BufferRaw: what is the purpose add alignas(4) to data_?

levlam commented 11 months ago

We want all data stored in buffers to be sufficiently aligned, because this can significantly speed up access to it by blocks of 4-8 byte.