max0x7ba / atomic_queue

C++ lockless queue.
MIT License
1.47k stars 176 forks source link

build failure with gcc-13 #55

Closed emollier closed 1 year ago

emollier commented 1 year ago

atomic_queue fails to build from source with Gcc 13 on today's Debian sid. The relevant part of the error message starts like:

In file included from /usr/include/c++/13/bits/atomic_base.h:39,
                 from /usr/include/c++/13/atomic:41,
                 from include/atomic_queue/defs.h:7,
                 from include/atomic_queue/atomic_queue.h:7,
                 from /<<PKGBUILDDIR>>/src/benchmarks.cc:5:
In function ‘std::_Require<std::__not_<std::__is_tuple_like<_Tp> >, std::is_move_constructible<_Tp>, std::is_move_assignable<_Tp> > std::swap(_Tp&, _Tp&) [with _Tp = thread::id]’,
    inlined from ‘void std::thread::swap(std::thread&)’ at /usr/include/c++/13/bits/std_thread.h:193:16,
    inlined from ‘std::thread& std::thread::operator=(std::thread&&)’ at /usr/include/c++/13/bits/std_thread.h:187:11,
    inlined from ‘{anonymous}::cycles_t {anonymous}::benchmark_throughput(atomic_queue::HugePages&, const std::vector<unsigned int>&, unsigned int, unsigned int, bool, sum_t*) [with Queue = atomic_queue::RetryDecorator<CapacityToConstructor<atomic_queue::AtomicQueueB<unsigned int, atomic_queue::HugePageAllocator<unsigned int>, 0, false, false, true>, 65536> >]’ at /<<PKGBUILDDIR>>/src/benchmarks.cc:252:43,
    inlined from ‘void {anonymous}::run_throughput_benchmark(const char*, atomic_queue::HugePages&, const std::vector<unsigned int>&, unsigned int, unsigned int, unsigned int) [with Queue = atomic_queue::RetryDecorator<CapacityToConstructor<atomic_queue::AtomicQueueB<unsigned int, atomic_queue::HugePageAllocator<unsigned int>, 0, false, false, true>, 65536> >]’ at /<<PKGBUILDDIR>>/src/benchmarks.cc:292:60:
/usr/include/c++/13/bits/move.h:198:7: error: array subscript 2 is outside array bounds of ‘long long int [1]’ [-Werror=array-bounds=]
  198 |       __a = _GLIBCXX_MOVE(__b);
      |       ^~~
In file included from /usr/include/x86_64-linux-gnu/c++/13/bits/c++allocator.h:33,
                 from /usr/include/c++/13/bits/allocator.h:46,
                 from /usr/include/c++/13/memory:65,
                 from include/atomic_queue/atomic_queue.h:13:
In member function ‘_Tp* std::__new_allocator<_Tp>::allocate(size_type, const void*) [with _Tp = long long int]’,
    inlined from ‘static _Tp* std::allocator_traits<std::allocator<_Tp1> >::allocate(allocator_type&, size_type) [with _Tp = long long int]’ at /usr/include/c++/13/bits/alloc_traits.h:482:28,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::pointer std::_Vector_base<_Tp, _Alloc>::_M_allocate(std::size_t) [with _Tp = long long int; _Alloc = std::allocator<long long int>]’ at /usr/include/c++/13/bits/stl_vector.h:378:33,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::pointer std::_Vector_base<_Tp, _Alloc>::_M_allocate(std::size_t) [with _Tp = std::thread; _Alloc = std::allocator<std::thread>]’ at /usr/include/c++/13/bits/stl_vector.h:375:7,
    inlined from ‘void std::_Vector_base<_Tp, _Alloc>::_M_create_storage(std::size_t) [with _Tp = std::thread; _Alloc = std::allocator<std::thread>]’ at /usr/include/c++/13/bits/stl_vector.h:395:44,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::_Vector_base(std::size_t, const allocator_type&) [with _Tp = std::thread; _Alloc = std::allocator<std::thread>]’ at /usr/include/c++/13/bits/stl_vector.h:332:26,
    inlined from ‘std::vector<_Tp, _Alloc>::vector(size_type, const allocator_type&) [with _Tp = std::thread; _Alloc = std::allocator<std::thread>]’ at /usr/include/c++/13/bits/stl_vector.h:554:47,
    inlined from ‘{anonymous}::cycles_t {anonymous}::benchmark_throughput(atomic_queue::HugePages&, const std::vector<unsigned int>&, unsigned int, unsigned int, bool, sum_t*) [with Queue = atomic_queue::RetryDecorator<CapacityToConstructor<atomic_queue::AtomicQueueB<unsigned int, atomic_queue::HugePageAllocator<unsigned int>, 0, false, false, true>, 65536> >]’ at /<<PKGBUILDDIR>>/src/benchmarks.cc:245:30,
    inlined from ‘void {anonymous}::run_throughput_benchmark(const char*, atomic_queue::HugePages&, const std::vector<unsigned int>&, unsigned int, unsigned int, unsigned int) [with Queue = atomic_queue::RetryDecorator<CapacityToConstructor<atomic_queue::AtomicQueueB<unsigned int, atomic_queue::HugePageAllocator<unsigned int>, 0, false, false, true>, 65536> >]’ at /<<PKGBUILDDIR>>/src/benchmarks.cc:292:60:
/usr/include/c++/13/bits/new_allocator.h:147:55: note: at offset 16 into object of size 8 allocated by ‘operator new’
  147 |         return static_cast<_Tp*>(_GLIBCXX_OPERATOR_NEW(__n * sizeof(_Tp)));
      |                                                       ^

This issue has initially been reported in Debian bug#1037715. I have been unable to identify what exactly in the porting guide to gcc-13 was relevant to the error at play.

emollier commented 1 year ago

I forgot to mention, the Debian bug is open against last year's version of atomic_queue, but I tested the build against the git head as of today (b770bb2) as well.

musicinmybrain commented 1 year ago

Just as another data point, it’s still working for me on Fedora Rawhide with atomic_queue 1.4 and our patched gcc 13.1.1.

emollier commented 1 year ago

Thanks for checking, this sounds like that could be caused by an issue in gcc 13.1.0 in Debian. Tag v1.4 still fails in the same way as the git head to me. I'll pursue investigations in that direction.

max0x7ba commented 1 year ago

This issue has initially been reported in Debian bug#1037715. I have been unable to identify what exactly in the porting guide to gcc-13 was relevant to the error at play.

The error message originates from std::thread& std::thread::operator=(std::thread&&) move assignment operator invoked from benchmark_throughput.

The operator invokes void std::thread::swap(std::thread&), which then invokes std::swap(_Tp&, _Tp&) [with _Tp = thread::id] which fails. One of these thread::id& parameters of std::swap<thread::id> gets thread::id&& argument, a member of the std::thread&& r-value object. That strongly hints at gcc-13 Implicit move rules change breaking std::thread& std::thread::operator=(std::thread&&). I wonder if gcc unit tests exercise this operator.

The minimal reproduction code for the bug should be:

#include <thread>
#include <vector>

void g(int);

void f() {
    std::vector<std::thread> threads(1);
    threads[0] = std::thread(g, 42); // Compiler error comes from this statement.
    threads[0].join();
}

I am unable to reproduce this bug with compilers available to me so far: https://godbolt.org/z/46Y3GGxns

Does this snippet reproduce the bug with your compiler? @emollier

emollier commented 1 year ago

Thanks for the snippet, I gave it a try with:

$ g++ --version
g++ (Debian 13.1.0-8) 13.1.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

but I'm afraid I didn't reproduce any build failure.

max0x7ba commented 1 year ago

Thanks for the snippet, I gave it a try with:

$ g++ --version
g++ (Debian 13.1.0-8) 13.1.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

but I'm afraid I didn't reproduce any build failure.

Was it compiled with -pthread -std=gnu++14 -O3 -Wall -Wextra -Werror compiler options?

emollier commented 1 year ago

Ahah, silly me, no there was just -O0 to avoid optimizing out the function. Yes, the issue is reproducible with your compiler flags:

$ g++ -pthread -std=gnu++14 -O3 -Wall -Wextra -Werror -o repro repro.cpp
In file included from /usr/include/c++/13/bits/exception_ptr.h:41,
                 from /usr/include/c++/13/exception:164,
                 from /usr/include/c++/13/ios:41,
                 from /usr/include/c++/13/ostream:40,
                 from /usr/include/c++/13/iostream:41,
                 from repro.cpp:1:
In function ‘std::_Require<std::__not_<std::__is_tuple_like<_Tp> >, std::is_move_constructible<_Tp>, std::is_move_assignable<_Tp> > std::swap(_Tp&, _Tp&) [with _Tp = thread::id]’,
    inlined from ‘void std::thread::swap(std::thread&)’ at /usr/include/c++/13/bits/std_thread.h:193:16,
    inlined from ‘std::thread& std::thread::operator=(std::thread&&)’ at /usr/include/c++/13/bits/std_thread.h:187:11,
    inlined from ‘int main()’ at repro.cpp:14:34:
/usr/include/c++/13/bits/move.h:198:7: error: array subscript 1 is outside array bounds of ‘std::thread [1]’ [-Werror=array-bounds=]
  198 |       __a = _GLIBCXX_MOVE(__b);
      |       ^~~
In file included from /usr/include/x86_64-linux-gnu/c++/13/bits/c++allocator.h:33,
                 from /usr/include/c++/13/bits/allocator.h:46,
                 from /usr/include/c++/13/string:43,
                 from /usr/include/c++/13/bits/locale_classes.h:40,
                 from /usr/include/c++/13/bits/ios_base.h:41,
                 from /usr/include/c++/13/ios:44:
In member function ‘_Tp* std::__new_allocator<_Tp>::allocate(size_type, const void*) [with _Tp = std::thread]’,
    inlined from ‘static _Tp* std::allocator_traits<std::allocator<_CharT> >::allocate(allocator_type&, size_type) [with _Tp = std::thread]’ at /usr/include/c++/13/bits/alloc_traits.h:482:28,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::pointer std::_Vector_base<_Tp, _Alloc>::_M_allocate(std::size_t) [with _Tp = std::thread; _Alloc = std::allocator<std::thread>]’ at /usr/include/c++/13/bits/stl_vector.h:378:33,
    inlined from ‘void std::_Vector_base<_Tp, _Alloc>::_M_create_storage(std::size_t) [with _Tp = std::thread; _Alloc = std::allocator<std::thread>]’ at /usr/include/c++/13/bits/stl_vector.h:395:44,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::_Vector_base(std::size_t, const allocator_type&) [with _Tp = std::thread; _Alloc = std::allocator<std::thread>]’ at /usr/include/c++/13/bits/stl_vector.h:332:26,
    inlined from ‘std::vector<_Tp, _Alloc>::vector(size_type, const allocator_type&) [with _Tp = std::thread; _Alloc = std::allocator<std::thread>]’ at /usr/include/c++/13/bits/stl_vector.h:554:47,
    inlined from ‘int main()’ at repro.cpp:12:39:
/usr/include/c++/13/bits/new_allocator.h:147:55: note: at offset 8 into object of size 8 allocated by ‘operator new’
  147 |         return static_cast<_Tp*>(_GLIBCXX_OPERATOR_NEW(__n * sizeof(_Tp)));
      |                                                       ^
cc1plus: all warnings being treated as errors

Thank you very much!

emollier commented 1 year ago

For ulterior reference, the exact reproducer is:

$ cat repro.cpp
#include <iostream>
#include <thread>
#include <vector>

void g(int n);

void g(int n) {
    std::cout << "consume: " << n << std::endl;
}

int main(void) {
    std::vector<std::thread> threads(1);
    // The compiler error comes from the two threads below.
    threads[0] = std::thread(g, 42);
    threads[1] = std::thread(g, 42);
    threads[0].join();
    threads[1].join();
    return (0);
}
$ g++ repro.cpp -o repro -Wall -Werror -O3

Optimization subtleties seem to make the issue disappear when there is a single thread.

emollier commented 1 year ago

Oh wait, I forgot to bump the thread count initialization. This doesn't raise the issue:

$ cat repro.cpp 
#include <iostream>
#include <thread>
#include <vector>

void g(int);

void g(int n) {
    std::cout << "consume: " << n << std::endl;
}

int main(void) {
    std::vector<std::thread> threads(2);
    // The compiler error comes from the two threads below.
    threads[0] = std::thread(g, 42);
    threads[1] = std::thread(g, 42);
    threads[0].join();
    threads[1].join();
    return (0);
}
$ g++ repro.cpp -o repro -Wall -Wextra -Werror -O3 -std=gnu++14 -pthread
$ echo $?
0

Sounds like we might be after an initialization issue of the thread count?

emollier commented 1 year ago

Alright, so, I'm not sure whether this is correct or not, but the build problem disappears for me when I apply the following change:

--- libatomic-queue.orig/src/benchmarks.cc
+++ libatomic-queue/src/benchmarks.cc
@@ -242,7 +242,7 @@
     std::atomic<unsigned> last_consumer{thread_count};

     Barrier barrier;
-    std::vector<std::thread> threads(thread_count * 2 - 1);
+    std::vector<std::thread> threads(thread_count * 2 + 1);
     if(alternative_placement) {
         for(unsigned i = 0; i < thread_count; ++i) {
             set_default_thread_affinity(hw_thread_ids[cpu_idx++]);
max0x7ba commented 1 year ago

Thinking more about it, the root cause of the bug lies in std::thread& std::thread::operator=(std::thread&&), which shouldn't require using std::vector to reproduce it. May be the following would suffice:

#include <thread>

void g(int);

__attribute__((noinline)) std::thread& f(std::thread& a) {
    return a = std::thread(g, 42); // Compiler error comes from this statement.
}

It should be compiled as .o without main function calling the function to avoid whole program optimization, and should reproduce with just -pthread -std=gnu++14 compiler options, no specific optimization levels should be necessary.

emollier commented 1 year ago

This one builds fine as far as I could witness:

$ cat repro.cpp 
#include <thread>

void g(int);

__attribute__((noinline)) std::thread& f(std::thread& a) {
    return a = std::thread(g, 42); // Compiler error comes from this statement.
}
$ g++ repro.cpp -c -o repro.o -pthread -std=gnu++14
$ g++ repro.cpp -c -o repro.o -pthread -std=gnu++14 -Wall -Wextra -Werror
$ g++ repro.cpp -c -o repro.o -pthread -std=gnu++14 -O3 -Wall -Wextra -Werror
max0x7ba commented 1 year ago

This one builds fine as far as I could witness:

Well, I am out of my depth here, and have no access to your compiler to play around with it.

emollier commented 1 year ago

No worries, I'm thankful for the time you already spent on the issue. :)

I have enough data to move the Debian issue forwards (either by getting the system compiler fixed, dropping optimization level a little for the affected code, or at worse dusting off the build error under the carpet with the patch).

I'll nudge Gcc maintainers to see whether a bump to 13.1.1 would be feasible, even if this is just to the experimental branch. Maybe we'll see whether this comes from the older compiler version.

max0x7ba commented 1 year ago

I'll nudge Gcc maintainers to see whether a bump to 13.1.1 would be feasible, even if this is just to the experimental branch. Maybe we'll see whether this comes from the older compiler version.

The compiler error message is pretty clear and unambiguous - std::thread& std::thread::operator=(std::thread&&) fails on swapping thread ids.

That gcc-13 "simpler implicit move" breaking change sounds like it only applies when building in c++23 mode, whereas this code is compiled in gnu++14 mode, which shouldn't be affected.

max0x7ba commented 1 year ago

Alright, so, I'm not sure whether this is correct or not, but the build problem disappears for me when I apply the following change:

--- libatomic-queue.orig/src/benchmarks.cc
+++ libatomic-queue/src/benchmarks.cc
@@ -242,7 +242,7 @@
     std::atomic<unsigned> last_consumer{thread_count};

     Barrier barrier;
-    std::vector<std::thread> threads(thread_count * 2 - 1);
+    std::vector<std::thread> threads(thread_count * 2 + 1);
     if(alternative_placement) {
         for(unsigned i = 0; i < thread_count; ++i) {
             set_default_thread_affinity(hw_thread_ids[cpu_idx++]);

It does -1 because it re-uses the current thread instead of creating another one. This code is well-formed, have been subjected to valgrind, and all available sanitizers in clang and gcc. clang thread sanitizer did uncover a couple of places with memory orders not being strong enough, which I was quite impressed with, prior to version 1.0 release. Github continuous integration runs the unit tests and sanitizers with gcc and clang on each commit. gcc should do the same :).

emollier commented 1 year ago

I'll nudge Gcc maintainers to see whether a bump to 13.1.1 would be feasible, even if this is just to the experimental branch. Maybe we'll see whether this comes from the older compiler version.

This is Debian bug#1041607.

jwakely commented 1 year ago

The compiler error message is

N.B. this is a warning not an error. You've chosen to turn it into an error with -Werror, but most warnings are imperfect and have false positives.

The -Warray-bounds warning (like the similar -Wstringop-overflow and -Wstringop-overread) is particularly prone to false positives when the compiler is unable to tell that a particular branch is unreachable and so warns about impossible conditions.

If you get false positives with particular warnings, don't use -Werror (or use -Wno-error=array-bounds), or suppress it locally, e.g.

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Warray-bounds"
    threads[0] = std::thread(g, 42);
    threads[1] = std::thread(g, 42);
#pragma GCC diagnostic pop

I don't think this warning was introduced by the simpler implicit move changes, I can reproduce it with anything later than GCC 12.1, the warning started with https://gcc.gnu.org/r12-1992

jwakely commented 1 year ago

This is Debian bug#1041607.

The issue title is very misleading, swapping thread IDs doesn't "fail", it gives a warning. Warnings have false positives, but just because you choose to turn that into a build failure with -Werror doesn't mean that swapping thread IDs doesn't work correctly.

jwakely commented 1 year ago

Filed as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110764

emollier commented 1 year ago

Hi jwakely, thanks for jumping in, your comments on the bug title and forwarding the issue to Gcc developers! :)

Issue retitle is on its way, and I intend to rebuild atomic_queue with -Wno-error=array-bounds for the time being, until resolution on gcc end.

max0x7ba commented 1 year ago

This is Debian bug#1041607.

The issue title is very misleading, swapping thread IDs doesn't "fail", it gives a warning. Warnings have false positives, but just because you choose to turn that into a build failure with -Werror doesn't mean that swapping thread IDs doesn't work correctly.

You are quite right, Jon, I stand corrected and defer to you.