Closed emollier closed 1 year ago
I forgot to mention, the Debian bug is open against last year's version of atomic_queue, but I tested the build against the git head as of today (b770bb2) as well.
Just as another data point, it’s still working for me on Fedora Rawhide with atomic_queue
1.4 and our patched gcc 13.1.1.
Thanks for checking, this sounds like that could be caused by an issue in gcc 13.1.0 in Debian. Tag v1.4 still fails in the same way as the git head to me. I'll pursue investigations in that direction.
This issue has initially been reported in Debian bug#1037715. I have been unable to identify what exactly in the porting guide to gcc-13 was relevant to the error at play.
The error message originates from std::thread& std::thread::operator=(std::thread&&)
move assignment operator invoked from benchmark_throughput
.
The operator invokes void std::thread::swap(std::thread&)
, which then invokes std::swap(_Tp&, _Tp&) [with _Tp = thread::id]
which fails. One of these thread::id&
parameters of std::swap<thread::id>
gets thread::id&&
argument, a member of the std::thread&&
r-value object. That strongly hints at gcc-13 Implicit move rules change breaking std::thread& std::thread::operator=(std::thread&&)
. I wonder if gcc unit tests exercise this operator.
The minimal reproduction code for the bug should be:
#include <thread>
#include <vector>
void g(int);
void f() {
std::vector<std::thread> threads(1);
threads[0] = std::thread(g, 42); // Compiler error comes from this statement.
threads[0].join();
}
I am unable to reproduce this bug with compilers available to me so far: https://godbolt.org/z/46Y3GGxns
Does this snippet reproduce the bug with your compiler? @emollier
Thanks for the snippet, I gave it a try with:
$ g++ --version
g++ (Debian 13.1.0-8) 13.1.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
but I'm afraid I didn't reproduce any build failure.
Thanks for the snippet, I gave it a try with:
$ g++ --version g++ (Debian 13.1.0-8) 13.1.0 Copyright (C) 2023 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
but I'm afraid I didn't reproduce any build failure.
Was it compiled with -pthread -std=gnu++14 -O3 -Wall -Wextra -Werror
compiler options?
Ahah, silly me, no there was just -O0 to avoid optimizing out the function. Yes, the issue is reproducible with your compiler flags:
$ g++ -pthread -std=gnu++14 -O3 -Wall -Wextra -Werror -o repro repro.cpp
In file included from /usr/include/c++/13/bits/exception_ptr.h:41,
from /usr/include/c++/13/exception:164,
from /usr/include/c++/13/ios:41,
from /usr/include/c++/13/ostream:40,
from /usr/include/c++/13/iostream:41,
from repro.cpp:1:
In function ‘std::_Require<std::__not_<std::__is_tuple_like<_Tp> >, std::is_move_constructible<_Tp>, std::is_move_assignable<_Tp> > std::swap(_Tp&, _Tp&) [with _Tp = thread::id]’,
inlined from ‘void std::thread::swap(std::thread&)’ at /usr/include/c++/13/bits/std_thread.h:193:16,
inlined from ‘std::thread& std::thread::operator=(std::thread&&)’ at /usr/include/c++/13/bits/std_thread.h:187:11,
inlined from ‘int main()’ at repro.cpp:14:34:
/usr/include/c++/13/bits/move.h:198:7: error: array subscript 1 is outside array bounds of ‘std::thread [1]’ [-Werror=array-bounds=]
198 | __a = _GLIBCXX_MOVE(__b);
| ^~~
In file included from /usr/include/x86_64-linux-gnu/c++/13/bits/c++allocator.h:33,
from /usr/include/c++/13/bits/allocator.h:46,
from /usr/include/c++/13/string:43,
from /usr/include/c++/13/bits/locale_classes.h:40,
from /usr/include/c++/13/bits/ios_base.h:41,
from /usr/include/c++/13/ios:44:
In member function ‘_Tp* std::__new_allocator<_Tp>::allocate(size_type, const void*) [with _Tp = std::thread]’,
inlined from ‘static _Tp* std::allocator_traits<std::allocator<_CharT> >::allocate(allocator_type&, size_type) [with _Tp = std::thread]’ at /usr/include/c++/13/bits/alloc_traits.h:482:28,
inlined from ‘std::_Vector_base<_Tp, _Alloc>::pointer std::_Vector_base<_Tp, _Alloc>::_M_allocate(std::size_t) [with _Tp = std::thread; _Alloc = std::allocator<std::thread>]’ at /usr/include/c++/13/bits/stl_vector.h:378:33,
inlined from ‘void std::_Vector_base<_Tp, _Alloc>::_M_create_storage(std::size_t) [with _Tp = std::thread; _Alloc = std::allocator<std::thread>]’ at /usr/include/c++/13/bits/stl_vector.h:395:44,
inlined from ‘std::_Vector_base<_Tp, _Alloc>::_Vector_base(std::size_t, const allocator_type&) [with _Tp = std::thread; _Alloc = std::allocator<std::thread>]’ at /usr/include/c++/13/bits/stl_vector.h:332:26,
inlined from ‘std::vector<_Tp, _Alloc>::vector(size_type, const allocator_type&) [with _Tp = std::thread; _Alloc = std::allocator<std::thread>]’ at /usr/include/c++/13/bits/stl_vector.h:554:47,
inlined from ‘int main()’ at repro.cpp:12:39:
/usr/include/c++/13/bits/new_allocator.h:147:55: note: at offset 8 into object of size 8 allocated by ‘operator new’
147 | return static_cast<_Tp*>(_GLIBCXX_OPERATOR_NEW(__n * sizeof(_Tp)));
| ^
cc1plus: all warnings being treated as errors
Thank you very much!
For ulterior reference, the exact reproducer is:
$ cat repro.cpp
#include <iostream>
#include <thread>
#include <vector>
void g(int n);
void g(int n) {
std::cout << "consume: " << n << std::endl;
}
int main(void) {
std::vector<std::thread> threads(1);
// The compiler error comes from the two threads below.
threads[0] = std::thread(g, 42);
threads[1] = std::thread(g, 42);
threads[0].join();
threads[1].join();
return (0);
}
$ g++ repro.cpp -o repro -Wall -Werror -O3
Optimization subtleties seem to make the issue disappear when there is a single thread.
Oh wait, I forgot to bump the thread count initialization. This doesn't raise the issue:
$ cat repro.cpp
#include <iostream>
#include <thread>
#include <vector>
void g(int);
void g(int n) {
std::cout << "consume: " << n << std::endl;
}
int main(void) {
std::vector<std::thread> threads(2);
// The compiler error comes from the two threads below.
threads[0] = std::thread(g, 42);
threads[1] = std::thread(g, 42);
threads[0].join();
threads[1].join();
return (0);
}
$ g++ repro.cpp -o repro -Wall -Wextra -Werror -O3 -std=gnu++14 -pthread
$ echo $?
0
Sounds like we might be after an initialization issue of the thread count?
Alright, so, I'm not sure whether this is correct or not, but the build problem disappears for me when I apply the following change:
--- libatomic-queue.orig/src/benchmarks.cc
+++ libatomic-queue/src/benchmarks.cc
@@ -242,7 +242,7 @@
std::atomic<unsigned> last_consumer{thread_count};
Barrier barrier;
- std::vector<std::thread> threads(thread_count * 2 - 1);
+ std::vector<std::thread> threads(thread_count * 2 + 1);
if(alternative_placement) {
for(unsigned i = 0; i < thread_count; ++i) {
set_default_thread_affinity(hw_thread_ids[cpu_idx++]);
Thinking more about it, the root cause of the bug lies in std::thread& std::thread::operator=(std::thread&&)
, which shouldn't require using std::vector
to reproduce it. May be the following would suffice:
#include <thread>
void g(int);
__attribute__((noinline)) std::thread& f(std::thread& a) {
return a = std::thread(g, 42); // Compiler error comes from this statement.
}
It should be compiled as .o
without main
function calling the function to avoid whole program optimization, and should reproduce with just -pthread -std=gnu++14
compiler options, no specific optimization levels should be necessary.
This one builds fine as far as I could witness:
$ cat repro.cpp
#include <thread>
void g(int);
__attribute__((noinline)) std::thread& f(std::thread& a) {
return a = std::thread(g, 42); // Compiler error comes from this statement.
}
$ g++ repro.cpp -c -o repro.o -pthread -std=gnu++14
$ g++ repro.cpp -c -o repro.o -pthread -std=gnu++14 -Wall -Wextra -Werror
$ g++ repro.cpp -c -o repro.o -pthread -std=gnu++14 -O3 -Wall -Wextra -Werror
This one builds fine as far as I could witness:
Well, I am out of my depth here, and have no access to your compiler to play around with it.
No worries, I'm thankful for the time you already spent on the issue. :)
I have enough data to move the Debian issue forwards (either by getting the system compiler fixed, dropping optimization level a little for the affected code, or at worse dusting off the build error under the carpet with the patch).
I'll nudge Gcc maintainers to see whether a bump to 13.1.1 would be feasible, even if this is just to the experimental branch. Maybe we'll see whether this comes from the older compiler version.
I'll nudge Gcc maintainers to see whether a bump to 13.1.1 would be feasible, even if this is just to the experimental branch. Maybe we'll see whether this comes from the older compiler version.
The compiler error message is pretty clear and unambiguous - std::thread& std::thread::operator=(std::thread&&)
fails on swapping thread ids.
That gcc-13 "simpler implicit move" breaking change sounds like it only applies when building in c++23 mode, whereas this code is compiled in gnu++14 mode, which shouldn't be affected.
Alright, so, I'm not sure whether this is correct or not, but the build problem disappears for me when I apply the following change:
--- libatomic-queue.orig/src/benchmarks.cc +++ libatomic-queue/src/benchmarks.cc @@ -242,7 +242,7 @@ std::atomic<unsigned> last_consumer{thread_count}; Barrier barrier; - std::vector<std::thread> threads(thread_count * 2 - 1); + std::vector<std::thread> threads(thread_count * 2 + 1); if(alternative_placement) { for(unsigned i = 0; i < thread_count; ++i) { set_default_thread_affinity(hw_thread_ids[cpu_idx++]);
It does -1
because it re-uses the current thread instead of creating another one. This code is well-formed, have been subjected to valgrind
, and all available sanitizers in clang
and gcc
. clang
thread sanitizer did uncover a couple of places with memory orders not being strong enough, which I was quite impressed with, prior to version 1.0 release. Github continuous integration runs the unit tests and sanitizers with gcc
and clang
on each commit. gcc should do the same :).
I'll nudge Gcc maintainers to see whether a bump to 13.1.1 would be feasible, even if this is just to the experimental branch. Maybe we'll see whether this comes from the older compiler version.
This is Debian bug#1041607.
The compiler error message is
N.B. this is a warning not an error. You've chosen to turn it into an error with -Werror
, but most warnings are imperfect and have false positives.
The -Warray-bounds
warning (like the similar -Wstringop-overflow
and -Wstringop-overread
) is particularly prone to false positives when the compiler is unable to tell that a particular branch is unreachable and so warns about impossible conditions.
If you get false positives with particular warnings, don't use -Werror
(or use -Wno-error=array-bounds
), or suppress it locally, e.g.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Warray-bounds"
threads[0] = std::thread(g, 42);
threads[1] = std::thread(g, 42);
#pragma GCC diagnostic pop
I don't think this warning was introduced by the simpler implicit move changes, I can reproduce it with anything later than GCC 12.1, the warning started with https://gcc.gnu.org/r12-1992
This is Debian bug#1041607.
The issue title is very misleading, swapping thread IDs doesn't "fail", it gives a warning. Warnings have false positives, but just because you choose to turn that into a build failure with -Werror
doesn't mean that swapping thread IDs doesn't work correctly.
Hi jwakely, thanks for jumping in, your comments on the bug title and forwarding the issue to Gcc developers! :)
Issue retitle is on its way, and I intend to rebuild atomic_queue with -Wno-error=array-bounds for the time being, until resolution on gcc end.
This is Debian bug#1041607.
The issue title is very misleading, swapping thread IDs doesn't "fail", it gives a warning. Warnings have false positives, but just because you choose to turn that into a build failure with
-Werror
doesn't mean that swapping thread IDs doesn't work correctly.
You are quite right, Jon, I stand corrected and defer to you.
atomic_queue fails to build from source with Gcc 13 on today's Debian sid. The relevant part of the error message starts like:
This issue has initially been reported in Debian bug#1037715. I have been unable to identify what exactly in the porting guide to gcc-13 was relevant to the error at play.