microsoft / azurelinux

Linux OS for Azure 1P services and edge appliances
MIT License

`abseil-cpp` compilation creates ABI compatibility issues between `abseil-cpp` and dependent packages #10038

Closed surfacepatterns closed 4 weeks ago

surfacepatterns commented 2 months ago

Describe the bug

The current abseil-cpp package is built with CMAKE_BUILD_TYPE set to None, so the package is compiled without NDEBUG defined. This creates ABI incompatibilities between abseil-cpp and any dependent package that is compiled with NDEBUG defined.

I came across this issue while running tests for a custom grpc-based service against Azure Linux. Each time I ran the tests, the test executable crashed with a gdb backtrace similar to this one (entries into proprietary code omitted):

0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
1  0x00007ffff6729ed3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
2  0x00007ffff66ded86 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
3  0x00007ffff66c97e5 in __GI_abort () at abort.c:79
4  0x00007ffff63f2537 in absl::lts_20240116::raw_log_internal::(anonymous namespace)::RawLogVA(absl::lts_20240116::LogSeverity, char const*, int, char const*, __va_list_tag*) () at /lib/libabsl_raw_logging_internal.so.2401.0.0
5  0x00007ffff63f25ce in absl::lts_20240116::raw_log_internal::RawLog(absl::lts_20240116::LogSeverity, char const*, int, char const*, ...) ()
    at /lib/libabsl_raw_logging_internal.so.2401.0.0
6  0x00007ffff7b19280 in absl::lts_20240116::DeadlockCheck(absl::lts_20240116::Mutex*) () at /lib/libabsl_synchronization.so.2401.0.0
7  0x00007ffff7b1bb35 in absl::lts_20240116::Mutex::Lock() () at /lib/libabsl_synchronization.so.2401.0.0
8  0x00007ffff70113f0 in grpc_event_engine::experimental::BasicWorkQueue::Add(grpc_event_engine::experimental::EventEngine::Closure*) () at /lib/libgrpc.so.39
9  0x00007ffff700b19d in grpc_event_engine::experimental::WorkStealingThreadPool::WorkStealingThreadPoolImpl::Run(grpc_event_engine::experimental::EventEngine::Closure*) () at /lib/libgrpc.so.39
10 0x00007ffff700b2f3 in grpc_event_engine::experimental::WorkStealingThreadPool::Run(absl::lts_20240116::AnyInvocable<void ()>) () at /lib/libgrpc.so.39
11 0x00007ffff70039e2 in grpc_event_engine::experimental::TimerManager::TimerManager(std::shared_ptr<grpc_event_engine::experimental::ThreadPool>) ()
    at /lib/libgrpc.so.39
12 0x00007ffff6ff27a0 in grpc_event_engine::experimental::PosixEventEngine::PosixEventEngine() () at /lib/libgrpc.so.39
13 0x00007ffff6fdd1c3 in grpc_event_engine::experimental::DefaultEventEngineFactory() () at /lib/libgrpc.so.39
14 0x00007ffff6fdc7a5 in grpc_event_engine::experimental::CreateEventEngineInner() () at /lib/libgrpc.so.39
15 0x00007ffff6fdc7d2 in grpc_event_engine::experimental::CreateEventEngine() () at /lib/libgrpc.so.39
16 0x00007ffff6fdc9c9 in grpc_event_engine::experimental::GetDefaultEventEngine(grpc_core::SourceLocation) () at /lib/libgrpc.so.39
17 0x00007ffff6fdcd9c in grpc_event_engine::experimental::(anonymous namespace)::EnsureEventEngineInChannelArgs(grpc_core::ChannelArgs) () at /lib/libgrpc.so.39
18 0x00007ffff6fdd06b in std::_Function_handler<grpc_core::ChannelArgs (grpc_core::ChannelArgs), grpc_core::ChannelArgs (*)(grpc_core::ChannelArgs)>::_M_invoke(std::_Any_data const&, grpc_core::ChannelArgs&&) () at /lib/libgrpc.so.39
19 0x00007ffff6f851e6 in grpc_core::ChannelArgsPreconditioning::PreconditionChannelArgs(grpc_channel_args const*) const () at /lib/libgrpc.so.39
20 0x00007ffff71655d0 in grpc_server_create () at /lib/libgrpc.so.39
21 0x00007ffff7a728c4 in grpc::Server::Server(grpc::ChannelArguments*, std::shared_ptr<std::vector<std::unique_ptr<grpc::ServerCompletionQueue, std::default_delete<grpc::ServerCompletionQueue> >, std::allocator<std::unique_ptr<grpc::ServerCompletionQueue, std::default_delete<grpc::ServerCompletionQueue> > > > >, int, int, int, std::vector<std::shared_ptr<grpc::internal::ExternalConnectionAcceptorImpl>, std::allocator<std::shared_ptr<grpc::internal::ExternalConnectionAcceptorImpl> > >, grpc_server_config_fetcher*, grpc_resource_quota*, std::vector<std::unique_ptr<grpc::experimental::ServerInterceptorFactoryInterface, std::default_delete<grpc::experimental::ServerInterceptorFactoryInterface> >, std::allocator<std::unique_ptr<grpc::experimental::ServerInterceptorFactoryInterface, std::default_delete<grpc::experimental::ServerInterceptorFactoryInterface> > > >, grpc::experimental::ServerMetricRecorder*) () at /lib/libgrpc++.so.1.62
22 0x00007ffff7a6d0ca in grpc::ServerBuilder::BuildAndStart() () at /lib/libgrpc++.so.1.62
...

This happened because the implementation of absl::Mutex changes depending on whether NDEBUG is defined. When abseil-cpp is compiled without NDEBUG, a definition of absl::Mutex::Dtor() is compiled into the shared library and is meant to be called by the absl::Mutex destructor; it clears out the deadlock info recorded for the mutex. When a dependent package is compiled with NDEBUG, however, a different definition of absl::Mutex::Dtor() is inlined into the dependent binary, so the library's definition is never called. The library still runs its deadlock checks, but the info for a destroyed mutex is never cleared, which causes grpc and other dependent packages to abort sporadically (but often) at absl::Mutex entry points.
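The mechanism can be illustrated with a small self-contained model that does not use abseil at all. This is only a sketch under stated assumptions: `ToyMutex`, `g_held_addresses`, `SecondLockSucceeds`, and `PrintDemo` are hypothetical names, the address-keyed set is a stand-in for the library's deadlock bookkeeping rather than its real data structure, and a runtime flag stands in for the compile-time NDEBUG difference between the library and its caller.

```cpp
#include <cassert>
#include <cstdio>
#include <new>
#include <set>

// Toy stand-in for bookkeeping that lives inside the shared library and is
// keyed by mutex address. Hypothetical; not abseil's real data structure.
static std::set<const void*> g_held_addresses;

class ToyMutex {
 public:
  // clear_on_destroy models which destructor body the caller's build got:
  //   true  = caller built like the library, so cleanup runs;
  //   false = caller built with NDEBUG, so the inlined destructor skips it.
  explicit ToyMutex(bool clear_on_destroy)
      : clear_on_destroy_(clear_on_destroy) {}

  ~ToyMutex() {
    if (clear_on_destroy_) g_held_addresses.erase(this);
  }

  // Library-side check: fails if this address is already recorded as held.
  bool Lock() { return g_held_addresses.insert(this).second; }

 private:
  bool clear_on_destroy_;
};

// Creates, locks, and destroys one mutex, then builds a second mutex at the
// same address and reports whether its first Lock() passes the check.
bool SecondLockSucceeds(bool clear_on_destroy) {
  g_held_addresses.clear();
  alignas(ToyMutex) static unsigned char storage[sizeof(ToyMutex)];

  ToyMutex* first = new (storage) ToyMutex(clear_on_destroy);
  const bool first_locked = first->Lock();
  assert(first_locked);  // a fresh address always passes the check
  (void)first_locked;
  first->~ToyMutex();

  // Placement new guarantees the second mutex reuses the first one's address.
  ToyMutex* second = new (storage) ToyMutex(clear_on_destroy);
  const bool second_locked = second->Lock();
  second->~ToyMutex();
  return second_locked;
}

// Prints the outcome for a matching build and for a mismatched build.
void PrintDemo() {
  std::printf("matching builds:   second lock ok = %d\n",
              SecondLockSucceeds(true));
  std::printf("mismatched builds: second lock ok = %d\n",
              SecondLockSucceeds(false));
}
```

In this model, `SecondLockSucceeds(true)` returns true and `SecondLockSucceeds(false)` returns false: once the destructor stops clearing the record, a new mutex that happens to reuse an old address inherits stale state. The model only flags the failure, whereas the real library aborts.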

One of the prominent maintainers of abseil-cpp has mentioned this issue previously.

This pull request fixes the problem.

To Reproduce

Steps to reproduce the behavior:

  1. Compile an executable that links against abseil-cpp, preferably something that creates and destroys absl::Mutex instances, and make sure to define NDEBUG when compiling.
  2. Run the executable for some time, and wait for your executable to crash with a stack trace similar to the stack trace above.

(If the above doesn't illustrate the problem well enough, then I'll attempt to supply a minimal reproduction of the bug when I get a chance.)

Expected behavior

I expect applications that link to abseil-cpp to run without crashing.

surfacepatterns commented 2 months ago

I've attached code that reproduces the crash to this issue. You can compile the program using:

g++ -o repro repro.cpp -O0 -DNDEBUG $(pkg-config absl_synchronization --cflags --libs)

Running the compiled program using gdb yields the following:

danderson [ ~/projects/abseil-cpp-repro ]$ gdb ./repro
GNU gdb (GDB) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./repro...
(No debugging symbols found in ./repro)
(gdb) run
Starting program: /home/danderson/projects/abseil-cpp-repro/repro
/usr/lib/../share/gcc-13.2.0/python/libstdcxx/v6/printers.py:1273: SyntaxWarning: invalid escape sequence '\d'
  self.typename = re.sub('^std::experimental::fundamentals_v\d::', 'std::experimental::', self.typename, 1)
/usr/lib/../share/gcc-13.2.0/python/libstdcxx/v6/printers.py:1302: SyntaxWarning: invalid escape sequence '\w'
  x = re.sub("std::string(?!\w)", s, m.group(1))
/usr/lib/../share/gcc-13.2.0/python/libstdcxx/v6/printers.py:1346: SyntaxWarning: invalid escape sequence '\d'
  self.typename = re.sub('^std::(experimental::|)(fundamentals_v\d::|)(.*)', r'std::\1\3<%s>' % valtype, typename, 1)
/usr/lib/../share/gcc-13.2.0/python/libstdcxx/v6/xmethods.py:151: SyntaxWarning: invalid escape sequence '\d'
  if not re.match('^std::(__\d+::)?array<.*>$', class_type.tag):
/usr/lib/../share/gcc-13.2.0/python/libstdcxx/v6/xmethods.py:268: SyntaxWarning: invalid escape sequence '\d'
  if not re.match('^std::(__\d+::)?deque<.*>$', class_type.tag):
/usr/lib/../share/gcc-13.2.0/python/libstdcxx/v6/xmethods.py:312: SyntaxWarning: invalid escape sequence '\d'
  if not re.match('^std::(__\d+::)?forward_list<.*>$', class_type.tag):
/usr/lib/../share/gcc-13.2.0/python/libstdcxx/v6/xmethods.py:393: SyntaxWarning: invalid escape sequence '\d'
  if not re.match('^std::(__\d+::)?(__cxx11::)?list<.*>$', class_type.tag):
/usr/lib/../share/gcc-13.2.0/python/libstdcxx/v6/xmethods.py:508: SyntaxWarning: invalid escape sequence '\d'
  if not re.match('^std::(__\d+::)?vector<.*>$', class_type.tag):
/usr/lib/../share/gcc-13.2.0/python/libstdcxx/v6/xmethods.py:557: SyntaxWarning: invalid escape sequence '\d'
  if not re.match('^std::(__\d+::)?%s<.*>$' % self._name, class_type.tag):
/usr/lib/../share/gcc-13.2.0/python/libstdcxx/v6/xmethods.py:590: SyntaxWarning: invalid escape sequence '\d'
  if re.match('^std::(__\d+::)?__uniq_ptr_(data|impl)<.*>$', impl_type):
/usr/lib/../share/gcc-13.2.0/python/libstdcxx/v6/xmethods.py:592: SyntaxWarning: invalid escape sequence '\d'
  elif re.match('^std::(__\d+::)?tuple<.*>$', impl_type):
/usr/lib/../share/gcc-13.2.0/python/libstdcxx/v6/xmethods.py:654: SyntaxWarning: invalid escape sequence '\d'
  if not re.match('^std::(__\d+::)?unique_ptr<.*>$', class_type.tag):
/usr/lib/../share/gcc-13.2.0/python/libstdcxx/v6/xmethods.py:723: SyntaxWarning: invalid escape sequence '\['
  m = re.match('.*\[(\d+)]$', str(self._elem_type))
/usr/lib/../share/gcc-13.2.0/python/libstdcxx/v6/xmethods.py:775: SyntaxWarning: invalid escape sequence '\d'
  if not re.match('^std::(__\d+::)?shared_ptr<.*>$', class_type.tag):
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
iteration 0
initializing mutex
locking mutex
destroying mutex
iteration 1
initializing mutex
locking mutex
[mutex.cc : 1418] RAW: Potential Mutex deadlock:

[mutex.cc : 1428] RAW: Acquiring absl::Mutex 0x7fffffffdb60 while holding  0x7fffffffdb60; a cycle in the historical lock ordering graph has been observed
[mutex.cc : 1432] RAW: Cycle:
[mutex.cc : 1446] RAW: mutex@0x7fffffffdb60 stack:
[mutex.cc : 1454] RAW: dying due to potential deadlock

Program received signal SIGABRT, Aborted.
0x00007ffff79960d4 in __pthread_kill_implementation () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff79960d4 in __pthread_kill_implementation () from /usr/lib/libc.so.6
#1  0x00007ffff7945bde in raise () from /usr/lib/libc.so.6
#2  0x00007ffff792e832 in abort () from /usr/lib/libc.so.6
#3  0x00007ffff7f17527 in absl::lts_20240116::raw_log_internal::(anonymous namespace)::RawLogVA(absl::lts_20240116::LogSeverity, char const*, int, char const*, __va_list_tag*) () from /usr/lib/libabsl_raw_logging_internal.so.2401.0.0
#4  0x00007ffff7f175be in absl::lts_20240116::raw_log_internal::RawLog(absl::lts_20240116::LogSeverity, char const*, int, char const*, ...) ()
   from /usr/lib/libabsl_raw_logging_internal.so.2401.0.0
#5  0x00007ffff7fb21a0 in absl::lts_20240116::DeadlockCheck(absl::lts_20240116::Mutex*) () from /usr/lib/libabsl_synchronization.so.2401.0.0
#6  0x00007ffff7fb4ab5 in absl::lts_20240116::Mutex::Lock() () from /usr/lib/libabsl_synchronization.so.2401.0.0
#7  0x0000555555555255 in main ()

reubeno commented 4 weeks ago

It looks like this was resolved by #10003; thanks for reporting this issue and contributing a fix!