llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

clang 17 vs clang 18+ and gcc 13 binaries linking problem #102443

Open zlojvavan opened 2 months ago

zlojvavan commented 2 months ago

After upgrading Ubuntu from 23.10 to 24.04 and clang from 17 to 18, I can no longer build my grpc-related projects with clang 18 (and newer): the debug config fails at link time with an undefined symbol:

ld.lld: error: undefined symbol: absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<<<unsigned long, 0>(unsigned long const&)
referenced by log_message.h:132 (/opt/vcpkg/installed/x64-linux/include/absl/log/internal/log_message.h:132)
               bin/server/CMakeFiles/lcs.dir/grpc.cpp.o:(absl::lts_20240116::log_internal::LogMessage::operator<<(unsigned long))

grpc (and abseil) were installed using vcpkg and gcc 13 and I've had no problems building my projects with either clang 17 or gcc 13, but not so with clang 18+

I can see a difference in the generated symbols between clang 17 and 18: the clang 18 object file is missing the undefined reference "U absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <unsigned long, 0>(unsigned long const&)"

~/.vs/LCS/out/build/linux-debug-clang-18/bin/reference/CMakeFiles/lcsref.dir$ nm grpc.cpp.o -C | grep LogMessage::operator
0000000000000000 W absl::lts_20240116::log_internal::LogMessage::operator<<(unsigned long)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <19>(char const (&) [19])

vs

~/.vs/LCS/out/build/linux-debug-clang-17/bin/reference/CMakeFiles/lcsref.dir$ nm grpc.cpp.o -C | grep LogMessage::operator
0000000000000000 W absl::lts_20240116::log_internal::LogMessage::operator<<(unsigned long)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <19>(char const (&) [19])
                 U absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <unsigned long, 0>(unsigned long const&)

corresponding abseil lib indeed has those templated operator << versions (including <unsigned long, 0>) declared in abseil log_message.h as

  // Types that support `AbslStringify()` are serialized that way.
  template <typename T,
            typename std::enable_if<absl::HasAbslStringify<T>::value,
                                    int>::type = 0>
  LogMessage& operator<<(const T& v) ABSL_ATTRIBUTE_NOINLINE;

  // Types that don't support `AbslStringify()` but do support streaming into a
  // `std::ostream&` are serialized that way.
  template <typename T,
            typename std::enable_if<!absl::HasAbslStringify<T>::value,
                                    int>::type = 0>
  LogMessage& operator<<(const T& v) ABSL_ATTRIBUTE_NOINLINE;

/opt/vcpkg/installed/x64-linux/lib$ nm libabsl_log_internal_message.a -C | grep LogMessage::operator
0000000000000da0 T absl::lts_20240116::log_internal::LogMessage::operator<<(std::ostream& (*)(std::ostream&))
00000000000000fa t absl::lts_20240116::log_internal::LogMessage::operator<<(std::ostream& (*)(std::ostream&)) [clone .cold]
0000000000000e20 T absl::lts_20240116::log_internal::LogMessage::operator<<(std::ios_base& (*)(std::ios_base&))
0000000000000120 t absl::lts_20240116::log_internal::LogMessage::operator<<(std::ios_base& (*)(std::ios_base&)) [clone .cold]
00000000000011b0 T absl::lts_20240116::log_internal::LogMessage::operator<<(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
0000000000001380 T absl::lts_20240116::log_internal::LogMessage::operator<<(std::basic_string_view<char, std::char_traits<char> >)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <signed char, 0>(signed char const&)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <bool, 0>(bool const&)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <char, 0>(char const&)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <double, 0>(double const&)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <float, 0>(float const&)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <unsigned char, 0>(unsigned char const&)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <int, 0>(int const&)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <unsigned int, 0>(unsigned int const&)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <long, 0>(long const&)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <unsigned long, 0>(unsigned long const&)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <void const*, 0>(void const* const&)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <void*, 0>(void* const&)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <short, 0>(short const&)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <unsigned short, 0>(unsigned short const&)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <long long, 0>(long long const&)
0000000000000000 W absl::lts_20240116::log_internal::LogMessage& absl::lts_20240116::log_internal::LogMessage::operator<< <unsigned long long, 0>(unsigned long long const&)

so atm I'm unable to build my projects with clang-18+
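
For readers unfamiliar with the idiom, the two declarations quoted above are ordinary SFINAE overloads: they share one signature and differ only in the enable_if condition that forms the type of the second, non-type template parameter. The sketch below reproduces that shape in isolation. HasToText, Choose, Path and Point are hypothetical names invented for illustration and are not part of Abseil.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for absl::HasAbslStringify: true when T has a
// member function to_text().
template <typename T, typename = void>
struct HasToText : std::false_type {};

template <typename T>
struct HasToText<T, decltype(void(std::declval<const T&>().to_text()))>
    : std::true_type {};

enum class Path { kStringify, kOstream };

// Same shape as the two operator<< declarations: identical signatures,
// distinguished only by the condition inside the second template parameter.
template <typename T,
          typename std::enable_if<HasToText<T>::value, int>::type = 0>
constexpr Path Choose(const T&) {
  return Path::kStringify;
}

template <typename T,
          typename std::enable_if<!HasToText<T>::value, int>::type = 0>
constexpr Path Choose(const T&) {
  return Path::kOstream;
}

// A type that opts in to the first overload.
struct Point {
  const char* to_text() const { return "point"; }
};
```

Choose(Point{}) selects the first overload and Choose(42UL) the second. Both specializations are spelled <T, 0>, so whether the two templates get distinct symbol names for the same T depends on whether the enable_if condition takes part in the mangling.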

DimitryAndric commented 2 months ago

My guess is that the templates don't get instantiated, possibly because they are not referenced? At least not in the expected object file? It would be good if you could reduce this quite a bit, since the whole of Abseil is way too much. :)

zlojvavan commented 2 months ago

@DimitryAndric I don't know; it worked as expected with clang-17 and the only difference is the compiler used, so something changed the behavior in newer clang versions

I was able to build/link all three projects with clang-18 by requesting explicit instantiation in one of the common files in my projects:

namespace absl {
ABSL_NAMESPACE_BEGIN
namespace log_internal {
template LogMessage& LogMessage::operator << (unsigned long const&);
}
ABSL_NAMESPACE_END
}
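
The workaround above relies on explicit instantiation. As a rough, self-contained illustration of the mechanism (not Abseil's actual code), the sketch below declares a member template in a class, promises an instantiation with extern template, and then provides it with an explicit instantiation definition. MiniLog and text() are made-up names, and std::is_arithmetic stands in for the real HasAbslStringify condition.

```cpp
#include <string>
#include <type_traits>

// Hypothetical miniature of the situation in the thread.
class MiniLog {
 public:
  // "Header" part: declaration only, no body visible to includers.
  template <typename T,
            typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
  MiniLog& operator<<(const T& v);

  const std::string& text() const { return text_; }

 private:
  std::string text_;
};

// Explicit instantiation declaration: includers must not instantiate this
// specialization themselves and instead reference the symbol from elsewhere.
extern template MiniLog& MiniLog::operator<<(const unsigned long& v);

// "Source file" part: the out-of-line definition of the member template.
template <typename T,
          typename std::enable_if<std::is_arithmetic<T>::value, int>::type>
MiniLog& MiniLog::operator<<(const T& v) {
  text_ += std::to_string(v);
  return *this;
}

// Explicit instantiation definition: this emits the symbol that the
// declaration above promised.
template MiniLog& MiniLog::operator<<(const unsigned long& v);
```

In a real project the extern template line lives in the header and the last two pieces live in one .cc file. The object file built from that .cc file is then the only place the symbol is emitted, so a disagreement between compilers about its mangled name shows up as an undefined reference at link time.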

zlojvavan commented 2 months ago

Unfortunately, despite explicitly specifying "18+" in the problem description, I was actually unable to test it against clang 19 and 20, as both bark at boost.future, but I suppose that's another story:

/opt/vcpkg/installed/x64-linux/include/boost/thread/future.hpp:4671:19: error: no member named 'that' in 'run_it<FutureExecutorContinuationSharedState>'; did you mean 'that_'?
 4671 |           that_=x.that;
      |                   ^
/opt/vcpkg/installed/x64-linux/include/boost/thread/future.hpp:4649:55: note: 'that_' declared here
 4649 |     shared_ptr<FutureExecutorContinuationSharedState> that_;

zlojvavan commented 2 months ago

I suppose that's another story

already reported

mering commented 1 month ago

We are having the same abseil linking problem when trying to upgrade from LLVM 17 to 18.

I created a reproduction Bazel workspace at https://github.com/mering/llvm-18-linking-issue. See the README.md for instructions.

When changing the llvm_toolchain attribute in MODULE.bazel from 18.1.4 to 17.0.6 this problem disappears and the build succeeds. The different LLVM versions are downloaded from the GitHub releases in this repo (see https://github.com/bazel-contrib/toolchains_llvm/blob/v1.1.2/toolchain/internal/llvm_distributions.bzl).

DimitryAndric commented 1 month ago

I created a reproduction Bazel workspace at https://github.com/mering/llvm-18-linking-issue.

This gives me a 404, probably the access rights not public?

mering commented 1 month ago

I created a reproduction Bazel workspace at https://github.com/mering/llvm-18-linking-issue.

This gives me a 404, probably the access rights not public?

Oops sorry, now it should be public.

mering commented 1 month ago

I played a little with my reproduction example and noticed that it builds correctly with any of the following changes:

mering commented 1 month ago

I could reduce the abseil dependency to 3 headers and 1 source file (all with minimal content): https://github.com/mering/llvm-18-linking-issue/tree/main/abseil-cpp/absl.

I verified that it fails to link with 18.1.4 but links just fine with 17.0.6.

Interestingly, when I use absl/strings/has_absl_stringify.h instead of absl/strings/internal/has_absl_stringify.h in test.cc the error disappears. The same change is also present in googletest 1.15.0. This might explain why an upgrade of googletest makes the problem disappear as well.

DimitryAndric commented 1 month ago

When I try the bazelisk commands exactly as you specified in that repository, I get an error right away:

$ ./bazelisk-linux-amd64 //test --verbose_failures
Command '//test' not found. Try 'bazel help'.

If I try to use the plain test command, it says there are no tests:

$ ./bazelisk-linux-amd64 test --verbose_failures
INFO: Found 0 test targets...
INFO: Elapsed time: 0.050s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
ERROR: No test targets were found, yet testing was requested

I have roughly zero knowledge of how bazel or bazelisk work, so I can only conclude that it does not work as advertised. :)

DimitryAndric commented 1 month ago

I think it would be much easier if you could just reduce this to a series of compile commands, followed by a link command. I.e. a list of clang -c ... invocations, followed by a clang invocation that links the whole thing together.

Usually this can be scraped from a build tool's log output, when you run it in verbose mode (since hiding what those tools are actually doing is unfortunately the norm these days).

mering commented 1 month ago

You are missing the build command. //test is the target.

The following command should work for you:

./bazelisk-linux-amd64 build //test --verbose_failures

mering commented 1 month ago

If you run with --sandbox_debug, it will print instructions on how you can enter the sandbox. I chose Bazel not only because this is what we use but also to have a reproducible environment with buildroot and toolchain.

DimitryAndric commented 1 month ago

Right, that seems to go further, but the downloaded llvm binaries bomb out because they try to use libtinfo.so.5 which is no longer available in Ubuntu 24.04:

external/toolchains_llvm~~llvm~llvm_toolchain_llvm/bin/clang: error while loading shared libraries: libtinfo.so.5: cannot open shared object file: No such file or directory

I'll have to spin up a container with some old Ubuntu version in there. Probably 22.04 will do?

mering commented 1 month ago

Yes, ubuntu:22.04 or debian:12 both allow apt install libtinfo5.

mering commented 1 month ago

The linker parameters are (also containing the paths to interesting files):

% cat bazel-bin/test/test-2.params
-o
bazel-out/k8-fastbuild/bin/test/test
-Wl,-S
--target=x86_64-unknown-linux-gnu
-lm
-no-canonical-prefixes
-fuse-ld=lld
-Wl,--build-id=md5
-Wl,--hash-style=gnu
-Wl,-z,relro,-z,now
-l:libc++.a
-l:libc++abi.a
-l:libunwind.a
-rtlib=compiler-rt
-lpthread
-ldl
bazel-out/k8-fastbuild/bin/test/_objs/test/test.pic.o
-Wl,--start-lib
bazel-out/k8-fastbuild/bin/external/abseil-cpp~/absl/log/internal/_objs/log_message/log_message.pic.o
-Wl,--end-lib
--sysroot=external/_main~_repo_rules~com_googleapis_storage_chrome_linux_amd64_sysroot/

The log_message.pic.o should contain the missing symbol, as log_message.cc contains the explicit template instantiation:

namespace absl {
namespace log_internal {
template LogMessage& LogMessage::operator<<(const int& v);
}  // namespace log_internal
}  // namespace absl

mering commented 1 month ago

For LLVM 17.0.6, it contains the following symbols:

% nm -C bazel-bin/external/abseil-cpp~/absl/log/internal/_objs/log_message/log_message.pic.o
0000000000000000 W absl::log_internal::LogMessage& absl::log_internal::LogMessage::operator<< <int, 0>(int const&)

% objdump -t -C bazel-bin/external/abseil-cpp~/absl/log/internal/_objs/log_message/log_message.pic.o

bazel-bin/external/abseil-cpp~/absl/log/internal/_objs/log_message/log_message.pic.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 log_message.cc
0000000000000000 l    d  .text._ZN4absl12log_internal10LogMessagelsIiLi0EEERS1_RKT_     0000000000000000 .text._ZN4absl12log_internal10LogMessagelsIiLi0EEERS1_RKT_
0000000000000000  w    F .text._ZN4absl12log_internal10LogMessagelsIiLi0EEERS1_RKT_     0000000000000012 absl::log_internal::LogMessage& absl::log_internal::LogMessage::operator<< <int, 0>(int const&)

For LLVM 18.1.4, it contains the following symbols:

% nm -C bazel-bin/external/abseil-cpp~/absl/log/internal/_objs/log_message/log_message.pic.o        
0000000000000000 W _ZN4absl12log_internal10LogMessagelsIiTnNSt3__19enable_ifIXntsr4absl16HasAbslStringifyIT_EE5valueEiE4typeELi0EEERS1_RKS5_

% objdump -t -C bazel-bin/external/abseil-cpp~/absl/log/internal/_objs/log_message/log_message.pic.o

bazel-bin/external/abseil-cpp~/absl/log/internal/_objs/log_message/log_message.pic.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 log_message.cc
0000000000000000 l    d  .text._ZN4absl12log_internal10LogMessagelsIiTnNSt3__19enable_ifIXntsr4absl16HasAbslStringifyIT_EE5valueEiE4typeELi0EEERS1_RKS5_        0000000000000000 .text._ZN4absl12log_internal10LogMessagelsIiTnNSt3__19enable_ifIXntsr4absl16HasAbslStringifyIT_EE5valueEiE4typeELi0EEERS1_RKS5_
0000000000000000  w    F .text._ZN4absl12log_internal10LogMessagelsIiTnNSt3__19enable_ifIXntsr4absl16HasAbslStringifyIT_EE5valueEiE4typeELi0EEERS1_RKS5_        0000000000000012 _ZN4absl12log_internal10LogMessagelsIiTnNSt3__19enable_ifIXntsr4absl16HasAbslStringifyIT_EE5valueEiE4typeELi0EEERS1_RKS5_

With LLVM 18 the symbol seems to contain the enable_if part.
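
Note that nm -C left the LLVM 18 name mangled in the output above, which suggests the demangler in use did not understand the new encoding. A small helper like the following can be used to check what the local C++ runtime's demangler makes of both names. It assumes a GCC- or Clang-style toolchain that ships <cxxabi.h>; Demangle, kClang17Name and kClang18Name are names chosen for this sketch, and the two strings are copied from the listings above.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (Itanium C++ ABI runtimes)

#include <cstdlib>
#include <string>

// Returns the demangled form of `mangled`, or an empty string when the
// runtime's demangler rejects the name.
std::string Demangle(const char* mangled) {
  int status = 0;
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  std::string result = (status == 0 && out != nullptr) ? out : "";
  std::free(out);  // __cxa_demangle allocates with malloc; free(nullptr) is fine
  return result;
}

// Symbol emitted by LLVM 17.0.6 for the explicit instantiation.
const char kClang17Name[] =
    "_ZN4absl12log_internal10LogMessagelsIiLi0EEERS1_RKT_";

// Symbol emitted by LLVM 18.1.4 for the same instantiation; the enable_if
// condition is now part of the name.
const char kClang18Name[] =
    "_ZN4absl12log_internal10LogMessagelsIiTnNSt3__19enable_ifIXntsr4absl16"
    "HasAbslStringifyIT_EE5valueEiE4typeELi0EEERS1_RKS5_";
```

Demangle(kClang17Name) should yield the operator<< <int, 0> form shown in the LLVM 17 listing. Whether Demangle(kClang18Name) returns text or an empty string depends on how recent the runtime's demangler is.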

DimitryAndric commented 1 month ago

I ditched the whole bazel thing and just used clang++-18 (from apt.llvm.org) directly:

$ clang++-18 -I abseil-cpp test/test.cc abseil-cpp/absl/log/internal/log_message.cc
/usr/bin/ld: /home/dim/tmp/test-10954f.o: in function `absl::log_internal::LogMessage::operator<<(int)':
test.cc:(.text._ZN4absl12log_internal10LogMessagelsEi[_ZN4absl12log_internal10LogMessagelsEi]+0x18): undefined reference to `_ZN4absl12log_internal10LogMessagelsIiTnNSt9enable_ifIXntsr4absl16strings_internal16HasAbslStringifyIT_EE5valueEiE4typeELi0EEERS1_RKS4_'
clang++-18: error: linker command failed with exit code 1 (use -v to see invocation)

The only thing I had to modify to get it to compile that way is adding a <cstdint> include, otherwise test.cc would fail to compile because int32_t was not found:

diff --git a/test/test.cc b/test/test.cc
index 6303a1a..62bfc2d 100644
--- a/test/test.cc
+++ b/test/test.cc
@@ -1,3 +1,4 @@
+#include <cstdint>
 // SOURCE: https://github.com/google/googletest/blob/v1.14.0/googletest/include/gtest/gtest-message.h#L121-L150
 #include <type_traits>
 // NOTE The linker error disappears when using "absl/strings/has_absl_stringify.h" instead of "absl/strings/internal/has_absl_stringify.h".

I.e. the error is not dependent on whether you are using this sandbox environment, or just the native host toolchain.

DimitryAndric commented 1 month ago

Now that I got it in a simpler form to compile, I could bisect, and it looks like this particular use case is broken (or fixed ;) ) after llvmorg-18-init-6331-g4b163e343cf ("Implement mangling rules for C++20 concepts and requires-expressions") by @zygoloid (cc @erichkeane @AaronBallman @rjmccall).

I am not sure whether this is an intended or unintended side effect of this change.

mering commented 1 month ago

@DimitryAndric Thanks for bisecting! I can confirm that it builds successfully when using --cxxopt="-fclang-abi-compat=17" as noted in the commit description.

Chekov2k commented 1 month ago

@DimitryAndric @mering That works for me as well! Thank you :-)

DimitryAndric commented 1 month ago

It would still be nice if one of the experts could explain what the correct way is to declare these templates, so they can be emitted and found by the linker? :)

zygoloid commented 1 month ago

It would still be nice if one of the experts could explain what the correct way is to declare these templates, so they can be emitted and found by the linker? :)

The Abseil code isn't incorrect.

Between Clang 17 and Clang 18, we implemented an ABI fix to resolve mangled name collisions, but unfortunately that changes the manglings of a small number of functions such as this one (which previously was a case where mangling collisions between different functions could happen). The ABI change fixed a longstanding bug, but we finally reached sufficient motivation to fix it due to some C++20 changes, primarily the addition of concepts, which made these kinds of symbol collisions more likely to occur in practice in the future. This was just an all-round bad situation, and sadly we didn't have a path forward that supported C++20 well and had no risk of breaking anything.

There are a few paths forward for code affected by this:

On the Clang side, we could investigate emitting the symbol with both the old and the new mangling (e.g., when we see a reference to a function template specialization that we can't instantiate locally due to an extern template declaration, and its mangling changed due to this ABI change, we could emit a definition of the symbol as a weak alias to the old mangled name). GCC has the ability to do this to mitigate the pain of mangling changes. But I'm worried that this would break more than it fixes...

Another possibility we could consider would be to add support for a [[clang::abi_compat(17)]] attribute on the explicit instantiation declaration to indicate the definition uses the old mangling. That might be a useful general feature, and potentially easier for Abseil to deploy as a workaround than the other options above.
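
To make the weak-alias idea concrete: on ELF targets, GCC and Clang already expose weak aliases to user code through the alias attribute, which gives one definition a second symbol name. The sketch below shows only that general mechanism with two made-up C-linkage names (old_symbol and new_symbol). It is not what Clang does today and not a recommended workaround for the Abseil case.

```cpp
// Assumes an ELF target (for example Linux) and GCC or Clang; the alias
// attribute is not available on every platform.

// The one real definition, exported under its "old" name.
extern "C" int old_symbol(int x) { return x + 1; }

// A second, weak symbol name that resolves to the same code. The alias
// target must be defined in this translation unit.
extern "C" int new_symbol(int x) __attribute__((weak, alias("old_symbol")));
```

Calling new_symbol(41) runs the body of old_symbol and returns 42, so code compiled against either name links against the same definition.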