llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.43k stars 12.16k forks source link

Version 19.1.0 fails to compile #101368

Open davide-q opened 4 months ago

davide-q commented 4 months ago

This started in RC1 and I was hoping to get it fixed before it was released.

The failure is about linking Curses-related objects, spitting out many errors like

 lib/liblldbCore.a(IOHandlerCursesGUI.cpp.o): undefined reference to symbol 'define_key@@@@NCURSES6_TINFO_5.0.19991023'
FAILED: lib/liblldb.so.19.1.0-rc1
IOHandlerCursesGUI.cpp:(.text._ZN6curses6Window12DrawTitleBoxEPKcS2_[_ZN6curses6Window12DrawTitleBoxEPKcS2_]+0x4a): undefined reference to `acs_map'
ld: IOHandlerCursesGUI.cpp:(.text._ZN12lldb_private18IOHandlerCursesGUI8ActivateEv.part.713+0x112c): undefined reference to `curs_set'
ld: IOHandlerCursesGUI.cpp:(.text._ZN12lldb_private18IOHandlerCursesGUI8ActivateEv.part.713+0x1138): undefined reference to `stdscr'
ld: IOHandlerCursesGUI.cpp:(.text._ZN12lldb_private18IOHandlerCursesGUI8ActivateEv.part.713+0x1145): undefined reference to `keypad'
ld: IOHandlerCursesGUI.cpp:(.text._ZN12lldb_private18IOHandlerCursesGUI8ActivateEv.part.713+0x1be9): undefined reference to `define_key'
ld: IOHandlerCursesGUI.cpp:(.text._ZN12lldb_private18IOHandlerCursesGUI8ActivateEv.part.713+0x1bfa): undefined reference to `define_key'

and many, many more

I wonder if #25067 has been inadvertently re-introduced?

llvmbot commented 4 months ago

@llvm/issue-subscribers-lldb

Author: None (davide-q)

With various errors like ``` lib/liblldbCore.a(IOHandlerCursesGUI.cpp.o): undefined reference to symbol 'define_key@@@@NCURSES6_TINFO_5.0.19991023' FAILED: lib/liblldb.so.19.1.0-rc1 IOHandlerCursesGUI.cpp:(.text._ZN6curses6Window12DrawTitleBoxEPKcS2_[_ZN6curses6Window12DrawTitleBoxEPKcS2_]+0x4a): undefined reference to `acs_map' ld: IOHandlerCursesGUI.cpp:(.text._ZN12lldb_private18IOHandlerCursesGUI8ActivateEv.part.713+0x112c): undefined reference to `curs_set' ld: IOHandlerCursesGUI.cpp:(.text._ZN12lldb_private18IOHandlerCursesGUI8ActivateEv.part.713+0x1138): undefined reference to `stdscr' ld: IOHandlerCursesGUI.cpp:(.text._ZN12lldb_private18IOHandlerCursesGUI8ActivateEv.part.713+0x1145): undefined reference to `keypad' ld: IOHandlerCursesGUI.cpp:(.text._ZN12lldb_private18IOHandlerCursesGUI8ActivateEv.part.713+0x1be9): undefined reference to `define_key' ld: IOHandlerCursesGUI.cpp:(.text._ZN12lldb_private18IOHandlerCursesGUI8ActivateEv.part.713+0x1bfa): undefined reference to `define_key' ``` and more I wonder if #25067 has been inadvertently re-introduced?
davide-q commented 3 months ago

Just FYI, in my environment the problem has not changed with rc2

DavidSpickett commented 3 months ago

I wonder if https://github.com/llvm/llvm-project/issues/25067 has been inadvertently re-introduced?

I think so.

The original terminfo fix from https://github.com/llvm/llvm-project/pull/69458 was removed by https://github.com/llvm/llvm-project/pull/92865 which removed terminfo as a dependency.

The original fix is mentioned in https://github.com/spack/spack/issues/36954 where they are missing the same symbol you are.

And someone from there turned up on the removal PR: https://github.com/llvm/llvm-project/pull/92865#issuecomment-2261438018

Perhaps this is because on a system where ncurses and terminfo are combined, removing terminfo is ok because the symbols come from the combined ncurses anyway. When they are split into two, we still have to link terminfo as well to be able to use ncurses.

What is your environment and does it have this split? Some of the commands shown in https://github.com/llvm/llvm-project/issues/25067#issue-1076221744 can confirm that, I presume.

@JDevlieghere also commented there but didn't want to commit something without trying it, so if we can run your environment in a container we can figure it out.

davide-q commented 3 months ago

Thanks @DavidSpickett

My environment is a bare-metal RHEL 8.8 installation in which I am attempting (as I always have) a spack-based install of LLVM, so all of your sleuthing is spot-on! I don't use containers, so I can't help with that (unless I make a massive effort), but perhaps you would be able to reproduce the issue with spack which is a git-clone and a couple of spack bootstrap commands. Let me know if you would like me to list a step-by-step on how to reproduce it in that context.

davide-q commented 3 months ago

The llvm-19.1.0-rc3 version has still this issue (and a couple more, one of which is #106549 )

davide-q commented 2 months ago

Nothing has changed with RC4, I updated the name to reflect that.

I hope LLVM is not going to release 19.1 with this building bug.

davide-q commented 2 months ago

So disappointing to see the release happening despite this serious problem

jryans commented 2 months ago

LLVM releases follow a set schedule and do not generally wait for specific fixes.

I would think the quickest path here would be for someone with access to the environment in question to prepare a fix.

DavidSpickett commented 2 months ago

Let me know if you would like me to list a step-by-step on how to reproduce it in that context.

Yes please, I have zero experience with spack. I think I can use one of the images described in https://developers.redhat.com/blog/2020/03/24/red-hat-universal-base-images-for-docker-users# to reproduce it.

Surprised that this isn't an issue for rhel without spack, but if spack is a full stack rebuild maybe it configures the libraries differently to the base system.

aaronmondal commented 2 months ago

There were various symbol refactors between curses and terminfo in the past where it turned out that many distros distributed essentially broken variants of these libraries for quite some time which IIRC started getting exposed when lld started erroring on misconfigured linker scripts by default. I suspect that you're hitting one of these cases where the rhel libs are broken and now that the llvm link lines are no longer overspecified (by the implicit dep from llvm that was removed in 19.1) it exposes this issue.

So this is not an LLVM building bug, but a bug in rhel's distributed variants of curses and terminfo. You'll most likely be able to fix the issue by using a more recent version of curses and/or terminfo.

davide-q commented 2 months ago

Thanks a lot for picking this up @DavidSpickett

Surprised that this isn't an issue for rhel without spack, but if spack is a full stack rebuild maybe it configures the libraries differently to the base system.

Yes, spack tries hard to abstract everything out from the underlying OS and builds an environment which is (as far as possible) independent from where it is running on (and in a way that is modular and allows cherrypicking of dependencies, libraries etc).

Because of that, I suspect the same identical problem would happen in a different OS, therefore if you have easier access to Ubuntu or whatever, you might try that first.

Yes please, I have zero experience with spack.

git clone https://github.com/spack/spack.git xxx
cd xxx
# optionally git checkout releases/v0.22.1-a # or whatever
source share/spack/setup-env.sh

spack compiler list # should return a compiler otherwise you need to add one to your base OS

Then edit var/spack/repos/builtin/packages/llvm/package.py to add the latest version of LLVM, as follows

--- a/var/spack/repos/builtin/packages/llvm/package.py
+++ b/var/spack/repos/builtin/packages/llvm/package.py
@@ -36,7 +36,12 @@ class Llvm(CMakePackage, CudaPackage, CompilerPackage):
     license("Apache-2.0")

     version("main", branch="main")
+    version("19.1.0", sha256="0a08341036ca99a106786f50f9c5cb3fbe458b3b74cab6089fd368d0edb2edfe")
     version("18.1.3", sha256="fc5a2fd176d73ceb17f4e522f8fe96d8dde23300b8c233476d3609f55d995a7a")
     version("18.1.2", sha256="8d686d5ece6f12b09985cb382a3a530dc06bb6e7eb907f57c7f8bf2d868ebb0b")
     version("18.1.1", sha256="62439f733311869dbbaf704ce2e02141d2a07092d952fc87ef52d1d636a9b1e4")
     version("18.1.0", sha256="eb18f65a68981e94ea1a5aae4f02321b17da9e99f76bfdb983b953f4ba2d3550")

Then run spack spec llvm@19.1.0 and see the list of dependencies that spack will compile and link against LLVM. If everything looks good, go ahed with compilation (see below) otherwise change what dependencies you want to add (+) or remove (~) for example spack spec llvm@19.1.0+cuda~clang. In my case I am using the default with no modifiers which yields

llvm@19.1.0%gcc@13.2.0+clang~cuda~flang+gold~ipo+libomptarget~libomptarget_debug~link_llvm_dylib+lld+lldb+llvm_dylib+lua~mlir+polly~python~split_dwarf~z3~zstd build_system=cmake build_type=Release compiler-rt=runtime generator=ninja libcxx=runtime libunwind=runtime openmp=runtime shlib_symbol_version=none targets=all version_suffix=none

Then compile simply with spack compile XXX where XXX is same as in the spec, most likely simply llvm@19.1.0, or possibly with the modifiers of what you decided to add/remove, in my example llvm@19.1.0+cuda~clang

davide-q commented 2 months ago

@aaronmondal thanks for your comment, this explains better why it happens (still not sure why "overspecifying the link line" fixes it and what's technically wrong with that overspecification, but I digress...)

It's definitely possible that the underlying libraries I am linking against are broken, and in such a case I can solve my problem by editing the spack package.py file to include the new dependencies and compile them from source. For me to do so, LLVM should have told me so, right? In the old autoconf days this would/should have been caught when running configure which would/should try to compile and link a dummy test program against all the needed dependency. Seeing a failure at that configure time I would have understood there is a dependency which is not (correctly) satisfied -- or even better it could have been configured to compile without the feature

I am not familiar enough with cmake and the LLVM build system to say how exactly this should happen, but I still consider it a bug in the LLVM build system. The problem should be identified upfront during the equivalent of the configure step and not after an hour of compile time at the final link step. Don't you agree?

bernhardkaindl commented 2 months ago

The patch in this comment avoids this issue https://github.com/llvm/llvm-project/issues/101368: https://github.com/spack/spack/pull/46504#issuecomment-2372520318

ajordanr-google commented 1 month ago

To also bring this bug up again, ChromeOS also was unable to build LLDB because of this breakage. We had to hack in the Spack patch ourselves: https://chromium-review.googlesource.com/c/external/github.com/llvm/llvm-project/+/5902466

This will likely be an issue for other distributions which don't keep the missing tinfo symbols inside the curses shared object.

vityank commented 1 month ago

I've stumbled upon this issue as well for LLVM 19.1.0+19.1.1 which can easily be reproduced with Ncurses 6.5 which got terminfo functions separated from ncurses by default. I've created this simple patch to resolve the issue for me without adding the removed LLVM_ENABLE_TERMINFO back. The key is that split Ncurses will have -ltinfo in the NCURSES_LIBRARIES. It will also not harm alternate curses implementations as it will be empty for those and thus will not affect the resulting LLDB_CURSES_LIBS array. For monolith Ncurses it will just result in double -lncurses, so no problem here either.

--- ./lldb/source/Core/CMakeLists.txt   2024-10-01 15:08:12.000000000 +0300
+++ ./lldb/source/Core/CMakeLists.txt   2024-10-04 01:13:09.558078374 +0300
@@ -10,7 +10,7 @@
 set(LLDB_LIBEDIT_LIBS)

 if (LLDB_ENABLE_CURSES)
-  list(APPEND LLDB_CURSES_LIBS ${PANEL_LIBRARIES} ${CURSES_LIBRARIES})
+  list(APPEND LLDB_CURSES_LIBS ${PANEL_LIBRARIES} ${CURSES_LIBRARIES} ${NCURSES_LIBRARIES})
   if (LLVM_BUILD_STATIC)
     list(APPEND LLDB_CURSES_LIBS gpm)
   endif()
DavidSpickett commented 1 month ago

Because of that, I suspect the same identical problem would happen in a different OS, therefore if you have easier access to Ubuntu or whatever, you might try that first.

I have been super busy but do intend to try this at some point, I realise the delay is annoying.

In the meantime if someone wants to propose the above change as a PR, that would also be welcome. In fact if it comes from someone who is directly affected by the problem, it will have more weight behind it.

DavidSpickett commented 6 hours ago

I was able to reproduce this kind of, but I had to hack my CMake config manually and even then ended up with system ncurses also being referenced. This is no doubt a skill issue on my part but I'm not going to propose a change I can't verify either, especially when it is to a build system.

I've stumbled upon this issue as well for LLVM 19.1.0+19.1.1 which can easily be reproduced with Ncurses 6.5 which got terminfo functions separated from ncurses by default.

@vityank do you know how to properly configure CMake to use an ncurses that is not the system one?

Also please provide the configure command for ncurses. In my attempts I used:

$ ./configure --with-termlib --without-ada --with-shared --with-cxx-shared

Does that look correct?

I did setup Spack and built main without any issue which either means I did something wrong or the included patch for 19 also applies to 20, but that's not what I understood from the code.

If you don't want to wait for me to figure out the non-spack build, confirm that the above fix (or the one already in spack for 19.x) works with main and send a PR for it (https://llvm.org/docs/Contributing.html). I am much more able to check that the existing builds work, rather than reproducing this new (to us at least) build configuration.