[clang][build] Error: The install of the * target requires changing an RPATH (mingw)

ilg-ul commented 1 year ago

While compiling 17.0.2 for Windows using mingw-w64 and a bootstrap compiler, I got the following error while configuring the native sub-build:

[136/5530] cd /home/ilg/Work/xpack-dev-tools/clang-xpack.git/build/win32-x64/x86_64-w64-mingw32/build/llvm-17.0.2/NATIVE && /home/ilg/.local/xPacks/@xpack-dev-tools/cmake/3.26.5-1.1/.content/bin/cmake -G Ninja -DCMAKE_MAKE_PROGRAM="/home/ilg/Work/xpack-dev-tools/clang-xpack.git/build/win32-x64/xpacks/.bin/ninja" -DCMAKE_C_COMPILER_LAUNCHER="" -DCMAKE_CXX_COMPILER_LAUNCHER="" /home/ilg/Work/xpack-dev-tools/clang-xpack.git/build/win32-x64/sources/llvm-project-17.0.2.src/llvm -DLLVM_TARGET_IS_CROSSCOMPILE_HOST=TRUE -DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD="" -DLLVM_DEFAULT_TARGET_TRIPLE="x86_64-w64-mingw32" -DLLVM_TARGET_ARCH="host" -DLLVM_ENABLE_PROJECTS="clang;lld;lldb;clang-tools-extra;polly" -DLLVM_EXTERNAL_PROJECTS="" -DLLVM_ENABLE_RUNTIMES="" -DLLVM_TEMPORARILY_ALLOW_OLD_TOOLCHAIN="OFF" -DLLVM_INCLUDE_BENCHMARKS=OFF -DLLVM_INCLUDE_TESTS=OFF -DCMAKE_BUILD_TYPE=Release -DLLVM_EXTERNAL_CLANG_SOURCE_DIR=/home/ilg/Work/xpack-dev-tools/clang-xpack.git/build/win32-x64/sources/llvm-project-17.0.2.src/llvm/../clang
FAILED: NATIVE/CMakeCache.txt /home/ilg/Work/xpack-dev-tools/clang-xpack.git/build/win32-x64/x86_64-w64-mingw32/build/llvm-17.0.2/NATIVE/CMakeCache.txt 
cd /home/ilg/Work/xpack-dev-tools/clang-xpack.git/build/win32-x64/x86_64-w64-mingw32/build/llvm-17.0.2/NATIVE && /home/ilg/.local/xPacks/@xpack-dev-tools/cmake/3.26.5-1.1/.content/bin/cmake -G Ninja -DCMAKE_MAKE_PROGRAM="/home/ilg/Work/xpack-dev-tools/clang-xpack.git/build/win32-x64/xpacks/.bin/ninja" -DCMAKE_C_COMPILER_LAUNCHER="" -DCMAKE_CXX_COMPILER_LAUNCHER="" /home/ilg/Work/xpack-dev-tools/clang-xpack.git/build/win32-x64/sources/llvm-project-17.0.2.src/llvm -DLLVM_TARGET_IS_CROSSCOMPILE_HOST=TRUE -DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD="" -DLLVM_DEFAULT_TARGET_TRIPLE="x86_64-w64-mingw32" -DLLVM_TARGET_ARCH="host" -DLLVM_ENABLE_PROJECTS="clang;lld;lldb;clang-tools-extra;polly" -DLLVM_EXTERNAL_PROJECTS="" -DLLVM_ENABLE_RUNTIMES="" -DLLVM_TEMPORARILY_ALLOW_OLD_TOOLCHAIN="OFF" -DLLVM_INCLUDE_BENCHMARKS=OFF -DLLVM_INCLUDE_TESTS=OFF -DCMAKE_BUILD_TYPE=Release -DLLVM_EXTERNAL_CLANG_SOURCE_DIR=/home/ilg/Work/xpack-dev-tools/clang-xpack.git/build/win32-x64/sources/llvm-project-17.0.2.src/llvm/../clang
-- The C compiler identification is Clang 17.0.2
-- The CXX compiler identification is Clang 17.0.2
-- The ASM compiler identification is Clang with GNU-like command-line
...
-- Configuring done (18.2s)
CMake Error at cmake/modules/AddLLVM.cmake:967 (add_executable):
  The install of the llvm-tblgen target requires changing an RPATH from the
  build tree, but this is not supported with the Ninja generator unless on an
  ELF-based or XCOFF-based platform.  The CMAKE_BUILD_WITH_INSTALL_RPATH
  variable may be set to avoid this relinking step.
Call Stack (most recent call first):
  cmake/modules/TableGen.cmake:146 (add_llvm_executable)
  utils/TableGen/CMakeLists.txt:33 (add_tablegen)

CMake Error at cmake/modules/AddLLVM.cmake:588 (add_library):
  The install of the LTO target requires changing an RPATH from the build
  tree, but this is not supported with the Ninja generator unless on an
  ELF-based or XCOFF-based platform.  The CMAKE_BUILD_WITH_INSTALL_RPATH
  variable may be set to avoid this relinking step.
Call Stack (most recent call first):
  cmake/modules/AddLLVM.cmake:848 (llvm_add_library)
  tools/lto/CMakeLists.txt:32 (add_llvm_library)

CMake Error at cmake/modules/AddLLVM.cmake:588 (add_library):
  The install of the LTO target requires changing an RPATH from the build
  tree, but this is not supported with the Ninja generator unless on an
  ELF-based or XCOFF-based platform.  The CMAKE_BUILD_WITH_INSTALL_RPATH
  variable may be set to avoid this relinking step.
Call Stack (most recent call first):
  cmake/modules/AddLLVM.cmake:848 (llvm_add_library)
  tools/lto/CMakeLists.txt:32 (add_llvm_library)

CMake Error at cmake/modules/AddLLVM.cmake:967 (add_executable):
  The install of the llvm-ar target requires changing an RPATH from the build
  tree, but this is not supported with the Ninja generator unless on an
  ELF-based or XCOFF-based platform.  The CMAKE_BUILD_WITH_INSTALL_RPATH
  variable may be set to avoid this relinking step.
Call Stack (most recent call first):
  cmake/modules/AddLLVM.cmake:1350 (add_llvm_executable)
  cmake/modules/AddLLVM.cmake:1375 (llvm_add_tool)
  tools/llvm-ar/CMakeLists.txt:14 (add_llvm_tool)

... and so on, lots of them...

My solution was to add CMAKE_BUILD_WITH_INSTALL_RPATH=ON to the top CMakeLists.txt, and to patch CrossCompile.cmake to propagate the CMAKE_BUILD_WITH_INSTALL_RPATH variable down to the sub-build:

From ee0b12eaed9adb713793c9e286bbb2d9016ecd9f Mon Sep 17 00:00:00 2001
From: Liviu Ionescu <ilg@livius.net>
Date: Fri, 6 Oct 2023 15:46:21 +0300
Subject: [PATCH] CrossCompile.cmake: propagate CMAKE_BUILD_WITH_INSTALL_RPATH

---
 llvm/cmake/modules/CrossCompile.cmake | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/cmake/modules/CrossCompile.cmake b/llvm/cmake/modules/CrossCompile.cmake
index 6af47b51d4c6..9c607f1a7590 100644
--- a/llvm/cmake/modules/CrossCompile.cmake
+++ b/llvm/cmake/modules/CrossCompile.cmake
@@ -74,6 +74,7 @@ function(llvm_create_cross_target project_name target_name toolchain buildtype)
         -DCMAKE_CXX_COMPILER_LAUNCHER="${CMAKE_CXX_COMPILER_LAUNCHER}"
         ${CROSS_TOOLCHAIN_FLAGS_${target_name}} ${CMAKE_CURRENT_SOURCE_DIR}
         ${CROSS_TOOLCHAIN_FLAGS_${project_name}_${target_name}}
+        -DCMAKE_BUILD_WITH_INSTALL_RPATH="${CMAKE_BUILD_WITH_INSTALL_RPATH}"
         -DLLVM_TARGET_IS_CROSSCOMPILE_HOST=TRUE
         -DLLVM_TARGETS_TO_BUILD="${targets_to_build_arg}"
         -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD="${experimental_targets_to_build_arg}"
-- 
2.37.1 (Apple Git-137.1)
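
A possible way to get the same effect without patching, sketched under assumptions: the hunk context above shows that `${CROSS_TOOLCHAIN_FLAGS_${target_name}}` is already forwarded to the sub-build, and the failing directory is named NATIVE, so setting CROSS_TOOLCHAIN_FLAGS_NATIVE on the outer configure should reach the nested one. The helper name `native_rpath_args` and the `LLVM_SRC` variable are hypothetical, and this has not been verified against the setup described here.

```shell
#!/bin/sh
# Sketch only: print the extra arguments for the outer (cross) configure.
# Assumption: target_name is NATIVE, so CROSS_TOOLCHAIN_FLAGS_NATIVE is the
# variable that llvm_create_cross_target expands for the nested build.
native_rpath_args() {
  printf '%s\n' \
    '-DCMAKE_BUILD_WITH_INSTALL_RPATH=ON' \
    '-DCROSS_TOOLCHAIN_FLAGS_NATIVE=-DCMAKE_BUILD_WITH_INSTALL_RPATH=ON'
}

# Show what would be appended; the real invocation is left commented out.
native_rpath_args
# cmake -G Ninja $(native_rpath_args) "${LLVM_SRC}/llvm"   # LLVM_SRC is hypothetical
```

One caveat: if CROSS_TOOLCHAIN_FLAGS_NATIVE is already set elsewhere in the build scripts, the new definition would have to be merged into the existing value rather than replace it.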

Martin, @mstorsjo, what do you think about this issue? Should I submit a PR?

mstorsjo commented 1 year ago

Maybe, but I’d be interested in understanding the situation a bit more. Can you explain how to reproduce this situation?

ilg-ul commented 1 year ago

Yes, it would be interesting to understand it, but I currently don't have an explanation. :-(

It did not happen while building previous versions (< 17), and the build scripts are more or less the same. cmake is 3.26.5, the compiler used to build the bootstrap is GCC 13.2.0, installed in a non-system location, and indeed requires -rpath to ensure it uses the correct libraries. The build runs inside an Ubuntu 18.04 docker container.

To be frank, I don't fully understand the checks performed by CMake that trigger this message, so any suggestions are welcome.

ilg-ul commented 1 year ago

the checks performed by CMake

I took a look at the CMake source code. The message comes from the C++ code, and it is not obvious what exactly is checked.

mstorsjo commented 1 year ago

It did not happen while building previous versions (< 17), and the build scripts are more or less the same. cmake is 3.26.5, the compiler used to build the bootstrap is GCC 13.2.0, installed in a non-system location, and indeed requires -rpath to ensure it uses the correct libraries. The build runs inside an Ubuntu 18.04 docker container.

Is it possible to reproduce this in a somewhat minimal docker setup? That would make it trivial for others to look into.

So you're cross compiling llvm for a mingw target, and this happens when cmake configures the nested native build, for the tblgen executables etc? I do this often, so it sounds like something is odd in your setup.

Can you create a dockerfile that reproduces this? Can you use a llvm-mingw release as the mingw cross compiler? I.e. a dockerfile that downloads and unpacks llvm-mingw, sets up PATH environment variables, clones llvm-project and then runs the one single cmake configure that should trigger this issue. Either this should reproduce the issue, or your journey to reduce the issue down to this should find what step in your configuration is odd that might be triggering it.

ilg-ul commented 1 year ago

it sounds like something is odd in your setup

Sure, things are pretty complicated, so this is always a possibility.

So you're cross compiling llvm for a mingw target, and this happens when cmake configures the nested native build, for the tblgen executables etc?

That's correct.

I do this often,

My build scripts are heavily inspired by your scripts. The main difference is that I use Ubuntu 18, with a compiled GCC 13 instead of the system GCC, and this GCC 13 is not installed in the system locations, but in a custom location, thus it requires an explicit -rpath when linking the binaries, passed via LDFLAGS.

Can you create a dockerfile that reproduces this?

Probably I can, but it is not a trivial task. It would help to understand the reason why CMake issues this message, and then try to identify a minimal configuration that issues it.

ilg-ul commented 1 year ago

this happens when cmake configures the nested native build, for the tblgen executables etc?

Do you know if there were any changes in the code handling the native builds in 17.x (which fails) vs 16.x (which is fine)?

mstorsjo commented 1 year ago

My build scripts are heavily inspired by your scripts. The main difference is that I use Ubuntu 18, with a compiled GCC 13 instead of the system GCC, and this GCC 13 is not installed in the system locations, but in a custom location, thus it requires an explicit -rpath when linking the binaries, passed via LDFLAGS.

Right, so when using your custom GCC, the built libraries rely on a custom libstdc++, and it needs this rpath option for making the built executables actually usable? I see. (This wasn't really evident from the earlier scattered comments.) I guess that explains the issue mostly.
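
To check whether the `-rpath` passed via LDFLAGS actually ends up in a built native tool, the dynamic section can be inspected with `readelf -d`. The following is a minimal sketch; the helper name `extract_rpath` and the sample path `/opt/gcc-13/lib64` are hypothetical and only illustrate the output format.

```shell
#!/bin/sh
# Sketch only: filter RPATH/RUNPATH entries out of `readelf -d` output on stdin.
# Each matching line is printed as "<TAG> <path list>".
extract_rpath() {
  sed -n -E 's/.*\((RPATH|RUNPATH)\).*\[(.*)\]/\1 \2/p'
}

# Example with a canned line in the format that readelf prints
# (the path is a made-up placeholder):
printf ' 0x000000000000001d (RUNPATH)            Library runpath: [/opt/gcc-13/lib64]\n' |
  extract_rpath

# Real usage would look like this (binary path depends on the build tree):
# readelf -d NATIVE/bin/llvm-tblgen | extract_rpath
```

An empty result would mean the binary carries no RPATH or RUNPATH entry at all.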

this happens when cmake configures the nested native build, for the tblgen executables etc?

Do you know if there were any changes in the code handling the native builds in 17.x (which fails) vs 16.x (which is fine)?

Hmm, I don't think so. There were some relating to how one can specify the location of preexisting tblgen executables, but most of that was already before 16.x (and one later change was backported into newer 16.x I think).

Can you create a dockerfile that reproduces this?

Probably I can, but it is not a trivial task. It would help to understand the reason why CMake issues this message, and then try to identify a minimal configuration that issues it.

Yep, clearly. If you have a base image ready and you can trigger this issue fairly quickly (by just running a single cmake configuration command which either succeeds or fails), it should be quite possible to pinpoint the triggering change between 16.x and 17.x as well. (As a docker tip for this: make sure you check out the llvm-project repo fully in one RUN step, which can be reused, then have a second RUN step check out the commit to test while bisecting. This should let you iterate on this fairly quickly.)
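
The configure-only bisection suggested here could be driven by `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. A minimal sketch of the classification step follows; the function name `classify_configure_log`, the script names in the usage comment, and the choice of log markers are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: turn one cmake configure log into a `git bisect run` exit code.
classify_configure_log() {
  log="$1"
  if [ ! -r "$log" ]; then
    return 125    # no log available: cannot judge this commit
  elif grep -q 'requires changing an RPATH' "$log"; then
    return 1      # the error from this issue: commit is bad
  elif grep -q 'Build files have been written to' "$log"; then
    return 0      # cmake finished generating: commit is good
  else
    return 125    # failed for some unrelated reason: skip
  fi
}

# Hypothetical usage, with placeholder revisions and script names:
# git bisect start <bad-rev> <good-rev>
# git bisect run sh -c './configure-only.sh >cfg.log 2>&1; . ./classify.sh; classify_configure_log cfg.log'
```

The RPATH check comes first because, as in the log at the top of this issue, cmake can print "Configuring done" and still fail during generation.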

ilg-ul commented 1 year ago

bisecting

So you are suggesting to identify the commit that introduced the issue, and then figure out how to proceed.

I'll consider this, but my configuration is different from yours; the build runs inside the docker container, but with lots of volumes mounted from the host, and multiple explicit docker exec commands using the previously created image. I estimate that the current scripts will require major changes to resume builds from a different commit, and each iteration might take about one hour.

mstorsjo commented 1 year ago

bisecting

So you are suggesting to identify the commit that introduced the issue, and then figure out how to proceed.

I'll consider this, but my configuration is different from yours; the build runs inside the docker container, but with lots of volumes mounted from the host, and multiple explicit docker exec commands using the previously created image. I estimate that the current scripts will require major changes to resume builds from a different commit, and each iteration might take about one hour.

I'm not saying that you should need to do a full build. As it's the cmake command that fails, you can edit the build to exit after the cmake command if it was successful. It shouldn't generally be too hard to modify the build to start from the previous step, then only run the cmake command on top of that, with varying versions of llvm-project checked out, to pinpoint where it changed.

ilg-ul commented 1 year ago

I'm not saying that you should need to do a full build. As it's the cmake command that fails, you can edit the build to exit after the cmake command if it was successful.

Ah, right. I think I can do this.

It shouldn't generally be too hard to modify the build to start from the previous step, then only run the cmake command on top of that

The reason I use host volumes instead of internal container folders is exactly to allow restartable builds. So resuming from the cmake step shouldn't be very difficult: I have to remove some stamp files and folders, and the build resumes from the next step not marked as completed.

llvm / llvm-project

[clang][build] Error: The install of the * target requires changing an RPATH (mingw) #68513