Open ilg-ul opened 1 year ago
Maybe, but I’d be interested in understanding the situation a bit more. Can you explain how to reproduce this situation?
Yes, it would be interesting to understand it, but I currently don't have an explanation. :-(
It did not happen while building previous versions (< 17), and the build scripts are more or less the same. cmake is 3.26.5, the compiler used to build the bootstrap is GCC 13.2.0, installed in a non-system location, and indeed requires -rpath to ensure it uses the correct libraries. The build runs inside an Ubuntu 18.04 docker container.
To be frank, I don't fully understand the checks performed by CMake that trigger this message, so any suggestions are welcome.
the checks performed by CMake
I took a look at the CMake source code, and this message comes from the C++ code; it is not obvious what exactly is checked.
It did not happen while building previous versions (< 17), and the build scripts are more or less the same. cmake is 3.26.5, the compiler used to build the bootstrap is GCC 13.2.0, installed in a non-system location, and indeed requires -rpath to ensure it uses the correct libraries. The build runs inside an Ubuntu 18.04 docker container.
Is it possible to reproduce this in a somewhat minimal docker setup? That would make it trivial for others to look into.
So you're cross compiling llvm for a mingw target, and this happens when cmake configures the nested native build, for the tblgen executables etc? I do this often, so it sounds like something is odd in your setup.
Can you create a dockerfile that reproduces this? Can you use a llvm-mingw release as the mingw cross compiler? I.e. a dockerfile that downloads and unpacks llvm-mingw, sets up PATH environment variables, clones llvm-project and then runs the one single cmake configure that should trigger this issue. Either this should reproduce the issue, or your journey to reduce the issue down to this should find what step in your configuration is odd that might be triggering it.
it sounds like something is odd in your setup
Sure, things are pretty complicated, this is always a possibility.
So you're cross compiling llvm for a mingw target, and this happens when cmake configures the nested native build, for the tblgen executables etc?
That's correct.
I do this often,
My build scripts are heavily inspired by your scripts. The main difference is that I use Ubuntu 18, with a compiled GCC 13 instead of the system GCC, and this GCC 13 is not installed in the system locations, but in a custom location, thus it requires an explicit -rpath when linking the binaries, passed via LDFLAGS.
Can you create a dockerfile that reproduces this?
Probably I can, but it is not a trivial task. It would help to understand the reason why CMake issues this message, and then try to identify a minimal configuration that issues it.
this happens when cmake configures the nested native build, for the tblgen executables etc?
Do you know if there were any changes in the code handling the native builds in 17.x (which fails) vs 16.x (which is fine)?
My build scripts are heavily inspired by your scripts. The main difference is that I use Ubuntu 18, with a compiled GCC 13 instead of the system GCC, and this GCC 13 is not installed in the system locations, but in a custom location, thus it requires an explicit -rpath when linking the binaries, passed via LDFLAGS.
Right, so when using your custom GCC, the built libraries rely on a custom libstdc++, and it needs this rpath option for making the built executables actually usable? I see. (This wasn't really evident from the earlier scattered comments.) I guess that explains the issue mostly.
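To make the -rpath requirement concrete, here is a minimal shell sketch of how such a flag could be derived from the location of the custom libstdc++. The helper name rpath_flag_for and the /opt/gcc-13 prefix are hypothetical and not taken from this thread.

```shell
# Hypothetical helper: turn the full path of a shared library into a linker
# flag that records the library's directory as an rpath in the linked binary.
rpath_flag_for() {
    printf '%s\n' "-Wl,-rpath,$(dirname "$1")"
}

# Example with a made-up install prefix (not taken from the thread):
rpath_flag_for /opt/gcc-13/lib64/libstdc++.so
```

In a real build, the library path could come from the compiler itself, for example g++ -print-file-name=libstdc++.so, and the resulting flag would be appended to LDFLAGS. The path GCC prints may contain ".." components, which the dynamic loader normally resolves.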
this happens when cmake configures the nested native build, for the tblgen executables etc?
Do you know if there were any changes in the code handling the native builds in 17.x (which fails) vs 16.x (which is fine)?
Hmm, I don't think so. There were some relating to how one can specify the location of preexisting tblgen executables, but most of that was already before 16.x (and one later change was backported into newer 16.x I think).
Can you create a dockerfile that reproduces this?
Probably I can, but it is not a trivial task. It would help to understand the reason why CMake issues this message, and then try to identify a minimal configuration that issues it.
Yep, clearly. If you have a base image ready and you can trigger this issue fairly quickly (by just running a single cmake configuration command which either succeeds or fails), it should be quite possible to pinpoint the triggering change between 16.x and 17.x as well. (As a docker tip for this: make sure you check out the llvm-project repo fully in one RUN step, which can be reused, then have a second RUN step check out the commit to test while bisecting. This should let you iterate on this fairly quickly.)
bisecting
So you are suggesting to identify the commit that introduced the issue, and then figure out how to proceed.
I'll consider this, but my configuration is different from yours; the build runs inside the docker container, but with lots of volumes mounted from the host, and multiple explicit docker exec commands using the previously created image. I estimate that the current scripts will require major changes to resume builds from a different commit, and each iteration might take about one hour.
I'm not saying that you should need to do a full build. As it's the cmake command that fails, you can edit the build to exit after the cmake command if it was successful. It shouldn't generally be too hard to modify the build to start from the previous step, then only run the cmake command on top of that, with varying versions of llvm-project checked out, to pinpoint where it changed.
I'm not saying that you should need to do a full build. As it's the cmake command that fails, you can edit the build to exit after the cmake command if it was successful.
Ah, right. I think I can do this.
It shouldn't generally be too hard to modify the build to start from the previous step, then only run the cmake command on top of that
The reason I use host volumes instead of internal container folders is exactly to allow restartable builds. So resuming from the cmake step shouldn't be very difficult; I have to remove some stamp files and folders, and the build resumes from the next step not marked as completed.
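The configure-only bisection discussed above could be sketched roughly as follows. This is only an illustration under assumptions: bisect_step is a hypothetical helper, and the actual configure command (compiler paths, LDFLAGS, cross-compile options) has to come from the real build scripts.

```shell
# Hypothetical step function for `git bisect run`: it only configures, it never builds.
# Usage: bisect_step <configure command and its arguments>
bisect_step() {
    # Fresh scratch directory per commit so no stale CMakeCache.txt is reused.
    build_dir=$(mktemp -d) || return 125   # 125 asks git bisect to skip the commit
    if ( cd "$build_dir" && "$@" ); then
        status=0    # configure succeeded: commit counts as "good"
    else
        status=1    # configure failed: commit counts as "bad"
    fi
    rm -rf "$build_dir"
    return "$status"
}
```

Saved in a script whose last line calls bisect_step with the real cmake command, this could be driven by git bisect start with a known bad and a known good revision, followed by git bisect run with that script. Note that any configure failure is treated as "bad", not only the rpath message, so an unrelated breakage at some intermediate commit could mislead the bisection.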
While compiling 17.0.2 for Windows using mingw-w64 and a bootstrap compiler, I got the following error while configuring the native sub-build:
My solution was to add CMAKE_BUILD_WITH_INSTALL_RPATH=ON to the top CMakeLists.txt, and to patch CrossCompile.cmake to propagate the CMAKE_BUILD_WITH_INSTALL_RPATH variable down to the sub-build:
Martin, @mstorsjo, what do you think about this issue? Should I submit a PR?
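One way to check whether such a propagation patch takes effect is to inspect the sub-build's CMake cache after configuring. The following is a minimal sketch, assuming the variable reaches the sub-build as a -D cache entry and that the nested build lives in a NATIVE directory under the main build directory; both are assumptions not confirmed in this thread, and the function name is made up.

```shell
# Hypothetical check: succeed if the given CMakeCache.txt records
# CMAKE_BUILD_WITH_INSTALL_RPATH with the literal value ON.
cache_has_install_rpath_on() {
    grep -Eq '^CMAKE_BUILD_WITH_INSTALL_RPATH:[A-Z]+=ON$' "$1"
}
```

For example, cache_has_install_rpath_on build/NATIVE/CMakeCache.txt would succeed only if the entry is present. The check matches only the literal value ON, so equivalent spellings such as 1 or TRUE would be reported as missing.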