root-project / cling

The cling C++ interpreter

'__emutls_get_address' unresolved while linking #370

Open matteosecli opened 4 years ago

matteosecli commented 4 years ago

Hello, first of all thanks for this awesome project! 🙂

I'm opening this bug report as I didn't find an already open one about this issue; sorry in advance if it's a duplicate.

I'm on MacOS 10.14.6 and I'm experiencing the same issue described in https://github.com/jupyter-xeus/xeus-cling/issues/161: typing

thread_local int x = 42;

gives

IncrementalExecutor::executeFunction: symbol '__emutls_get_address' unresolved while linking [cling interface function]!

I first tried to follow some suggestions and upgrade the LLVM base: I built a 0.8~dev copy of Cling from what seems to be a tree that's been upgraded to LLVM 9 (as I've read on the mailing list, the cling-specific patches are being rebased on LLVM 9):

git clone --depth 1 --branch upgrade_llvm90 https://github.com/vgvassilev/llvm.git src
cd src/tools/
git clone --depth 1 --branch upgrade_llvm90 https://github.com/vgvassilev/clang.git 
git clone --depth 1 --branch upgrade_llvm90 https://github.com/vgvassilev/cling.git
cd ../..
mkdir installprefix
mkdir build
cd build
cmake ../src -DCMAKE_INSTALL_PREFIX=/Users/matteo/GitHub/cling/installprefix/ -DCLING_CXX_PATH=clang++
make -j8

With this dev build, seemingly related issues like https://github.com/root-project/cling/issues/321 are finally gone; however, I still get the error above about the unresolved __emutls_get_address symbol (I've also tried with the latest official MacOS dev build here, btw, and it has the same problem).

I've found another seemingly related issue in a different project based on LLVM/Clang: https://github.com/mull-project/mull/issues/743. They propose a solution similar to this comment in the original bug report I've linked, implemented in this PR: https://github.com/mull-project/mull/pull/745. The idea is just to take the emutls.c file from the LLVM compiler-rt builtins and compile it into a dynamic library. I've just cloned the proposed PR, changed the relevant source files to the ones in the 9.x branch of the LLVM project (https://github.com/llvm/llvm-project/blob/release/9.x/compiler-rt/lib/builtins/) just to be consistent, and produced a libcling-tls.dylib with the provided cmake file (I've just changed the name of the resulting library).

If I link this library, the error in the original post seems to be gone (cling welcome header removed for brevity):

$ cling --std=c++14 -I/usr/local/include -l/Users/matteo/emutls/build/libcling-tls.dylib
[cling]$ thread_local int x = 42;
[cling]$ x
(int) 42
[cling]$ .q
$
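
A quick way to see whether a symbol such as __emutls_get_address is visible to the dynamic loader in a running process is to ask for it with dlsym. The block below is only a minimal sketch, assuming a POSIX system (Linux or macOS); the helper name symbol_is_loaded is made up for illustration and is not part of cling. It only reports on symbols exported by images that are already loaded, so it says nothing about how cling's JIT performs its own lookup.

```cpp
#include <dlfcn.h>   // dlsym, RTLD_DEFAULT (POSIX)
#include <string>

// Hypothetical helper (not a cling API): returns true if `name` can be
// resolved among the shared images already loaded into this process.
bool symbol_is_loaded(const std::string& name) {
    // RTLD_DEFAULT searches the global symbol scope of the process.
    return dlsym(RTLD_DEFAULT, name.c_str()) != nullptr;
}
```

At the cling prompt, one could call symbol_is_loaded("__emutls_get_address") before and after loading the extra library to compare the two results.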

I was getting the very same error also when trying to use a more complex example from the xtensor library, which at a certain point uses thread_local in this included file: https://github.com/xtensor-stack/xtensor/blob/1cc6682a6fbb07c9a80b1726e4edbbef862b9b88/include/xtensor/xstrided_view_base.hpp.

Without the libcling-tls library:

$ cling --std=c++14 -I/usr/local/include
[cling]$ #include <iostream>
[cling]$ 
[cling]$ #include "xtensor/xarray.hpp"
[cling]$ #include "xtensor/xio.hpp"
[cling]$ #include "xtensor/xview.hpp"
[cling]$ 
[cling]$ xt::xarray<double> arr1 {{1.0, 2.0, 3.0}, {2.0, 5.0, 7.0}, {2.0, 5.0, 7.0}};
[cling]$ xt::xarray<double> arr2 {5.0, 6.0, 7.0};
[cling]$ 
[cling]$ std::cout << xt::view(arr1, 1) + arr2 << std::endl;
IncrementalExecutor::executeFunction: symbol '__emutls_get_address' unresolved while linking [cling interface function]!
[cling]$ .q
$

With the libcling-tls library:

$ cling --std=c++14 -I/usr/local/include -l/Users/matteo/emutls/build/libcling-tls.dylib
[cling]$ #include <iostream>
[cling]$ 
[cling]$ #include "xtensor/xarray.hpp"
[cling]$ #include "xtensor/xio.hpp"
[cling]$ #include "xtensor/xview.hpp"
[cling]$ 
[cling]$ xt::xarray<double> arr1 {{1.0, 2.0, 3.0}, {2.0, 5.0, 7.0}, {2.0, 5.0, 7.0}};
[cling]$ xt::xarray<double> arr2 {5.0, 6.0, 7.0};
[cling]$ 
[cling]$ std::cout << xt::view(arr1, 1) + arr2 << std::endl;
{  7.,  11.,  14.}
[cling]$ .q
0  cling                    0x000000010670602e llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 40
1  cling                    0x0000000106706420 SignalHandler(int) + 180
2  libsystem_platform.dylib 0x00007fff5a1c0b5d _sigtramp + 29
3  libsystem_platform.dylib 0x000000010b1e5e00 _sigtramp + 18446603343485883072
4  libsystem_c.dylib        0x00007fff5a07b6ac exit + 48
5  libdyld.dylib            0x00007fff59fd53dc start + 8
Segmentation fault: 11
$

so the code seems to give the proper output but cling crashes when it exits (btw, it doesn't happen if I use something like exit(0) instead of .q).

If instead I compile the same code (provided with a main()) using the manually-compiled clang from the LLVM 9 branch bundled with cling, the code executes and exits correctly without the need of the extra libcling-tls library.
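
For context, here is a minimal sketch of what thread_local means in a normally compiled program, assuming a C++11 compiler with std::thread support. The names counter and value_seen_by_new_thread are made up for illustration. When a compiler uses emulated TLS (for example Clang's -femulated-tls lowering), each access to such a variable is turned into a call to __emutls_get_address, which is the symbol the JIT fails to resolve in the error above.

```cpp
#include <thread>

// Every thread gets its own copy of this variable, initialised to 42.
thread_local int counter = 42;

// Hypothetical helper: increments the copy owned by a freshly started
// thread and returns the value that thread observed.
int value_seen_by_new_thread() {
    int seen = 0;
    std::thread worker([&seen] {
        ++counter;       // touches the worker's copy only
        seen = counter;  // the worker started from 42
    });
    worker.join();       // join before reading `seen`
    return seen;
}
```

After the call, the calling thread's own copy of counter is unchanged, which is the per-thread behaviour the emulated TLS runtime has to provide.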


Since the libcling-tls.dylib trick seems to me a not-totally-legit hack anyway, I kindly ask whether it's possible to fix this bug in Cling itself, without having to compile and link some specially crafted library for the job. If that's somehow not possible, do you know how I can avoid the segmentation fault I'm seeing with this trick?

Wish I could help more, but I'm really not familiar with these topics...😅

Thank you in advance for your help!

SimeonEhrig commented 3 years ago

For my issue #321, I also tried the new LLVM 9 base, and the problem still exists. Unfortunately, it's been some time since I tested it and I forgot to write it down, but if I remember correctly, besides upgrading to a new LLVM version, upgrading to ORCJITv2 is necessary to solve the problem. But I am not 100% sure I remember correctly.

matteosecli commented 3 years ago

Hi @SimeonEhrig, thanks for the insight! I could reproduce your issue with Cling v0.7, but if I use the code in the upgrade_llvm90 branches of https://github.com/vgvassilev/[llvm|clang|cling].git then at least that issue seems to be solved. So maybe, apart from the upgrade to the LLVM 9 base, some work has been done to upgrade to ORCJITv2 as well (I honestly don't know enough about this to go through the code changes).

SimeonEhrig commented 3 years ago

Thanks for the hint. You are right. I don't know what I did wrong last time, but with the LLVM 9 update my std::future problem is solved. I have a little insight into Vassil's development and know that the LLVM 9 upgrade will not change the ORCJIT API. But that is one of the next tasks, because ORCJITv1 is deprecated.

matteosecli commented 3 years ago

Glad to hear that at least the std::future issue is solved! 🙂

SimeonEhrig commented 3 years ago

@matteosecli I made a mistake when testing the std::future problem. I used Clang 9 to build Cling and did not realize that the compiler linked cling with libc++, which is also provided by llvm. This fixed the problem. But if you use the libstdc++, for example if you compile with g++, the problem still exists.

But maybe I can fix it with the LLVM 9 upgrade I mentioned earlier in my issue.

matteosecli commented 3 years ago

@SimeonEhrig sorry for this super late response; thank you for the extra info!

Axel-Naumann commented 3 years ago

So is this fixed? Can we close it?

matteosecli commented 3 years ago

It wasn't last time I tried, but I see that in the meantime several commits have been pushed. Should I try again by compiling the latest committed version or some other branch?

SimeonEhrig commented 3 years ago

@Axel-Naumann Only if glib also works. Alternatively, we could drop support for glib, but I don't think that is an option ;-)

Axel-Naumann commented 3 years ago

I get (ROOT master, Mac11):

root [0] thread_local int TLVAR;
Assertion failed: (!llvm::StringRef(mangled_name).startswith("__") && "Already added!"), function LazyFunctionCreatorAutoloadForModule, file /Users/axel/build/root/src/core/metacling/src/TCling.cxx, line 6485.

So whatever it is (this might be new since llvm9), it's not yet solved. We need to check how lli does with OrcJITv1 on macOS. Any volunteers?

SimeonEhrig commented 3 years ago

Does MacOS use glib? It's really glib-specific, because glib uses different names than LLVM's libc++ for thread-local memory. Before LLVM 9, the names were not translated for the JIT, and I believe that with LLVM 9 it still does not work out of the box.

It's on my todo list, but the priority is not too high.

matteosecli commented 3 years ago

@Axel-Naumann I've just tried with the latest nightly ROOT build for MacOS 10.14.6 and I still get the error documented in my OP; I have no clue why it's different from the one you get. I also have no idea what lli is, but I'm available to test whatever is needed on my MacOS machine or to provide any additional info on the errors I'm seeing.

@SimeonEhrig I'm not totally sure that the problem with thread_local variables is still due to glib. This is how cling is linked on my MacOS 10.14.6 machine:

/usr/local/bin/cling:
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
    /usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 800.7.0)

So it's indeed using libc++; and while the problem with std::future reported in https://github.com/root-project/cling/issues/321, as you noted, is gone, the problem with thread_local is still there. 🤔

SimeonEhrig commented 3 years ago

Maybe I was a little bit lazy with the term glib. I mean libstdc++, the library used by GCC.

Nevertheless, I also tried on my machine (Ubuntu 20.04) and I'm not affected by the bug:

$ ~/workspace/cling9demo/install/bin/cling --std=c++14

****************** CLING ******************
* Type C++ code and press enter to run it *
*             Type .q to exit             *
*******************************************
[cling]$ thread_local int x = 42;
[cling]$ #include <iostream>
[cling]$ #include <future>
[cling]$ int foo(){
[cling]$ ?   std::future<int> f1 = std::async([](){ return 42; });
[cling]$ ?   f1.wait();
[cling]$ ?   return f1.get();
[cling]$ ?   }
[cling]$ int i = foo();
[cling]$ i
(int) 42
[cling]$ 

But my linkage is strange:

$ ldd ~/workspace/cling9demo/install/bin/cling
    linux-vdso.so.1 (0x00007ffcdeed2000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe5acd0b000)
    libz.so.1 => /opt/spack-modules/linux-ubuntu20.04-skylake_avx512/gcc-10.2.0/zlib-1.2.11-i4ll3kcb3mjj6ghhmc6moicp3xoii262/lib/libz.so.1 (0x00007fe5accf1000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fe5acce6000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe5acce0000)
    libtinfo.so.6 => /opt/spack-modules/linux-ubuntu20.04-skylake_avx512/gcc-10.2.0/ncurses-6.2-vule4kohlsulck3paawuxncu2yqnjorw/lib/libtinfo.so.6 (0x00007fe5acca1000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe5acb52000)
    libz3.so => /opt/spack-modules/linux-ubuntu20.04-skylake_avx512/gcc-10.2.0/z3-4.8.7-62bvt24euf2x2ay3bq2llcm4q77cfdf2/lib/libz3.so (0x00007fe5ab638000)
    libc++.so.1 => /opt/spack-modules/linux-ubuntu20.04-skylake_avx512/gcc-10.2.0/llvm-11.0.0-ubn363okfkmwy6dpoozlbgidl4bks3wd/lib/libc++.so.1 (0x00007fe5ab55c000)
    libc++abi.so.1 => /opt/spack-modules/linux-ubuntu20.04-skylake_avx512/gcc-10.2.0/llvm-11.0.0-ubn363okfkmwy6dpoozlbgidl4bks3wd/lib/libc++abi.so.1 (0x00007fe5ab522000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe5ab505000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe5ab313000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fe5acd41000)
    libstdc++.so.6 => /opt/spack-modules/linux-ubuntu20.04-skylake_avx512/gcc-9.3.0/gcc-10.2.0-nd7xsa2anuya7be46rjsaxbiobtgguod/lib64/libstdc++.so.6 (0x00007fe5ab13c000)
    libatomic.so.1 => /opt/spack-modules/linux-ubuntu20.04-skylake_avx512/gcc-9.3.0/gcc-10.2.0-nd7xsa2anuya7be46rjsaxbiobtgguod/lib64/libatomic.so.1 (0x00007fe5ab131000)

It links against both libc++ and libstdc++, so I can't say whether it is really MacOS related.

matteosecli commented 3 years ago

@SimeonEhrig that's extremely interesting! I've tried to reproduce a linkage that looks like yours on my MacOS machine, by injecting libstdc++ into cling via DYLD_INSERT_LIBRARIES (the equivalent of LD_PRELOAD on MacOS):

$ DYLD_INSERT_LIBRARIES=/usr/local/opt/gcc\@10/lib/gcc/10/libstdc++.dylib cling

****************** CLING ******************
* Type C++ code and press enter to run it *
*             Type .q to exit             *
*******************************************
[cling]$ thread_local int x = 42;
[cling]$ x
(int) 42
[cling]$ 
[cling]$ #include <iostream>
[cling]$ #include <future>
[cling]$ int foo(){
[cling]$ ?   std::future<int> f1 = std::async([](){ return 42; });
[cling]$ ?   f1.wait();
[cling]$ ?   return f1.get();
[cling]$ ?   }
[cling]$ int i = foo();
[cling]$ i
(int) 42
[cling]$ .q

So, adding libstdc++ seems to make it work! Btw, gcc 10 is installed via Homebrew.

Hope it's a useful hint towards a proper solution!

SimeonEhrig commented 3 years ago

Could you please post the output of ldd (or the equivalent MacOS tool) again, this time with the DYLD_INSERT_LIBRARIES environment variable set?

matteosecli commented 3 years ago

AFAIK, otool -L (equivalent of ldd on MacOS) does not list the libraries injected via DYLD_INSERT_LIBRARIES, as they are loaded at runtime. I've tried to do an export DYLD_INSERT_LIBRARIES=..., but that doesn't change the result.

However, the loaded libraries should just be the ones I listed earlier plus the one specified in DYLD_INSERT_LIBRARIES. Another way to achieve the same result is to load the library directly from cling:

$ cling

****************** CLING ******************
* Type C++ code and press enter to run it *
*             Type .q to exit             *
*******************************************
[cling]$ thread_local int x = 42
IncrementalExecutor::executeFunction: symbol '__emutls_get_address' unresolved while linking [cling interface function]!
[cling]$ .L /usr/local/opt/gcc@10/lib/gcc/10/libstdc++.dylib
[cling]$ thread_local int y = 42
(int) 42
[cling]$ .q
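
The same experiment can be approximated outside the interpreter. The block below is a rough sketch only, assuming POSIX dlopen/dlsym; the helper name library_provides is hypothetical, and nothing here is meant to describe how cling's .L command is implemented. It loads a library by path and checks whether a given symbol can be resolved through it or its dependencies.

```cpp
#include <dlfcn.h>   // dlopen, dlsym (POSIX)
#include <string>

// Hypothetical helper (not a cling API): loads `library_path` into the
// process and reports whether `symbol` resolves through that library or
// one of its dependencies. The library is deliberately left loaded.
bool library_provides(const std::string& library_path,
                      const std::string& symbol) {
    void* handle = dlopen(library_path.c_str(), RTLD_NOW | RTLD_GLOBAL);
    if (handle == nullptr) {
        return false;  // the library could not be loaded at all
    }
    return dlsym(handle, symbol.c_str()) != nullptr;
}
```

Called with the libstdc++.dylib path used above and "__emutls_get_address", a true result would be consistent with the observation that loading libstdc++ makes the unresolved-symbol error disappear, although the thread does not confirm that this is the actual mechanism.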

SimeonEhrig commented 3 years ago

Okay, it looks like our problems are similar, but not equal. But I'm really interested in what happens inside Cling if both libraries are linked at the same time. I thought that would cause an error, but instead it solves one. That's really surprising.