key4hep / key4hep-spack

A Spack recipe repository of Key4hep software.

libonnxruntime.so.1.17.1: cannot open shared object file: No such file or directory #655

Closed giovannimarchiori closed 1 month ago

giovannimarchiori commented 1 month ago

I am running the nightlies on an alma9 system. I use them daily without problems, together with local builds of a few packages from my forks, which I keep up to date with upstream.

Today I am encountering the following problem: if I use the nightly out of the box and run k4run with a script that uses algorithms in k4RecCalorimeter, everything runs fine. However, if I check out and recompile k4RecCalorimeter and then run, I get many warnings like the following:

WARNING: cannot load libk4RecCalorimeterPlugins.so for factory CreateEmptyCaloCellsCollection
WARNING: libfastjet.so.0: cannot open shared object file: No such file or directory
WARNING: cannot load libk4RecFCCeeCalorimeterPlugins.so for factory CreateCaloClustersSlidingWindowFCCee
WARNING: libonnxruntime.so.1.17.1: cannot open shared object file: No such file or directory

I noticed the following:

ldd /cvmfs/sw-nightlies.hsf.org/key4hep/releases/2024-09-25/x86_64-almalinux9-gcc14.2.0-opt/k4reccalorimeter/5e8879eb6088d7e040d2df843e0ef75ed927b1c2_develop-gax3w6/lib/libk4RecFCCeeCalorimeterPlugins.so | grep onnx
    libonnxruntime.so.1.17.1 => /cvmfs/sw-nightlies.hsf.org/key4hep/releases/2024-09-21/x86_64-almalinux9-gcc14.2.0-opt/py-onnxruntime/1.17.1-hce36h/lib/libonnxruntime.so.1.17.1 (0x00007fe20a600000)

while

ldd /home/gmarchio/work/fcc/allegro/nightly/k4RecCalorimeter/install/lib64/libk4RecFCCeeCalorimeterPlugins.so | grep onnx
    libonnxruntime.so.1.17.1 => not found
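
The same comparison can be made for every dependency at once, rather than grepping for one library. This is a minimal sketch, assuming GNU `ldd` output in the format shown above and `readelf` from binutils; `missing_libs` is a hypothetical helper name and `PLUGIN` a placeholder variable, neither is part of the Key4hep tooling.

```shell
# Read `ldd` output on stdin and print the name of every dependency
# that the dynamic loader reports as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage, with PLUGIN set to the locally built library (placeholder):
#   ldd "$PLUGIN" | missing_libs
# To see which search paths were baked into the library at link time:
#   readelf -d "$PLUGIN" | grep -E 'RPATH|RUNPATH'
```

If the central build lists a path in its RPATH or RUNPATH entry that the local build does not, that would point at the link step rather than at the environment.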

Did anything change, perhaps in the central CMake configuration, that would cause this behaviour?

Tagging @jmcarcell

jmcarcell commented 1 month ago

Until this is properly fixed, the workaround is

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/cvmfs/sw-nightlies.hsf.org/key4hep/releases/2024-09-21/x86_64-almalinux9-gcc14.2.0-opt/py-onnxruntime/1.17.1-hce36h/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/cvmfs/sw-nightlies.hsf.org/key4hep/releases/2024-09-21/x86_64-almalinux9-gcc14.2.0-opt/fastjet/3.4.2-b5k732/lib
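
Re-sourcing a setup script that contains these exports appends the same directories again each time. A small guard avoids that; this is only a sketch, and `add_ld_path` is a hypothetical helper name, not something provided by the Key4hep setup scripts.

```shell
# Append a directory to LD_LIBRARY_PATH only if it exists
# and is not already listed. Returns 1 if the directory is missing.
add_ld_path() {
    dir=$1
    [ -d "$dir" ] || return 1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$dir" ;;
    esac
    export LD_LIBRARY_PATH
}

# Usage, with the directories from the workaround above:
#   add_ld_path /cvmfs/sw-nightlies.hsf.org/key4hep/releases/2024-09-21/x86_64-almalinux9-gcc14.2.0-opt/py-onnxruntime/1.17.1-hce36h/lib
#   add_ld_path /cvmfs/sw-nightlies.hsf.org/key4hep/releases/2024-09-21/x86_64-almalinux9-gcc14.2.0-opt/fastjet/3.4.2-b5k732/lib
```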

jmcarcell commented 1 month ago

I tried with today's nightlies but I can't reproduce, can you tell me which commands you are running?

giovannimarchiori commented 1 month ago

Hi @jmcarcell ,

From a clean directory, if I just run source /cvmfs/sw-nightlies.hsf.org/key4hep/setup.sh, I have no problem.

However, if I have k4RecCalorimeter installed and I do

cd k4RecCalorimeter
k4_local_repo
mkdir build
cd build
cmake ../ -DCMAKE_INSTALL_PREFIX=../install
make -j32 install
cd ../..

then I get these errors, unless I set LD_LIBRARY_PATH explicitly as you suggested.

tmadlener commented 1 month ago

This might be related to how RedHat configures the default linker flags. We (at DESY) have had similar issues in the past with locally built libraries, but only on RedHat systems, on Ubuntu everything was working as expected.

We ended up putting something like the following snippet into a package setup script. It first determines the current onnxruntime directory dynamically and then adds the corresponding library path.

# Locate the directory of the installed onnxruntime Python package
ONNXRUNTIME_PATH=$(dirname "$(python -c 'import onnxruntime; print(onnxruntime.__file__)')")
# Go up to the install prefix and add its library directory
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:$(realpath "${ONNXRUNTIME_PATH}/../../../../")/lib64"  # CentOS7
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:$(realpath "${ONNXRUNTIME_PATH}/../../../../")/lib"    # Ubuntu
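
Instead of keeping one line per distribution, the setup script could check which of the two directories actually contains the library. A hedged sketch: `find_libdir` is a hypothetical helper introduced here for illustration, and the file pattern passed to it is an assumption about how the library is named.

```shell
# Print the first of <prefix>/lib64 and <prefix>/lib that contains
# a file matching the given glob pattern. Returns 1 if neither does.
find_libdir() {
    prefix=$1
    pattern=$2
    for sub in lib64 lib; do
        # $pattern is left unquoted on purpose so that the glob expands
        for f in "$prefix/$sub"/$pattern; do
            if [ -e "$f" ]; then
                printf '%s\n' "$prefix/$sub"
                return 0
            fi
        done
    done
    return 1
}

# Usage, with ONNXRUNTIME_PATH determined as in the snippet above:
#   prefix=$(realpath "${ONNXRUNTIME_PATH}/../../../../")
#   libdir=$(find_libdir "$prefix" 'libonnxruntime.so*') &&
#       export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${libdir}"
```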

jmcarcell commented 1 month ago

Can you say a bit more about the system where you are running this, @giovannimarchiori? Is it RedHat, as Thomas says, or plain Alma 9? You can check with cat /etc/os-release.

giovannimarchiori commented 1 month ago

It's Rocky Linux 9.

giovannimarchiori commented 1 month ago

BTW when setting up the environment I get

Warning: The default compiler for AlmaLinux 9 has changed to GCC 14. A new -c flag can be used to select the compiler, to go back to the system compiler use '-c gcc11'

jmcarcell commented 1 month ago

Yes, this is expected, now GCC 14 is available in the nightlies.

andresailer commented 1 month ago

Hi @giovannimarchiori

Could you run

make clean
make VERBOSE=1

so that we can see the linker flags that you are getting when building k4RecCalorimeter?
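
The verbose output is long, so it can help to pull out just the rpath-related flags from the link commands. A rough sketch, assuming the flags appear in the `-Wl,-rpath,<paths>` form that CMake-generated Makefiles normally use and that `grep` supports `-o`; `rpath_flags` is a hypothetical helper name.

```shell
# Read a verbose build log on stdin and print each distinct
# -Wl,-rpath,... flag that appears in it.
rpath_flags() {
    grep -o -- '-Wl,-rpath,[^ ]*' | sort -u
}

# Usage from the build directory (build.log is an arbitrary file name):
#   make clean
#   make VERBOSE=1 2>&1 | tee build.log | rpath_flags
```

An empty result would mean that no rpath was passed in this form, which is worth checking against the full log before drawing conclusions.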

giovannimarchiori commented 1 month ago

Hi @andresailer

I put the output of make VERBOSE=1 here: https://cernbox.cern.ch/s/VBB7q2gnUixiP8H

I checked that the problem is still there with today's nightly. I also checked whether switching to gcc11 would make a difference, but I get exactly the same issue when setting up the key4hep stack for gcc11.

BTW, I have also had issues with other libraries not being picked up correctly, e.g. ncurses and intel-tbb; I now need to add them to LD_LIBRARY_PATH myself. I didn't have these problems before the summer holidays, but maybe they come from some automatic update on the machine I am using.

andresailer commented 1 month ago

Hi @giovannimarchiori Thanks! Nothing suspicious.

Can you try with https://github.com/HEP-FCC/k4RecCalorimeter/pull/117 ?

giovannimarchiori commented 1 month ago

Hi @andresailer, with https://github.com/HEP-FCC/k4RecCalorimeter/pull/117 the warnings disappear.