pdidev / pdi

The PDI Data Interface
https://pdi.dev
BSD 3-Clause "New" or "Revised" License

Randomly reproducible plugin loading issue #413

Closed by jbigot 2 months ago

jbigot commented 2 years ago

Three tests fail intermittently in the CI runs:

The following tests FAILED:
    951 - PDI_example_trace_C (Failed)
    952 - PDI_example_trace_F (Failed)
    953 - PDI_example_trace_P (Failed)

With the following error:

  * unable to load `/tmp/pdi_plugins/libpdi_trace_plugin.so' /tmp/pdi_plugins/libpdi_trace_plugin.so: undefined symbol: _ZN3fmt2v612format_errorD1Ev, 
  * unable to load `/tmp/pdi_plugins/libpdi_trace_plugin.so' /tmp/pdi_plugins/libpdi_trace_plugin.so: undefined symbol: _ZN3fmt2v612format_errorD1Ev, 

We regularly see this kind of problem elsewhere too.
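
For reference, the undefined symbol can be partly decoded by hand to see which fmt version the plugin expects. The sketch below is not part of PDI: `nested_name_components` is a hypothetical helper that only handles simple Itanium-ABI nested names (no templates or substitutions), and in practice `c++filt` does the same job more completely.

```python
import re


def nested_name_components(mangled):
    """Return the length-prefixed components of a simple Itanium-ABI nested name."""
    # Skip "_Z", an optional special-name code (TV = vtable, TI = typeinfo, ...),
    # and the "N" that opens a nested name.
    match = re.match(r"_Z(?:T[VTIS])?N", mangled)
    if match is None:
        raise ValueError("not a nested Itanium-ABI symbol: " + mangled)
    pos = match.end()
    parts = []
    # Each component is encoded as <decimal length><identifier>.
    while pos < len(mangled) and mangled[pos].isdigit():
        end = pos
        while end < len(mangled) and mangled[end].isdigit():
            end += 1
        length = int(mangled[pos:end])
        parts.append(mangled[end:end + length])
        pos = end + length
    return parts


for symbol in ("_ZN3fmt2v612format_errorD1Ev", "_ZTVN3fmt2v612format_errorE"):
    print(symbol, "->", "::".join(nested_name_components(symbol)))
```

Both symbols reported in this issue resolve to `fmt::v6::format_error` (its destructor and its vtable respectively), so the binary that references them was compiled against fmt 6.x headers and needs a library exporting the `v6` inline namespace at load time.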

jbigot commented 2 years ago

This happened in job https://gitlab.maisondelasimulation.fr/pdidev/pdi/-/jobs/47098

Runner: #40 (naVcPR7u) gitlab-ci-mdls1

Running with gitlab-runner 12.10.3 (2910048c)
  on gitlab-ci-mdls1 naVcPR7u
section_start:1645719564:prepare_executor
Preparing the "docker" executor
Using Docker executor with image ghcr.io/pdidev/spack/latest/gcc/openmpi/all:v1 ...
Pulling docker image ghcr.io/pdidev/spack/latest/gcc/openmpi/all:v1 ...
Using docker image sha256:a1e6c6be9b4daf35f52b949f66777cca865c1e051c0c9cb3358daaff65ea70e3 for ghcr.io/pdidev/spack/latest/gcc/openmpi/all:v1 ...
section_end:1645719566:prepare_executor
section_start:1645719566:prepare_script
Preparing environment
Running on runner-navcpr7u-project-50-concurrent-1 via gitlab-ci-mdls1...
section_end:1645719569:prepare_script
section_start:1645719569:get_sources
Getting source from Git repository
Fetching changes...
Reinitialized existing Git repository in /builds/pdidev/pdi/.git/
From https://gitlab.maisondelasimulation.fr/pdidev/pdi
 * [new ref]         eb0976658cfb8e87ab283b9b70bbfda22371dbcd -> refs/pipelines/12721
Checking out eb097665 as refs/merge-requests/422/head...
Removing tools/pdicfg_validator/pdicfg_validator/__pycache__/

Skipping Git submodules setup
section_end:1645719576:get_sources
section_start:1645719576:restore_cache
Restoring cache
section_end:1645719589:restore_cache
section_start:1645719589:download_artifacts
Downloading artifacts
section_end:1645719599:download_artifacts
section_start:1645719599:build_script
Running before_script and script
$ export PDI_PLUGIN_PATH=/tmp/pdi_plugins
$ export DOCKER_RUNNER=gitlab
$ export MAKEFLAGS='-j 2'
$ export CTEST_FLAGS="--output-junit ${PWD}/tests.xml"
$ export TEST_DIR=/tmp
$ bash tools/build_scripts/build_and_run.sh

jbigot commented 2 years ago

A rerun that passes: https://gitlab.maisondelasimulation.fr/pdidev/pdi/-/jobs/47108

Runner: #40 (naVcPR7u) gitlab-ci-mdls1

Running with gitlab-runner 12.10.3 (2910048c)
  on gitlab-ci-mdls1 naVcPR7u
section_start:1645723477:prepare_executor
Preparing the "docker" executor
Using Docker executor with image ghcr.io/pdidev/spack/latest/gcc/openmpi/all:v1 ...
Pulling docker image ghcr.io/pdidev/spack/latest/gcc/openmpi/all:v1 ...
Using docker image sha256:a1e6c6be9b4daf35f52b949f66777cca865c1e051c0c9cb3358daaff65ea70e3 for ghcr.io/pdidev/spack/latest/gcc/openmpi/all:v1 ...
section_end:1645723478:prepare_executor
section_start:1645723478:prepare_script
Preparing environment
Running on runner-navcpr7u-project-50-concurrent-0 via gitlab-ci-mdls1...
section_end:1645723480:prepare_script
section_start:1645723480:get_sources
Getting source from Git repository
Fetching changes...
Reinitialized existing Git repository in /builds/pdidev/pdi/.git/
Checking out eb097665 as refs/merge-requests/422/head...
Removing CMakeCache.txt
Removing CMakeDoxyfile.in
Removing CMakeDoxygenDefaults.cmake
Removing CMakeFiles/
Removing Makefile
Removing PDIConfig.cmake
Removing PDIConfigVersion.cmake
Removing cmake_install.cmake
Removing config.h
Removing docs/
Removing env.bash
Removing fmoddir/
Removing pdi/export.h
Removing pdi/version.h
Removing pdirun_intree
Removing public/
Removing zppconf/

Skipping Git submodules setup
section_end:1645723483:get_sources
section_start:1645723483:restore_cache
Restoring cache
section_end:1645723485:restore_cache
section_start:1645723485:download_artifacts
Downloading artifacts
section_end:1645723487:download_artifacts
section_start:1645723487:build_script
Running before_script and script
$ export PDI_PLUGIN_PATH=/tmp/pdi_plugins
$ export DOCKER_RUNNER=gitlab
$ export MAKEFLAGS='-j 2'
$ export CTEST_FLAGS="--output-junit ${PWD}/tests.xml"
$ export TEST_DIR=/tmp
$ bash tools/build_scripts/build_and_run.sh

jbigot commented 2 years ago

This now also happens in the Ubuntu environment.

This might be caused by the lookup of spdlog in PDIConfig.cmake misbehaving because the superproject has already performed the same lookup.

jbigot commented 2 years ago

@youldrouis mentions that the same issue appears with a Spack-installed PDI.

jbigot commented 2 years ago

In GitLab by @youldrouis on Mar 10, 2022, 12:24

The error I get when linking against the Spack-installed PDI package from the GYSELA build is indeed similar:

symbol lookup error: /ccc/work/cont003/gen2224/gen2224/spack/environments/gysela-deisa-irene-skl/.spack-env/view/lib64/libpdi.so.1: undefined symbol: _ZTVN3fmt2v612format_errorE

According to Karol, it looked like a link error against the fmt library, but ldd shows no dependency on fmt.

The random error was solved by uninstalling and reinstalling only the PDI package (not the PDI plugins) in the same environment.

More details:

ouldrouy@irene190[Thu Mar 10 12:03 PM]/ccc/work/cont003/gen2224/gen2224{1}>ldd /ccc/work/cont003/gen2224/gen2224/spack/var/spack/environments/gysela-deisa-irene-skl/.spack-env/view/lib64/libpdi.so.1.4.3 
    linux-vdso.so.1 =>  (0x00007ffe965cc000)
    libparaconf.so.0 => /ccc/work/cont003/gen2224/gen2224/spack/opt/spack/linux-rhel7-skylake_avx512/gcc-9.3.0/paraconf-0.4.15-7cbmw3eh4vlmybhvdcecr7srv7l25jy6/lib64/libparaconf.so.0 (0x00002b998ffd4000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00002b9990109000)
    libspdlog.so.1 => /ccc/work/cont003/gen2224/gen2224/spack/opt/spack/linux-rhel7-skylake_avx512/gcc-9.3.0/spdlog-1.9.2-ik2paellcc6dhlb7iwxeaktjh73s2p5x/lib64/libspdlog.so.1 (0x00002b998fff3000)
    libyaml.so => /ccc/work/cont003/gen2224/gen2224/spack/opt/spack/linux-rhel7-skylake_avx512/gcc-9.3.0/libyaml-0.2.5-yce4xrvn7kc3htlutrgkwqy7mjccytmc/lib/libyaml.so (0x00002b9990082000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b999030d000)
    libstdc++.so.6 => /ccc/products2/gcc-9.3.0/Atos_7__x86_64/system/default/lib64/libstdc++.so.6 (0x00002b9990529000)
    libm.so.6 => /lib64/libm.so.6 (0x00002b9990710000)
    libgcc_s.so.1 => /ccc/products2/gcc-9.3.0/Atos_7__x86_64/system/default/lib64/libgcc_s.so.1 (0x00002b99900a5000)
    libc.so.6 => /lib64/libc.so.6 (0x00002b9990a12000)
    /lib64/ld-linux-x86-64.so.2 (0x00002b998fee5000)
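
The check described above (looking for fmt among the resolved dependencies) can be automated when comparing several installs. This is a minimal sketch, not a PDI tool: `ldd_dependencies` is a hypothetical helper, and the sample text is an abbreviated stand-in for real `ldd` output, with a placeholder path.

```python
def ldd_dependencies(ldd_output):
    """Map each soname in `ldd` output to its resolved path (None if empty)."""
    deps = {}
    for line in ldd_output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # e.g. the dynamic loader line, which has no "=>"
        soname, _, target = line.partition("=>")
        # Drop the trailing "(0x...)" load address.
        path = target.split("(")[0].strip()
        deps[soname.strip()] = path or None
    return deps


# Abbreviated, illustrative input; the spdlog path is a placeholder.
sample = """\
    linux-vdso.so.1 =>  (0x00007ffe965cc000)
    libspdlog.so.1 => /path/to/spdlog/lib64/libspdlog.so.1 (0x00002b998fff3000)
    libc.so.6 => /lib64/libc.so.6 (0x00002b9990a12000)
    /lib64/ld-linux-x86-64.so.2 (0x00002b998fee5000)
"""
deps = ldd_dependencies(sample)
has_fmt = any(name.startswith("libfmt") for name in deps)
print(sorted(deps), "libfmt present:", has_fmt)
```

If no `libfmt` entry shows up, the fmt symbols can only come from another loaded library or from code compiled into the binary itself. One possible reading, which is an assumption and not confirmed in this thread, is that they are expected from the fmt copy bundled in `libspdlog.so.1`, which may be a different fmt version than `v6`.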

The Spack environment:

-- linux-rhel7-skylake_avx512 / gcc@9.3.0 -----------------------
binutils@2.37         libxml2@2.9.12             pdiplugin-trace@1.4.3      py-joblib@1.0.1            py-pyparsing@3.0.6         py-xarray@0.18.2
bzip2@1.0.8           libyaml@0.2.5              pdiplugin-user-code@1.4.3  py-llvmlite@0.34.0         py-python-dateutil@2.8.2   py-zict@1.0.0
cmake@3.22.1          llvm@10.0.1                pkgconf@1.8.0              py-locket@0.2.0            py-pythran@0.9.12          python@3.9.9
expat@2.4.3           ncurses@6.2                py-beniget@0.4.1           py-msgpack@1.0.2           py-pytz@2021.1             readline@8.1
gdbm@1.19             netlib-lapack@3.9.1        py-bottleneck@1.3.2        py-multipledispatch@0.6.0  py-pyyaml@6.0              spdlog@1.9.2
gettext@0.21          openmpi@4.0.3              py-click@8.0.3             py-numba@0.51.1            py-scikit-learn@1.0.2      sqlite@3.37.1
hdf5@1.12.1           openssl@1.1.1m             py-cloudpickle@1.6.0       py-numexpr@2.7.3           py-scipy@1.7.3             swig@4.0.2
hwloc@2.7.0           paraconf@0.4.15            py-dask@2021.11.2          py-numpy@1.22.1            py-setuptools@57.4.0       tar@1.34
libbsd@0.11.3         pcre@8.44                  py-dask-glm@0.2.0          py-packaging@21.3          py-six@1.16.0              util-linux-uuid@2.36.2
libedit@3.1-20210216  pdi@1.4.3                  py-dask-ml@2022.1.22       py-pandas@1.3.5            py-sortedcontainers@2.1.0  xz@5.2.5
libffi@3.3            pdiplugin-decl-hdf5@1.4.3  py-distributed@deisa       py-partd@1.1.0             py-tblib@1.6.0             zlib@1.2.11
libiconv@1.16         pdiplugin-deisa@develop    py-fsspec@2021.7.0         py-ply@3.11                py-threadpoolctl@3.0.0     zpp@1.0.16
libmd@1.0.3           pdiplugin-mpi@1.4.3        py-gast@0.5.3              py-psutil@5.8.0            py-toolz@0.9.0
libpciaccess@0.16     pdiplugin-pycall@1.4.3     py-heapdict@1.0.1          py-pybind11@2.7.1          py-tornado@6.1

jbigot commented 2 years ago

mentioned in commit 049f5a2e5d722fd93542940fb6d2444b1823db26

jbigot commented 1 year ago

mentioned in commit 778453625c56e3e4b566515cf7b1903da20e5fa0

jbigot commented 1 year ago

mentioned in commit c3c1ad2cf5baeee29de4c3726f90ecd62c6d578c

jbigot commented 1 year ago

mentioned in commit 8c63a16e9701527b4d50d9efe942c5b85716aeb8

jbigot commented 1 year ago

mentioned in commit c92c9ed98f4044b81e77fb6fe8949e75cf697c45

jbigot commented 1 year ago

mentioned in commit ac06e1c1cd413695a3bc7f152c68cec54f2c1887