Open drew-parsons opened 4 months ago
Unfortunately, this is probably the same issue as #4103. ADIOS is looking for libfabric before EVPath does, which messes with the CMake in EVPath and produces undesirable results. In #4103, the solution was to simply not have libfabric installed. Neither EVPath nor ADIOS needs or uses it unless you are on a machine with an RDMA transport. Alternatively, you can run cmake with -DCMAKE_DISABLE_FIND_PACKAGE_LIBFABRIC=TRUE, or -DADIOS2_USE_SST=FALSE (if you don't need the SST streaming transport). This problem should be fixed in the master branch, and so will be fixed in 2.10.1 when that release happens.
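For reference, the two workarounds suggested above could be spelled out roughly as follows. This is only a sketch: the source and build directory names are placeholders that do not come from this thread, and the helper just prints the configure line rather than running cmake.

```shell
#!/bin/sh
# Sketch: print (not run) a cmake configure line for each suggested workaround.
# SRC_DIR and BUILD_DIR are placeholder paths, not taken from this thread.
SRC_DIR=../ADIOS2
BUILD_DIR=build

configure_cmd() {
    # $1 is the single extra -D option to append to the configure line
    printf 'cmake -S %s -B %s %s\n' "$SRC_DIR" "$BUILD_DIR" "$1"
}

# Workaround A: keep SST, but stop CMake from picking up libfabric
configure_cmd -DCMAKE_DISABLE_FIND_PACKAGE_LIBFABRIC=TRUE
# Workaround B: drop the SST streaming transport altogether
configure_cmd -DADIOS2_USE_SST=FALSE
```

Printing the line first makes it easy to paste into a packaging script (such as debian/rules) and compare against what is already there.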
Thanks. One of the design goals for the debian packages is to support clusters (cloud computing), so RDMA (and fabric) support is wanted. I'll look forward to the new release, and try the workarounds or patches in the meantime.
-DADIOS2_USE_SST=FALSE does not fix the problem. It actually gives even more missing symbols:
[212/308] : && /usr/bin/c++ -g -O2 -ffile-prefix-map=/projects/adios2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/main.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/Reorganize.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/Utils.cpp.o -o bin/adios2_reorganize.serial lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0 && :
FAILED: bin/adios2_reorganize.serial
: && /usr/bin/c++ -g -O2 -ffile-prefix-map=/projects/adios2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/main.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/Reorganize.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/Utils.cpp.o -o bin/adios2_reorganize.serial lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0 && :
/usr/bin/ld: warning: libadios2_serial_perfstubs.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_ffs.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0, not found (try using -rpath or -rpath-link)
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `register_data_format'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFSencode'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `format_list_of_FMFormat'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `free_FMcontext'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `ps_initialize_'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `create_fixed_FFSBuffer'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `create_FFSContext_FM'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `free_FMfield_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `free_FFSContext'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `ps_timer_stop_'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `name_of_FMformat'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `perfstubs_initialized'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `ps_make_timer_name_'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `get_server_ID_FMformat'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FMfree_struct_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FMContext_from_FFS'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `get_server_rep_FMformat'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FMfree_var_rec_elements'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFSTypeHandle_from_encode'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `load_external_format_FMcontext'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `ps_timer_create_'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFS_est_decode_length'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `create_FFSBuffer'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFSdecode_in_place_possible'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `ps_timer_start_'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FMformat_from_ID'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `free_FFSBuffer'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `establish_conversion'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `create_local_FMcontext'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFSdecode_to_buffer'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FMdump_data'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FMcopy_struct_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FMFormat_of_original'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFShas_conversion'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFSdecode_in_place'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FMstruct_size_field_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `ps_finalize_'
collect2: error: ld returned 1 exit status
Same with both -DADIOS2_USE_SST=FALSE and -DCMAKE_DISABLE_FIND_PACKAGE_LIBFABRIC=TRUE.
Not sure what's going on with the workarounds, but you might try github master.
Libfabric can be installed on non-RDMA machines of course, and will configure itself to offer only the sockets-based provider, which is theoretically useful for testing libfabric code. In practice however it's not useful (for complicated reasons), and libfabric code like that in ADIOS is quite likely not to work if there's not a real RDMA provider underneath libfabric. Certainly if it does work, performance would be terrible. That is why we don't generally know if ADIOS would build in this circumstance: we don't do it. Instead, ADIOS takes the presence of certain libraries as indicative of the sort of machine that it's on and builds features accordingly. Unfortunately this isn't the sort of dependency that packaging systems seem to capture well.
Thanks. I'll report again if I can confirm that the problem is lifted (or not) in a later code base.
I'm still getting undefined references with 2.10.1, with -DEVPATH_TRANSPORT_MODULES=ON -DADIOS2_USE_SST:BOOL=ON
[317/326] : && /usr/bin/c++ -g -O2 -ffile-prefix-map=/projects/mathlibs/build/adios2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/main.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/Reorganize.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/Utils.cpp.o -o bin/adios2_reorganize.serial lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1 && :
FAILED: bin/adios2_reorganize.serial
: && /usr/bin/c++ -g -O2 -ffile-prefix-map=/projects/mathlibs/build/adios2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/main.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/Reorganize.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/Utils.cpp.o -o bin/adios2_reorganize.serial lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1 && :
/usr/bin/ld: warning: libadios2_serial_perfstubs.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_evpath.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_ffs.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_atl.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `attr_list_from_string'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `CMfork_comm_thread'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `free_attr_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `CMwrite'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `register_data_format'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `FFSencode'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `copy_field_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `CMCondition_get_client_data'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `format_list_of_FMFormat'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `CManager_close'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `set_int_attr'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `free_FMcontext'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `ps_initialize_'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `create_fixed_FFSBuffer'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `CMregister_invalid_message_handler'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `FFSTypeHandle_by_index'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `create_FFSContext_FM'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `free_FMfield_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `free_FFSContext'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `FFSset_fixed_target'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `ps_timer_stop_'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `name_of_FMformat'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `set_string_attr'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `CMlisten'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `CMremove_task'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `perfstubs_initialized'
...
And still fails with -DEVPATH_TRANSPORT_MODULES=OFF
[308/320] : && /usr/bin/c++ -g -O2 -ffile-prefix-map=/projects/mathlibs/build/adios2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro source/adios2/toolkit/remote/CMakeFiles/adios2_remote_server.dir/remote_server.cpp.o source/adios2/toolkit/remote/CMakeFiles/adios2_remote_server.dir/remote_common.cpp.o -o bin/adios2_remote_server.serial lib/x86_64-linux-gnu/libadios2_serial_evpath.so.2.10.1 lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1 lib/x86_64-linux-gnu/libadios2_serial_ffs.so.2.10.1 lib/x86_64-linux-gnu/libadios2_serial_atl.so.2.10.1 -ldl /usr/lib/x86_64-linux-gnu/libpugixml.so.1.14 -Wl,-rpath-link,/projects/mathlibs/build/adios2/build-serial/lib/x86_64-linux-gnu && :
FAILED: bin/adios2_remote_server.serial
: && /usr/bin/c++ -g -O2 -ffile-prefix-map=/projects/mathlibs/build/adios2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro source/adios2/toolkit/remote/CMakeFiles/adios2_remote_server.dir/remote_server.cpp.o source/adios2/toolkit/remote/CMakeFiles/adios2_remote_server.dir/remote_common.cpp.o -o bin/adios2_remote_server.serial lib/x86_64-linux-gnu/libadios2_serial_evpath.so.2.10.1 lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1 lib/x86_64-linux-gnu/libadios2_serial_ffs.so.2.10.1 lib/x86_64-linux-gnu/libadios2_serial_atl.so.2.10.1 -ldl /usr/lib/x86_64-linux-gnu/libpugixml.so.1.14 -Wl,-rpath-link,/projects/mathlibs/build/adios2/build-serial/lib/x86_64-linux-gnu && :
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_evpath.so.2.10.1: undefined reference to `cmfabric_add_static_transport'
collect2: error: ld returned 1 exit status
That the linker isn't finding libraries that are built as part of the ADIOS build:
ld: warning: libadios2_serial_perfstubs.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_evpath.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_ffs.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_atl.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1, not found (try using -rpath or -rpath-link)
seems to be an insurmountable problem. Then again, the names here look strange. Normally the libraries should be libadios2_ffs.so.2.10, without the extra "_serial" in it.
Ah, I was going to ask how you configured ADIOS, but I followed the "debian/rules" link above and it looks like debian has a quite complex build system that bypasses most of our stuff, for example deleting everything under ADIOS/thirdparty and using external versions of those libraries. Offhand I'd say that however those libraries are built, either they aren't named correctly or the paths aren't set correctly for the link to find them. How that's supposed to be working I have no idea. Not something the adios team has had a direct hand in.
The debian packaging added the _serial (and _mpi) suffixes to enable simultaneous package installation of both types of build. For comparison, for our h5py package we originally only had the MPI build, but there was a complaint that loading MPI libraries was slowing down start-up in serial jobs, so a serial h5py build was added. I'm not sure that complaint applies to adios2. In principle it would, but we could consider dropping the serial build if the package configuration is proving to be a problem.
In the case of -DEVPATH_TRANSPORT_MODULES=ON, I can see the "missing" symbols are defined in the other built libraries. It's as if the debian build environment isn't using a proper LD_LIBRARY_PATH (rpath would be another way to deal with that). I'll dig further to understand the problem in the OFF case.
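One way to check which of the freshly built libraries defines a given "missing" symbol is to scan their dynamic symbol tables. The sketch below assumes GNU binutils (nm with --defined-only); the helper name find_symbol is made up for illustration, and the library directory is the one that appears in the link lines above.

```shell
#!/bin/sh
# find_symbol SYMBOL LIBDIR
# Print every shared object directly under LIBDIR whose dynamic symbol
# table defines SYMBOL. Hypothetical helper; relies on GNU nm.
find_symbol() {
    sym=$1
    dir=$2
    for lib in "$dir"/*.so*; do
        # Skip the unexpanded glob pattern and anything that is not a file
        [ -f "$lib" ] || continue
        # The last field of each nm line is the symbol name; drop any
        # "@VERSION" suffix before comparing.
        if nm -D --defined-only "$lib" 2>/dev/null |
            awk -v s="$sym" '{ n = $NF; sub(/@.*/, "", n); if (n == s) found = 1 }
                             END { exit !found }'; then
            echo "$lib"
        fi
    done
}

# Example: look for one of the FFS symbols from the log above
find_symbol FFSencode lib/x86_64-linux-gnu
```

If the symbol turns up in a sibling library such as libadios2_serial_ffs.so, the code is there and the failure is about the linker locating that library, not about the symbol being absent.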
I'll test a separate manual build without the debian package considerations and see at which point I can reproduce the problem (or not) manually.
It's been a long time since the original debian package rules were produced for ADIOS. In the meantime, the ADIOS build system itself has been rewritten to produce both serial and MPI versions of the ADIOS library in a single build. Presumably this means that the debian rules will need some adjustment.
The sticky point (for debian) is handling the linking to HDF5 (libhdf5.so). Debian provides two alternative HDF5 builds, serial and mpi (the original debian packaging for ADIOS2 would have been trying to follow likewise).
ADIOS' new configuration system builds both serial and MPI, but that's with respect to linking libmpi.so or not. Either way, if I'm not mistaken both versions are linked to the same libhdf5.so, either libhdf5_serial.so or libhdf5_mpi.so as identified during the configuration step.
I think it might not be so difficult for Debian to sort out. Currently Debian builds three alternative ADIOS libraries,
So essentially Debian's libadios2_mpi_core.so is nonsense. It's ADIOS' serial build linking against libhdf5_mpi. Debian will want to drop it and replace it with the "libadios2_mpi_core_mpi.so" build.
I'm assuming here that it's sensible for the ADIOS serial build to be linking against HDF5 serial, and the ADIOS mpi build to link HDF5 mpi. Would there be advantage in the ADIOS build system being able to handle the two alternative HDF builds when configuring?
Debian's libadios2_mpi_core.so is nonsense
Actually, no, it is needed by the mpi-linked library (libadios2_mpi_core_mpi.so).
I don't know how to solve this problem. I assume the one-build adios approach is not compatible with the two-build approach of hdf5, so you may need to keep two separate adios builds, one with mpi and parallel hdf5 and one without.
In our builds we have only two core libraries: libadios2_core_mpi.so and libadios2_core.so. The former depends on the latter, and hence they cannot depend on different hdf5 libraries.
$ ldd libadios2_core_mpi.so | grep core
libadios2_core.so.2.10 (0x00007f8454c67000)
True, that is what the debian build is doing. We have one build configuration for serial-only (ADIOS2_USE_MPI=OFF with libhdf5_serial.so), and a separate build for MPI (ADIOS2_USE_MPI=ON with libhdf5_openmpi.so).
I've now identified the problem of undefined references with EVPATH_TRANSPORT_MODULES=ON. It's an rpath policy issue. Debian policy is to not place RUNPATH in packaged libraries, so the build was configured with CMAKE_SKIP_RPATH=ON. The undefined references were a consequence; the executables being built don't know where the freshly built libraries are. The solution is to use CMAKE_SKIP_INSTALL_RPATH=ON instead of CMAKE_SKIP_RPATH. I wonder if there has been a change in how cmake handles CMAKE_SKIP_RPATH, since I recently had a similar problem in superlu. This cmake version is 3.30.3.
So, using CMAKE_SKIP_INSTALL_RPATH=ON I can get a successful build with EVPATH_TRANSPORT_MODULES=ON.
It doesn't resolve the original problem with cmfabric (with EVPATH_TRANSPORT_MODULES=OFF).
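As background to the rpath fix: CMAKE_SKIP_RPATH drops run-time search paths in both the build tree and the install tree, whereas CMAKE_SKIP_INSTALL_RPATH keeps them in the build tree and only leaves them out of the installed files. A rough way to confirm that the installed libraries still carry no RUNPATH is to inspect their dynamic section. The helper name runpath_of and the example path are assumptions for illustration.

```shell
#!/bin/sh
# Configure with the install-only variant (other options omitted here):
#   cmake ... -DCMAKE_SKIP_INSTALL_RPATH=ON

# runpath_of FILE
# Print the RPATH/RUNPATH entries of an ELF file. Prints nothing (and
# returns non-zero) if there are none or the file cannot be read.
runpath_of() {
    readelf -d "$1" 2>/dev/null | grep -E '\((RPATH|RUNPATH)\)'
}

# Usage on an installed library (path is a placeholder):
#   runpath_of debian/tmp/usr/lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1 \
#       || echo "no RPATH/RUNPATH, as Debian policy expects"
```

Running the same check on the copy in the build tree should show a RUNPATH pointing at the build directory, which is what lets the executables link during the build.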
It seems we have 2 workarounds then
For general use packaging, which would you recommend? Would it be better for general use to switch EVPATH_TRANSPORT_MODULES=ON anyway?
Building a debian package for adios 2.10.0 fails with "libadios2_serial_evpath.so.2.10.0: undefined reference to `cmfabric_add_static_transport'":
cmfabric_add_static_transport is used in EVPath/cm_transport.c if libfabric is found (libfabric 1.17.0 is available). But libfabric does not define cmfabric_add_static_transport. EVPath/cmfabric.c contains an extern definition. But it is wrapped by _WITH_IB_. Evidently _WITH_IB_ is not available.
There seems to be an inconsistency between LIBFABRIC_FOUND using cmfabric_add_static_transport in cm_transport.c and _WITH_IB_ defining cmfabric_add_static_transport in cmfabric.c.
To Reproduce
A minimal reproducible example is preferred. Or steps to reproduce the behavior:
-DEVPATH_TRANSPORT_MODULES=OFF) given in debian/rules
Expected behavior
Build should not have undefined references, or should signal a missing component during cmake configuration if required.
Additional context
If -DEVPATH_TRANSPORT_MODULES=ON is used instead (ON instead of OFF), a swarm of other undefined symbols is reported: