ornladios / ADIOS2

Next generation of ADIOS developed in the Exascale Computing Program
https://adios2.readthedocs.io/en/latest/index.html
Apache License 2.0

libadios2_serial_evpath.so.2.10.0: undefined reference to `cmfabric_add_static_transport' #4156

Open drew-parsons opened 4 months ago

drew-parsons commented 4 months ago

Building a debian package for adios 2.10.0 fails with "libadios2_serial_evpath.so.2.10.0: undefined reference to `cmfabric_add_static_transport'":

[228/329] : && /usr/bin/c++ -g -O2 -ffile-prefix-map=/projects/adios2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro source/adios2/toolkit/remote/CMakeFiles/adios2_remote_server.dir/remote_server.cpp.o source/adios2/toolkit/remote/CMakeFiles/adios2_remote_server.dir/remote_common.cpp.o -o bin/adios2_remote_server.serial  lib/x86_64-linux-gnu/libadios2_serial_evpath.so.2.10.0  lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0  lib/x86_64-linux-gnu/libadios2_serial_ffs.so.2.10.0  lib/x86_64-linux-gnu/libadios2_serial_atl.so.2.10.0  -ldl  -Wl,-rpath-link,/projects/adios2/build-serial/lib/x86_64-linux-gnu && :
FAILED: bin/adios2_remote_server.serial
: && /usr/bin/c++ -g -O2 -ffile-prefix-map=/projects/adios2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro source/adios2/toolkit/remote/CMakeFiles/adios2_remote_server.dir/remote_server.cpp.o source/adios2/toolkit/remote/CMakeFiles/adios2_remote_server.dir/remote_common.cpp.o -o bin/adios2_remote_server.serial  lib/x86_64-linux-gnu/libadios2_serial_evpath.so.2.10.0  lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0  lib/x86_64-linux-gnu/libadios2_serial_ffs.so.2.10.0  lib/x86_64-linux-gnu/libadios2_serial_atl.so.2.10.0  -ldl  -Wl,-rpath-link,/projects/adios2/build-serial/lib/x86_64-linux-gnu && :
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_evpath.so.2.10.0: undefined reference to `cmfabric_add_static_transport'
collect2: error: ld returned 1 exit status

cmfabric_add_static_transport is used in EVPath/cm_transport.c if libfabric is found (libfabric 1.17.0 is available). But libfabric does not define cmfabric_add_static_transport.

EVPath/cmfabric.c contains an extern definition. But it is wrapped by _WITH_IB_. Evidently _WITH_IB_ is not available.

There seems to be an inconsistency between LIBFABRIC_FOUND using cmfabric_add_static_transport in cm_transport.c and _WITH_IB_ defining cmfabric_add_static_transport in cmfabric.c.
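
One way to confirm this kind of mismatch is to look at the library's dynamic symbol table and see whether the symbol is only referenced or actually defined. The following is a minimal sketch, assuming GNU `nm` from binutils is available; `symbol_status` is a hypothetical helper name, not something shipped with ADIOS2 or EVPath.

```shell
#!/bin/sh
# Hypothetical helper, not part of ADIOS2 or EVPath.
# Prints "undefined", "defined", or "absent" for SYMBOL in the dynamic
# symbol table of LIBRARY, based on the type letter printed by `nm -D`.
symbol_status() {
    lib=$1
    sym=$2
    nm -D "$lib" 2>/dev/null | awk -v sym="$sym" '
        # nm prints "[address] TYPE NAME": the name is the last field and
        # the type letter is the field before it. Drop any "@VERSION" suffix.
        { name = $NF; sub(/@.*/, "", name) }
        name == sym { found = ($(NF-1) == "U") ? "undefined" : "defined" }
        END { print (found == "" ? "absent" : found) }
    '
}
```

For example, `symbol_status lib/x86_64-linux-gnu/libadios2_serial_evpath.so.2.10.0 cmfabric_add_static_transport` would be expected to report `undefined` if the diagnosis above is right: cm_transport.c references the symbol, but cmfabric.c never compiled its definition.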

To Reproduce

Steps to reproduce the behavior:

  1. Build using configuration (including -DEVPATH_TRANSPORT_MODULES=OFF) given in debian/rules
  2. Start build (cmake; make triggered by dpkg-buildpackage)
  3. See error

Expected behavior

The build should not have undefined references, or the cmake configuration should signal a missing component if one is required.

Additional context

If -DEVPATH_TRANSPORT_MODULES=ON is used instead (ON instead of OFF), a swarm of other undefined symbols is reported:

[237/335] : && /usr/bin/c++ -g -O2 -ffile-prefix-map=/projects/adios2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/main.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/Reorganize.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/Utils.cpp.o -o bin/adios2_reorganize.serial  lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0 && :
FAILED: bin/adios2_reorganize.serial
: && /usr/bin/c++ -g -O2 -ffile-prefix-map=/projects/adios2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/main.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/Reorganize.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/Utils.cpp.o -o bin/adios2_reorganize.serial  lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0 && :
/usr/bin/ld: warning: libadios2_serial_perfstubs.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_evpath.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_ffs.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_atl.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0, not found (try using -rpath or -rpath-link)
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `attr_list_from_string'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `CMfork_comm_thread'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `free_attr_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `CMwrite'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `register_data_format'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFSencode'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `copy_field_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `CMCondition_get_client_data'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `format_list_of_FMFormat'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `CManager_close'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `set_int_attr'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `free_FMcontext'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `ps_initialize_'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `create_fixed_FFSBuffer'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `CMregister_invalid_message_handler'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFSTypeHandle_by_index'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `create_FFSContext_FM'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `free_FMfield_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `free_FFSContext'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFSset_fixed_target'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `ps_timer_stop_'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `name_of_FMformat'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `set_string_attr'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `CMlisten'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `CMremove_task'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `perfstubs_initialized'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `ps_make_timer_name_'
etc
etc
eisenhauer commented 4 months ago

Unfortunately, this is probably the same issue as #4103. ADIOS is looking for libfabric before EVPath does, which messes with the CMake in EVPath and produces undesirable results. In #4103, the solution was to simply not have libfabric installed. Neither EVPath nor ADIOS needs or uses it unless you are on a machine with an RDMA transport. Alternatively, you can run cmake with -DCMAKE_DISABLE_FIND_PACKAGE_LIBFABRIC=TRUE, or -DADIOS2_USE_SST=FALSE (if you don't need the SST streaming transport). This problem should be fixed in the master branch, and so will be fixed in 2.10.1 when that release happens.

drew-parsons commented 4 months ago

Thanks. One of the design goals for the debian packages is to support clusters (cloud computing), so RDMA (and fabric) support is wanted. I'll look forward to the new release, and try the workarounds or patches in the meantime.

drew-parsons commented 4 months ago

-DADIOS2_USE_SST=FALSE does not fix the problem. It actually gives even more missing symbols:

[212/308] : && /usr/bin/c++ -g -O2 -ffile-prefix-map=/projects/adios2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/main.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/Reorganize.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/Utils.cpp.o -o bin/adios2_reorganize.serial  lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0 && :
FAILED: bin/adios2_reorganize.serial 
: && /usr/bin/c++ -g -O2 -ffile-prefix-map=/projects/adios2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/main.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/Reorganize.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/Utils.cpp.o -o bin/adios2_reorganize.serial  lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0 && :
/usr/bin/ld: warning: libadios2_serial_perfstubs.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_ffs.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0, not found (try using -rpath or -rpath-link)
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `register_data_format'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFSencode'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `format_list_of_FMFormat'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `free_FMcontext'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `ps_initialize_'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `create_fixed_FFSBuffer'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `create_FFSContext_FM'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `free_FMfield_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `free_FFSContext'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `ps_timer_stop_'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `name_of_FMformat'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `perfstubs_initialized'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `ps_make_timer_name_'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `get_server_ID_FMformat'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FMfree_struct_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FMContext_from_FFS'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `get_server_rep_FMformat'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FMfree_var_rec_elements'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFSTypeHandle_from_encode'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `load_external_format_FMcontext'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `ps_timer_create_'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFS_est_decode_length'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `create_FFSBuffer'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFSdecode_in_place_possible'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `ps_timer_start_'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FMformat_from_ID'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `free_FFSBuffer'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `establish_conversion'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `create_local_FMcontext'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFSdecode_to_buffer'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FMdump_data'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FMcopy_struct_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FMFormat_of_original'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFShas_conversion'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FFSdecode_in_place'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `FMstruct_size_field_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.0: undefined reference to `ps_finalize_'
collect2: error: ld returned 1 exit status
drew-parsons commented 4 months ago

Same with both -DADIOS2_USE_SST=FALSE and -DCMAKE_DISABLE_FIND_PACKAGE_LIBFABRIC=TRUE

eisenhauer commented 4 months ago

Not sure what's going on with the workarounds, but you might try github master.

Libfabric can of course be installed on non-RDMA machines, where it configures itself to offer only the sockets-based provider, which is theoretically useful for testing libfabric code. In practice, however, it's not useful (for complicated reasons), and libfabric code like that in ADIOS is quite likely not to work if there's not a real RDMA provider underneath libfabric. Even if it did work, performance would be terrible, so we don't build that way, which is why we generally don't know whether ADIOS would build in this circumstance. Instead, ADIOS takes the presence of certain libraries as indicative of the sort of machine that it's on and builds features accordingly. Unfortunately, this isn't the sort of dependency that packaging systems seem to capture well.

drew-parsons commented 4 months ago

Thanks. I'll report again if I can confirm that the problem is lifted (or not) in a later code base.

drew-parsons commented 2 months ago

I'm still getting undefined references with 2.10.1, with -DEVPATH_TRANSPORT_MODULES=ON -DADIOS2_USE_SST:BOOL=ON

[317/326] : && /usr/bin/c++ -g -O2 -ffile-prefix-map=/projects/mathlibs/build/adios2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/main.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/Reorganize.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/Utils.cpp.o -o bin/adios2_reorganize.serial  lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1 && :
FAILED: bin/adios2_reorganize.serial 
: && /usr/bin/c++ -g -O2 -ffile-prefix-map=/projects/mathlibs/build/adios2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/main.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/adios_reorganize/Reorganize.cpp.o source/utils/CMakeFiles/adios_reorganize.dir/Utils.cpp.o -o bin/adios2_reorganize.serial  lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1 && :
/usr/bin/ld: warning: libadios2_serial_perfstubs.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_evpath.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_ffs.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_atl.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `attr_list_from_string'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `CMfork_comm_thread'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `free_attr_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `CMwrite'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `register_data_format'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `FFSencode'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `copy_field_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `CMCondition_get_client_data'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `format_list_of_FMFormat'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `CManager_close'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `set_int_attr'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `free_FMcontext'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `ps_initialize_'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `create_fixed_FFSBuffer'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `CMregister_invalid_message_handler'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `FFSTypeHandle_by_index'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `create_FFSContext_FM'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `free_FMfield_list'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `free_FFSContext'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `FFSset_fixed_target'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `ps_timer_stop_'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `name_of_FMformat'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `set_string_attr'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `CMlisten'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `CMremove_task'
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1: undefined reference to `perfstubs_initialized'
...
drew-parsons commented 2 months ago

And still fails with -DEVPATH_TRANSPORT_MODULES=OFF

[308/320] : && /usr/bin/c++ -g -O2 -ffile-prefix-map=/projects/mathlibs/build/adios2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro source/adios2/toolkit/remote/CMakeFiles/adios2_remote_server.dir/remote_server.cpp.o source/adios2/toolkit/remote/CMakeFiles/adios2_remote_server.dir/remote_common.cpp.o -o bin/adios2_remote_server.serial  lib/x86_64-linux-gnu/libadios2_serial_evpath.so.2.10.1  lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1  lib/x86_64-linux-gnu/libadios2_serial_ffs.so.2.10.1  lib/x86_64-linux-gnu/libadios2_serial_atl.so.2.10.1  -ldl  /usr/lib/x86_64-linux-gnu/libpugixml.so.1.14  -Wl,-rpath-link,/projects/mathlibs/build/adios2/build-serial/lib/x86_64-linux-gnu && :
FAILED: bin/adios2_remote_server.serial 
: && /usr/bin/c++ -g -O2 -ffile-prefix-map=/projects/mathlibs/build/adios2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro source/adios2/toolkit/remote/CMakeFiles/adios2_remote_server.dir/remote_server.cpp.o source/adios2/toolkit/remote/CMakeFiles/adios2_remote_server.dir/remote_common.cpp.o -o bin/adios2_remote_server.serial  lib/x86_64-linux-gnu/libadios2_serial_evpath.so.2.10.1  lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1  lib/x86_64-linux-gnu/libadios2_serial_ffs.so.2.10.1  lib/x86_64-linux-gnu/libadios2_serial_atl.so.2.10.1  -ldl  /usr/lib/x86_64-linux-gnu/libpugixml.so.1.14  -Wl,-rpath-link,/projects/mathlibs/build/adios2/build-serial/lib/x86_64-linux-gnu && :
/usr/bin/ld: lib/x86_64-linux-gnu/libadios2_serial_evpath.so.2.10.1: undefined reference to `cmfabric_add_static_transport'
collect2: error: ld returned 1 exit status
eisenhauer commented 2 months ago

That the linker isn't finding libraries that are built as part of the ADIOS build:

/usr/bin/ld: warning: libadios2_serial_perfstubs.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_evpath.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_ffs.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libadios2_serial_atl.so.2.10, needed by lib/x86_64-linux-gnu/libadios2_serial_core.so.2.10.1, not found (try using -rpath or -rpath-link)

seems to be an insurmountable problem. Then again, the names here look strange. Normally the libraries should be named like libadios2_ffs.so.2.10, without the extra "_serial" in the name.

Ah, I was going to ask how you configured ADIOS, but I followed the "debian/rules" link above and it looks like debian has a quite complex build system that bypasses most of our stuff, for example deleting everything under ADIOS/thirdparty and using external versions of those libraries. Offhand I'd say that however those libraries are built, they either aren't named correctly or the paths aren't set correctly for the link to find them. How that's supposed to be working I have no idea. It's not something the adios team has had a direct hand in.

drew-parsons commented 2 months ago

The debian packaging added the _serial (and _mpi) suffixes to enable simultaneous package installation of both types of build. For comparison, for our h5py package we originally only had the MPI build, but there was a complaint that loading MPI libraries was slowing down start-up in serial jobs, so a serial h5py build was added. I'm not sure that complaint applies to adios2. In principle it would, but we could consider dropping the serial build if the package configuration is proving to be a problem.

In the case of -DEVPATH_TRANSPORT_MODULES=ON, I can see the "missing" symbols are defined in the other built libraries. It's as if the debian build environment isn't using a proper LD_LIBRARY_PATH (rpath would be another way to deal with that). I'll dig further to understand the problem in the OFF case.
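
A quick way to see which of the freshly built libraries defines a given "missing" symbol is to scan their dynamic symbol tables. This is a minimal sketch, assuming GNU `nm` from binutils; `find_definers` is a hypothetical helper name, and the library paths in the usage line are taken from the log above.

```shell
#!/bin/sh
# Hypothetical helper: print every library among LIBS whose dynamic symbol
# table defines SYMBOL.
# Usage: find_definers SYMBOL LIB [LIB...]
find_definers() {
    sym=$1
    shift
    for lib in "$@"; do
        # `nm -D --defined-only` lists only symbols the library itself
        # provides; awk exits 0 when SYMBOL is among them.
        if nm -D --defined-only "$lib" 2>/dev/null |
            awk -v sym="$sym" '
                { name = $NF; sub(/@.*/, "", name) }
                name == sym { hit = 1 }
                END { exit !hit }
            '
        then
            printf '%s\n' "$lib"
        fi
    done
}
```

For instance, `find_definers FFSencode lib/x86_64-linux-gnu/libadios2_serial_*.so.2.10.1` lists the built libraries that provide `FFSencode`. If one of them does, the symbol exists and the link failure is about the linker not locating that library, not about missing code.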

I'll test a separate manual build without the debian package considerations and see at which point I can reproduce the problem (or not) manually.

eisenhauer commented 2 months ago

It's been a long time since the original debian package rules were produced for ADIOS. In the meantime, the ADIOS build system itself has been rewritten to produce both serial and MPI versions of the ADIOS library in a single build. Presumably this means that the debian rules will need some adjustment.

drew-parsons commented 1 week ago

The sticking point (for debian) is handling the linking to HDF5 (libhdf5.so). Debian provides two alternative HDF5 builds, serial and mpi (the original debian packaging for ADIOS2 would have been trying to follow likewise).

ADIOS' new configuration system builds both serial and MPI, but that's with respect to linking libmpi.so or not. Either way, if I'm not mistaken both versions are linked to the same libhdf5.so, either libhdf5_serial.so or libhdf5_mpi.so as identified during the configuration step.

I think it might not be so difficult for Debian to sort out. Currently Debian builds three alternative ADIOS libraries,

  1. libadios2_serial_core.so: links libhdf5_serial.so without libmpi.so
  2. libadios2_mpi_core.so: links libhdf5_openmpi.so without libmpi.so
  3. libadios2_mpi_core_mpi.so: links libhdf5_openmpi.so and libmpi.so

So essentially Debian's libadios2_mpi_core.so is nonsense. It's ADIOS' serial build linking against libhdf5_mpi. Debian will want to drop it and replace it with the "libadios2_mpi_core_mpi.so" build.

I'm assuming here that it's sensible for the ADIOS serial build to be linking against HDF5 serial, and the ADIOS mpi build to link HDF5 mpi. Would there be an advantage in the ADIOS build system being able to handle the two alternative HDF5 builds when configuring?

drew-parsons commented 1 week ago

"Debian's libadios2_mpi_core.so is nonsense"

Actually, no, it is needed by the mpi-linked library (libadios2_mpi_core_mpi.so).

pnorbert commented 1 week ago

I don't know how to solve this problem. I assume the one-build adios approach is not compatible with the two-build approach of hdf5, so you may need to keep two separate adios builds, one with mpi and parallel hdf5 and one without.

In our builds we have only two core libraries: libadios2_core_mpi.so and libadios2_core.so. The former depends on the latter, and hence they cannot depend on different hdf5 libraries.

$ ldd libadios2_core_mpi.so | grep core
        libadios2_core.so.2.10 (0x00007f8454c67000)
drew-parsons commented 1 week ago

True, that is what the debian build is doing. We have one build configuration for serial-only (ADIOS2_USE_MPI=OFF with libhdf5_serial.so), and a separate build for MPI (ADIOS2_USE_MPI=ON with libhdf5_openmpi.so).

I've now identified the problem of undefined references with EVPATH_TRANSPORT_MODULES=ON. It's an rpath policy issue. Debian policy is to not place RUNPATH in packaged libraries, so the build was configured with CMAKE_SKIP_RPATH=ON. The undefined references were a consequence; the executables being built don't know where the freshly built libraries are. The solution is to use CMAKE_SKIP_INSTALL_RPATH=ON instead of CMAKE_SKIP_RPATH. I wonder if there has been a change in how cmake handles CMAKE_SKIP_RPATH, since I recently had a similar problem in superlu. This cmake version is 3.30.3.

So, using CMAKE_SKIP_INSTALL_RPATH=ON I can get a successful build with EVPATH_TRANSPORT_MODULES=ON.
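
The configuration change described above can be sketched as follows. This is an illustration only, not the actual debian/rules: `configure_adios2` and `has_no_runpath` are hypothetical helper names, the directory arguments are placeholders, and `readelf` from binutils is assumed to be available.

```shell
#!/bin/sh
# Sketch only: the source and build directories are placeholders.
# CMAKE_SKIP_INSTALL_RPATH=ON keeps the rpath in the build tree, so the
# executables being built can locate the freshly built libraries, and
# leaves it out of the installed files. CMAKE_SKIP_RPATH=ON would drop
# it in both places.
configure_adios2() {
    src_dir=$1
    build_dir=$2
    cmake -S "$src_dir" -B "$build_dir" \
        -DCMAKE_SKIP_INSTALL_RPATH=ON \
        -DEVPATH_TRANSPORT_MODULES=ON
}

# Succeeds (exit status 0) when FILE has no RPATH or RUNPATH entry in its
# dynamic section, as reported by `readelf -d`.
has_no_runpath() {
    ! readelf -d "$1" 2>/dev/null | grep -Eq '\((RPATH|RUNPATH)\)'
}
```

After installing into a staging directory, running `has_no_runpath` on each installed library is one way to check that the packaged files still meet the no-RUNPATH policy.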

It doesn't resolve the original problem with cmfabric (with EVPATH_TRANSPORT_MODULES=OFF).

It seems we have 2 workarounds then:

  1. build with EVPATH_TRANSPORT_MODULES=OFF, but without libfabric support
  2. build with EVPATH_TRANSPORT_MODULES=ON

For general-use packaging, which would you recommend? Would it be better for general use to switch to EVPATH_TRANSPORT_MODULES=ON anyway?