pmodels / mpich

Official MPICH Repository
http://www.mpich.org

Unable to statically link Fortran applications #7087

Closed joscot-linaro closed 3 weeks ago

joscot-linaro commented 1 month ago

Summary

This might be a configuration/install error, but I am unable to compile Fortran MPI programs with the -static option. When I try, I get the following linker errors:

/usr/bin/ld: /path/to/mpich/lib/libmpi.a(libfabric_la-fabric.o): in function `ofi_reg_dl_prov':
fabric.c:(.text+0x880): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-rxm_init.o): in function `rxm_getinfo':
rxm_init.c:(.text+0x5df): warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_av.o): in function `efa_ah_release.part.0':
efa_av.c:(.text+0x1ae): undefined reference to `ibv_destroy_ah'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_av.o): in function `efa_av_insert_one':
efa_av.c:(.text+0x18cd): undefined reference to `ibv_create_ah'
/usr/bin/ld: efa_av.c:(.text+0x18f6): undefined reference to `efadv_query_ah'
/usr/bin/ld: efa_av.c:(.text+0x23c7): undefined reference to `ibv_destroy_ah'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_device.o): in function `efa_device_construct':
efa_device.c:(.text+0x15): undefined reference to `ibv_open_device'
/usr/bin/ld: efa_device.c:(.text+0x67): undefined reference to `ibv_query_device'
/usr/bin/ld: efa_device.c:(.text+0x92): undefined reference to `efadv_query_device'
/usr/bin/ld: efa_device.c:(.text+0x17e): undefined reference to `ibv_query_port'
/usr/bin/ld: efa_device.c:(.text+0x1ac): undefined reference to `ibv_query_gid'
/usr/bin/ld: efa_device.c:(.text+0x1c0): undefined reference to `ibv_alloc_pd'
/usr/bin/ld: efa_device.c:(.text+0x295): undefined reference to `ibv_close_device'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_device.o): in function `efa_device_list_finalize':
efa_device.c:(.text+0x585): undefined reference to `ibv_close_device'
/usr/bin/ld: efa_device.c:(.text+0x5cd): undefined reference to `ibv_dealloc_pd'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_device.o): in function `efa_device_list_initialize':
efa_device.c:(.text+0x715): undefined reference to `ibv_get_device_list'
/usr/bin/ld: efa_device.c:(.text+0x793): undefined reference to `ibv_free_device_list'
/usr/bin/ld: efa_device.c:(.text+0x7b0): undefined reference to `ibv_free_device_list'
/usr/bin/ld: efa_device.c:(.text+0x7c3): undefined reference to `ibv_free_device_list'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_fork_support.o): in function `efa_fork_support_request_initialize':
efa_fork_support.c:(.text+0xe3): undefined reference to `ibv_is_fork_initialized'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_fork_support.o): in function `efa_fork_support_enable_if_requested':
efa_fork_support.c:(.text+0x1da): undefined reference to `ibv_is_fork_initialized'
/usr/bin/ld: efa_fork_support.c:(.text+0x2e1): undefined reference to `ibv_fork_init'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_mr.o): in function `efa_mr_dereg_impl':
efa_mr.c:(.text+0x4f): undefined reference to `ibv_dereg_mr'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_mr.o): in function `efa_mr_reg_impl':
efa_mr.c:(.text+0x557): undefined reference to `ibv_reg_mr'
/usr/bin/ld: efa_mr.c:(.text+0x785): undefined reference to `ibv_reg_dmabuf_mr'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_rdm_ep_fiops.o): in function `efa_rdm_ep_close':
efa_rdm_ep_fiops.c:(.text+0xab1): undefined reference to `ibv_destroy_cq'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_rdm_ep_fiops.o): in function `efa_rdm_ep_open':
efa_rdm_ep_fiops.c:(.text+0x24a8): undefined reference to `ibv_destroy_cq'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-libnl_utils_common.o): in function `usnic_rt_raw_parse_cb':
libnl_utils_common.c:(.text+0x2c): undefined reference to `nlmsg_hdr'
/usr/bin/ld: libnl_utils_common.c:(.text+0x3c): undefined reference to `nl_socket_get_local_port'
/usr/bin/ld: libnl_utils_common.c:(.text+0x9e): undefined reference to `nl_nlmsgtype2str'
/usr/bin/ld: libnl_utils_common.c:(.text+0xb4): undefined reference to `nlmsg_data'
/usr/bin/ld: libnl_utils_common.c:(.text+0xd5): undefined reference to `nlmsg_parse'
/usr/bin/ld: libnl_utils_common.c:(.text+0xec): undefined reference to `nla_get_u32'
/usr/bin/ld: libnl_utils_common.c:(.text+0x10c): undefined reference to `nlmsg_data'
/usr/bin/ld: libnl_utils_common.c:(.text+0x11c): undefined reference to `nlmsg_size'
/usr/bin/ld: libnl_utils_common.c:(.text+0x14f): undefined reference to `nla_get_u32'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-libnl_utils_common.o): in function `usnic_nl_rt_lookup':
libnl_utils_common.c:(.text+0x21e): undefined reference to `nl_socket_alloc'
/usr/bin/ld: libnl_utils_common.c:(.text+0x234): undefined reference to `nl_connect'
/usr/bin/ld: libnl_utils_common.c:(.text+0x244): undefined reference to `nl_socket_disable_seq_check'
/usr/bin/ld: libnl_utils_common.c:(.text+0x25e): undefined reference to `nl_socket_get_fd'
/usr/bin/ld: libnl_utils_common.c:(.text+0x2c1): undefined reference to `nlmsg_alloc_simple'
/usr/bin/ld: libnl_utils_common.c:(.text+0x2e2): undefined reference to `nlmsg_append'
/usr/bin/ld: libnl_utils_common.c:(.text+0x2f2): undefined reference to `nla_put_u32'
/usr/bin/ld: libnl_utils_common.c:(.text+0x301): undefined reference to `nla_put_u32'
/usr/bin/ld: libnl_utils_common.c:(.text+0x309): undefined reference to `nlmsg_hdr'
/usr/bin/ld: libnl_utils_common.c:(.text+0x327): undefined reference to `nl_socket_get_local_port'
/usr/bin/ld: libnl_utils_common.c:(.text+0x342): undefined reference to `nlmsg_set_proto'
/usr/bin/ld: libnl_utils_common.c:(.text+0x358): undefined reference to `nl_send'
/usr/bin/ld: libnl_utils_common.c:(.text+0x373): undefined reference to `nlmsg_free'
/usr/bin/ld: libnl_utils_common.c:(.text+0x3f4): undefined reference to `nlmsg_free'
/usr/bin/ld: libnl_utils_common.c:(.text+0x432): undefined reference to `nl_socket_modify_cb'
/usr/bin/ld: libnl_utils_common.c:(.text+0x443): undefined reference to `nl_recvmsgs_default'
/usr/bin/ld: libnl_utils_common.c:(.text+0x47f): undefined reference to `nl_recvmsgs_default'
/usr/bin/ld: libnl_utils_common.c:(.text+0x4a1): undefined reference to `nl_close'
/usr/bin/ld: libnl_utils_common.c:(.text+0x4aa): undefined reference to `nl_socket_free'
/usr/bin/ld: libnl_utils_common.c:(.text+0x4d2): undefined reference to `nl_close'
/usr/bin/ld: libnl_utils_common.c:(.text+0x4da): undefined reference to `nl_socket_free'
/usr/bin/ld: libnl_utils_common.c:(.text+0x4ec): undefined reference to `nl_socket_free'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_base_ep.o): in function `efa_base_ep_destruct_qp':
efa_base_ep.c:(.text+0x3f): undefined reference to `ibv_destroy_qp'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_base_ep.o): in function `efa_base_ep_destruct':
efa_base_ep.c:(.text+0x218): undefined reference to `ibv_destroy_ah'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_base_ep.o): in function `efa_base_ep_create_qp':
efa_base_ep.c:(.text+0x323): undefined reference to `efadv_create_qp_ex'
/usr/bin/ld: efa_base_ep.c:(.text+0x338): undefined reference to `ibv_qp_to_qp_ex'
/usr/bin/ld: efa_base_ep.c:(.text+0x437): undefined reference to `ibv_create_qp'
/usr/bin/ld: /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_base_ep.o): in function `efa_base_ep_enable':
efa_base_ep.c:(.text+0x4df): undefined reference to `ibv_modify_qp'
/usr/bin/ld: efa_base_ep.c:(.text+0x62c): undefined reference to `ibv_create_ah'
/usr/bin/ld: efa_base_ep.c:(.text+0x692): undefined reference to `ibv_modify_qp'
/usr/bin/ld: efa_base_ep.c:(.text+0x6ee): undefined reference to `ibv_modify_qp'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_dgram_cq.o): in function `efa_dgram_cq_close':
efa_dgram_cq.c:(.text+0x13c): undefined reference to `ibv_destroy_cq'
/usr/bin/ld:  /path/to/mpich/lib/libmpi.a(src_libfabric_la-efa_dgram_cq.o): in function `efa_dgram_cq_open':
efa_dgram_cq.c:(.text+0x7ff): undefined reference to `ibv_destroy_cq'

Affected Version

MPICH 4.2.2 built with GCC 11.4.0 on Ubuntu 22.04.

Configure line:

$ ./configure --enable-static=yes --enable-sharedlib=gcc

Reproducer

program repro
use mpi
implicit none
integer :: pe, nprocs, ierr

call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, pe, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

call imbalance

call MPI_FINALIZE(ierr)

contains

subroutine imbalance

  integer :: i,j,iterations
  real(kind=8)    :: a(10500),b(10500)

  do iterations=1,4
    a=1.1 + iterations
    do j=0,pe
      do i=1,size(a)
         a=sqrt(a)+1.1*j
      end do
    end do
    call MPI_ALLREDUCE(a,b,size(a),MPI_DOUBLE_PRECISION,MPI_SUM,MPI_COMM_WORLD,ierr)
  end do
  if (pe == 0) print *,"imbalance answer",b(1)
  call MPI_BARRIER(MPI_COMM_WORLD,ierr)

end subroutine imbalance

end program repro

Compile the above program with the -static option:

mpif90 -static -g -O3 -fno-inline -fno-optimize-sibling-calls -o repro repro.f90 -lm -lrt

Worth noting this also occurs when using the mpi_f08 interface.

raffenet commented 1 month ago

@joscot-linaro does your system have pkg-config installed? Our configure script uses it to grab the libfabric dependencies that look to be missing from your link step. Unfortunately there is no warning if pkg-config is not available, and thus the dependencies are not detected.
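
For anyone wanting to check this on their own machine, here is a minimal sketch that tests whether pkg-config is on PATH and, if so, prints the static link flags it reports for a few modules. The module names (libibverbs, libefa, libnl-3.0, libnl-route-3.0) are assumptions based on the libraries named in the linker errors above; they are not taken from MPICH's configure script, and `have_tool` and `report_static_deps` are hypothetical helper names.

```shell
# Return success if the named tool is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Print the static link flags pkg-config reports for each module, or a note
# when the module (or pkg-config itself) is not available.
report_static_deps() {
    if ! have_tool pkg-config; then
        echo "pkg-config: not found on PATH"
        return 1
    fi
    for mod in "$@"; do
        if pkg-config --exists "$mod"; then
            echo "$mod: $(pkg-config --static --libs "$mod")"
        else
            echo "$mod: no .pc file found"
        fi
    done
}

# Assumed module names; adjust to whatever .pc files your system provides.
report_static_deps libibverbs libefa libnl-3.0 libnl-route-3.0 || true
```

If a module prints "no .pc file found", that says nothing definite about the MPICH build; it only means pkg-config cannot describe that library on this system.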

raffenet commented 1 month ago

I'll also note that even with pkg-config installed, static linking was still not working in main because of another change to Makefile.am. I created #7090 to fix that problem.

joscot-linaro commented 1 month ago

> does your system have pkg-config installed?

Indeed it did:

$ pkg-config --version
0.29.2

Not sure whether it needs a later version. I was also building from trunk, and that did fail. I can try rebuilding MPICH 4.2.2 to verify that static linking works without https://github.com/pmodels/mpich/pull/7090.

raffenet commented 1 month ago

OK. Please send the top-level config.log file from any build that doesn't work properly. It should have some clues about dependency handling.

joscot-linaro commented 1 month ago

Sure here it is: config.log

raffenet commented 1 month ago

I do see the deps getting captured:

WRAPPER_LIBS(=' -lpthread  ') does not contain '-L/home/joscot01/mpich/mpich-4.2.2/lib  -libverbs -lefa -lnl-3 -lnl-route-3 -latomic -lpthread -ldl', appending
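
A small sketch for pulling lines like the one above out of a configure log. It assumes only that the log contains the string WRAPPER_LIBS, as in the quoted line; `wrapper_libs_lines` is a hypothetical helper name.

```shell
# Print the lines of a configure log that mention WRAPPER_LIBS, with line numbers.
wrapper_libs_lines() {
    grep -n 'WRAPPER_LIBS' "$1"
}

# Only run against a real log when one exists in the current directory.
if [ -f config.log ]; then
    wrapper_libs_lines config.log || echo "no WRAPPER_LIBS lines found"
fi
```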

Can you add the -show flag to your compilation like this:

mpif90 -show -static -g -O3 -fno-inline -fno-optimize-sibling-calls -o repro repro.f90 -lm -lrt

That should output the full command being executed, which should provide more clues...

joscot-linaro commented 1 month ago

Hey @raffenet,

In retrospect, the errors I was getting with 4.2.2 came from using --static rather than -static, so that may have been a mistake on my part. With -static an executable is produced, though I do get these linker warnings:

$ ../mpich-4.2.2/bin/mpif90 -static -g -O0 -o slow_f ~/git/forge/repo-1/examples/slow.f90
/usr/bin/ld: /home/joscot01/mpich/mpich-4.2.2/lib/libmpi.a(libfabric_la-fabric.o): in function `ofi_reg_dl_prov':
fabric.c:(.text+0x880): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/bin/ld: /home/joscot01/mpich/mpich-4.2.2/lib/libmpi.a(src_libfabric_la-rxm_init.o): in function `rxm_getinfo':
rxm_init.c:(.text+0x5df): warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/11/../../../x86_64-linux-gnu/libnl-3.a(libnl_3_la-utils.o): in function `nl_ip_proto2str':
(.text+0x14b1): warning: Using 'getprotobynumber' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/11/../../../x86_64-linux-gnu/libnl-3.a(libnl_3_la-utils.o): in function `nl_str2ip_proto':
(.text+0x152d): warning: Using 'getprotobyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
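
The "Using '...' in statically linked applications" lines are warnings emitted when glibc functions such as dlopen or getaddrinfo are linked statically; they do not stop the link, unlike "undefined reference" errors. A minimal sketch for telling the two apart in a saved linker log follows. The log file name and the helper name `summarize_link_log` are hypothetical.

```shell
# Summarize a saved linker log: count hard errors and glibc static-link warnings.
summarize_link_log() {
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    errors=$(grep -c 'undefined reference to' "$1" || true)
    warnings=$(grep -c 'warning: Using .* in statically linked applications' "$1" || true)
    echo "undefined references: $errors"
    echo "glibc static-link warnings: $warnings"
}

# Example usage (assumes mpif90 is on PATH and repro.f90 exists):
#   mpif90 -static -o repro repro.f90 2> link.log
#   summarize_link_log link.log
```

A log with zero undefined references and only the glibc warnings corresponds to the case described here, where the executable is still produced.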

Here is the -show output for both cases:

$ ../mpich-4.2.2/bin/mpif90 -show -static -g -O0 -o slow_f ~/git/forge/repo-1/examples/slow.f90 
gfortran -I /home/joscot01/.conan/data/mpich/4.2.0/warwci/test/package/d2d5b79f94cad3c617bd7bb6e54ff3d8f22d3c6a/include -static -g -O0 -o slow_f /home/joscot01/git/forge/repo-1/examples/slow.f90 -I/home/joscot01/mpich/mpich-4.2.2/include -I/home/joscot01/mpich/mpich-4.2.2/include -L/home/joscot01/mpich/mpich-4.2.2/lib -lmpifort -Wl,-rpath -Wl,/home/joscot01/mpich/mpich-4.2.2/lib -Wl,--enable-new-dtags -lmpi -L/home/joscot01/mpich/mpich-4.2.2/lib -lm -lefa -libverbs -lnl-3 -lnl-route-3 -latomic -lpthread -ldl

and

$ ../mpich-4.2.2/bin/mpif90 -show --static -g -O0 -o slow_f ~/git/forge/repo-1/examples/slow.f90 
gfortran -I /home/joscot01/.conan/data/mpich/4.2.0/warwci/test/package/d2d5b79f94cad3c617bd7bb6e54ff3d8f22d3c6a/include --static -g -O0 -o slow_f /home/joscot01/git/forge/repo-1/examples/slow.f90 -I/home/joscot01/mpich/mpich-4.2.2/include -I/home/joscot01/mpich/mpich-4.2.2/include -L/home/joscot01/mpich/mpich-4.2.2/lib -lmpifort -Wl,-rpath -Wl,/home/joscot01/mpich/mpich-4.2.2/lib -Wl,--enable-new-dtags -lmpi
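
Comparing the two -show lines above, the --static one stops at -lmpi and lacks the trailing dependency libraries (-lefa, -libverbs, -lnl-3 and so on) that the -static one carries. A minimal sketch for checking a wrapper command line against a list of expected flags follows; the helper name `missing_flags` is hypothetical, and the flag list is taken from the output shown in this thread rather than from any MPICH documentation.

```shell
# Print each expected flag that does not appear as a separate token
# in the given command line.
missing_flags() {
    cmdline=$1
    shift
    for flag in "$@"; do
        case " $cmdline " in
            *" $flag "*) ;;                      # flag present, nothing to report
            *) printf '%s\n' "$flag" ;;          # flag absent from the command line
        esac
    done
}

# Example usage (assumes an MPICH mpif90 on PATH):
#   missing_flags "$(mpif90 -show --static -o repro repro.f90)" \
#       -libverbs -lefa -lnl-3 -lnl-route-3 -latomic -lpthread -ldl
```

Empty output means every expected flag is on the command line; any printed flag is one the wrapper did not add.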

joscot-linaro commented 1 month ago

Note: I originally discovered this on trunk/main, where the "undefined reference" errors also occur with -static. It might be fixed by https://github.com/pmodels/mpich/pull/7090, though I have not tested that.

hzhou commented 3 weeks ago

I believe #7090 fixes this issue. If it persists, please reopen.

joscot-linaro commented 2 weeks ago

Thanks. I tested with trunk and, apart from the linker warnings mentioned above, static linking appears to work now.