open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

Building version v5.0.6 with CUDA fails #12924

Open lahwaacz opened 2 hours ago

lahwaacz commented 2 hours ago

Building Open MPI v5.0.6 with CUDA support fails with the following error:

coll_cuda_module.c: In function ‘mca_coll_cuda_comm_query’:
coll_cuda_module.c:107:42: error: assignment to ‘mca_coll_base_module_reduce_local_fn_t’ {aka ‘int (*)(const void *, void *, int,  struct ompi_datatype_t *, struct ompi_op_t *, struct mca_coll_base_module_2_4_0_t *)’} from incompatible pointer type ‘int (*)(const void *, void *, size_t,  struct ompi_datatype_t *, struct ompi_op_t *, mca_coll_base_module_t *)’ {aka ‘int (*)(const void *, void *, long unsigned int,  struct ompi_datatype_t *, struct ompi_op_t *, struct mca_coll_base_module_2_4_0_t *)’} [-Wincompatible-pointer-types]
  107 |     cuda_module->super.coll_reduce_local = mca_coll_cuda_reduce_local;
      |                                          ^

lahwaacz commented 2 hours ago

This seems to be caused by https://github.com/open-mpi/ompi/commit/e3ad86eeba8cbdf62471c16598f06073baae093e

That commit declares mca_coll_cuda_reduce_local with a size_t count parameter. This does not work on v5.0.x because https://github.com/open-mpi/ompi/commit/4d7c65d0157c9c4c938d4e969d9abf63154a32ed was applied only to the main branch, not to v5.0.x, where mca_coll_base_module_reduce_local_fn_t still takes an int count (as the error message above shows).
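
For anyone who wants to see the mismatch in isolation, here is a minimal sketch, not Open MPI code. All names in it (reduce_local_fn_t, reduce_local_bigcount, reduce_local_int, demo_reduce) are hypothetical stand-ins, and the argument list is trimmed to three parameters. It assumes the v5.0.x typedef takes an int count while the backported function takes size_t, as the error message indicates. The forwarding wrapper is only one way to make the types line up and is not necessarily how the project will fix it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the v5.0.x typedef: count is an int. */
typedef int (*reduce_local_fn_t)(const int *in, int *inout, int count);

/* Hypothetical stand-in for the backported function: count is a size_t.
 * Its type is not compatible with reduce_local_fn_t. */
static int reduce_local_bigcount(const int *in, int *inout, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        inout[i] += in[i];   /* element-wise sum as a toy reduction */
    }
    return 0;
}

/* Wrapper whose signature matches the typedef and forwards the call. */
static int reduce_local_int(const int *in, int *inout, int count)
{
    assert(count >= 0);      /* a negative count cannot be widened safely */
    return reduce_local_bigcount(in, inout, (size_t)count);
}

/* Installs the wrapper through the typedef and runs it on sample data. */
static int demo_reduce(void)
{
    reduce_local_fn_t fn = reduce_local_int;    /* OK: types match */
    /* reduce_local_fn_t bad = reduce_local_bigcount;
     * would trigger -Wincompatible-pointer-types, like the report above. */
    const int in[3] = {1, 2, 3};
    int inout[3] = {10, 20, 30};
    int rc = fn(in, inout, 3);
    return rc == 0 ? inout[0] + inout[1] + inout[2] : -1;
}
```

Uncommenting the `bad` assignment should reproduce the same class of diagnostic. GCC 14 and later treat it as an error by default, while older compilers typically only warn.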