open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

Internal error using shmem_reduce in example/oshmem_max_reduction.c #12419

Open smguzik opened 5 months ago

smguzik commented 5 months ago

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v5.0.2

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

From source tarball, using this configure command line:

'--build=x86_64-linux-gnu' '--prefix=/usr/local/openmpi/5.0.2_gcc-12.2.0' '--with-ucx' '--with-pmix=internal' '--with-libevent=external' '--with-hwloc=external' '--enable-mpi-fortran=all' '--with-cuda=/usr/local/cuda' '--with-cuda-libdir=/usr/lib/x86_64-linux-gnu'

Please describe the system on which you are running


Details of the problem

oshmem_max_reduction.c works as provided in the examples directory. However, using the more recent API, replacing

shmem_long_max_to_all(dst, src, N, 0, 0, num_pes, pWrk, pSync);

with

shmem_long_max_reduce(SHMEM_TEAM_WORLD, dst, src, N);

fails with the message

[shmem_reduce.c:473:pshmem_long_max_reduce] Internal error is appeared rc = -7
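
For reference, both calls are expected to leave the element-wise maximum of src across all PEs in dst on every PE. A minimal sketch of that expected result in plain C follows. It makes no OpenSHMEM calls and simulates the PEs with a flat array; the function name model_max_reduce and the row-major layout are assumptions made for illustration, not part of Open MPI or OpenSHMEM.

```c
#include <stddef.h>

/*
 * Hypothetical plain-C model of a max reduction (not an OpenSHMEM routine).
 * src holds num_pes rows of n values each, row-major: row p stands in for
 * the src array of simulated PE p. Requires num_pes >= 1.
 * dst[i] receives the maximum of element i over all simulated PEs.
 */
void model_max_reduce(long *dst, const long *src, size_t num_pes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        long best = src[i];                 /* value from simulated PE 0 */
        for (size_t pe = 1; pe < num_pes; pe++) {
            long v = src[pe * n + i];       /* value from simulated PE pe */
            if (v > best) {
                best = v;
            }
        }
        dst[i] = best;
    }
}
```

With three simulated PEs holding {0, 1, 2}, {1, 2, 3} and {2, 3, 4}, the model fills dst with {2, 3, 4}, which is what either API variant should produce for the same inputs.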

wenduwan commented 5 months ago

@janjust I see --with-ucx - guess you would be interested 😄

wenduwan commented 5 months ago

Added the main label, assuming oshmem there is the same as in v5.0.x.

roiedanino commented 5 months ago

It seems that the new API is not yet implemented in the UCX spml module (or anywhere else):

From ucx/spml.c:1850

/* This routine is not implemented */
int mca_spml_ucx_team_reduce(shmem_team_t team, void *dest, const void *source,
                             size_t nreduce, int operation, int datatype)
{
    return OSHMEM_ERR_NOT_IMPLEMENTED;
}

@MamziB Any chance I'm missing something, or is it a known TBD?

popina1994 commented 5 months ago

I am having the same issue. Should I use the old OpenSHMEM API, or is there a way to bypass this?

MamziB commented 5 months ago

@roiedanino Yeah, we will implement this in the future. @popina1994 "Should I use the old OpenSHMEM API, or is there a way to bypass this?" Yes, please go ahead and use the old OpenSHMEM API for now. If I find a better workaround, I will update here.

gleon99 commented 5 months ago

@MamziB Can you reassign this to yourself, please?

gleon99 commented 4 months ago

@MamziB ?

MamziB commented 4 months ago

@gleon99 Sure, let me assign it to myself. Thanks for the reminder.