open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org
Other
2.14k stars 858 forks source link

MPAS crash with Open MPI 4.1.4 with pml/cm + mtl/ofi #11751

Closed wzamazon closed 1 year ago

wzamazon commented 1 year ago

Thank you for taking the time to submit an issue!

Background information

MPAS (https://mpas-dev.github.io/) is a weather forecasting application, and currently it will crash when using Open MPI's pml/cm + mtl/ofi code path, and the pml/ob1 + btl/tcp code path is working

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

4.1.4

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

built from source

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running


Details of the problem

I did some debug, and I believe it is caused by a bug in pml/cm's heavy send request initialization.

In function MCA_PML_CM_SEND_REQUEST_INIT_COMMON, there is a special branch that handle the case when mpi data type is continuous

.........
    if (opal_datatype_is_contiguous_memory_layout(&datatype->super, count)) { \
        (req_send)->req_base.req_convertor.remoteArch =                 \
            ompi_mpi_local_convertor->remoteArch;                       \
        (req_send)->req_base.req_convertor.flags      =                 \
            ompi_mpi_local_convertor->flags;                            \
        (req_send)->req_base.req_convertor.master     =                 \
            ompi_mpi_local_convertor->master;                           \
        (req_send)->req_base.req_convertor.local_size =                 \
            count * datatype->super.size;                               \
        (req_send)->req_base.req_convertor.pBaseBuf   =                 \
            (unsigned char*)buf + datatype->super.true_lb;              \
        (req_send)->req_base.req_convertor.count      = count;          \
        (req_send)->req_base.req_convertor.pDesc      = &datatype->super; \
    } else {  
......

code: https://github.com/open-mpi/ompi/blob/c053cb822770738de5df0a3e998899398d35e728/ompi/mca/pml/cm/pml_cm_sendreq.h#L232

This function MCA_PML_CM_SEND_REQUEST_INIT_COMMON is only applied to thin send request, heavy send request was initialized in a different function MCA_PML_CM_HVY_SEND_REQUEST_INIT_COMMON, and that function does not have the special treatment for data type with continuous data layout.

code: https://github.com/open-mpi/ompi/blob/c053cb822770738de5df0a3e998899398d35e728/ompi/mca/pml/cm/pml_cm_sendreq.h#LL153C9-L153C48

The critical line is:

       (req_send)->req_base.req_convertor.pBaseBuf   =                 \
            (unsigned char*)buf + datatype->super.true_lb;              \

without this condition, heavy send request will be sending from a buffer without proper data offset (datatype->super.true_lb), and the receiver will receive wrong data.

wzamazon commented 1 year ago

https://github.com/open-mpi/ompi/pull/11752 PR try to address the issue.

I am writing a C reproducer to demonstrate the issue.

bosilca commented 1 year ago

That's why the convertor is the right place to handle all these cases. I really wonder if this is really an optimization or it is just a way to create more headaches.

wzamazon commented 1 year ago

That's why the convertor is the right place to handle all these cases. I really wonder if this is really an optimization or it is just a way to create more headaches.

Main and 5.0.x is using convertor correctly, after some recent refactor.

the problem was introduced since 2.0.x by the following commit:

commit 56869bff38e264ee91ea68ae2fabfafe9456548e
Author: Jithin Jose <jithin.jose@intel.com>
Date:   Thu May 7 14:52:46 2015 -0700

    Avoid datatype pack/unpack for contiguous data on homogenous systems.

    Signed-off-by: Jithin Jose <jithin.jose@intel.com>

and impact from 2.0.x to 4.1.x

wzamazon commented 1 year ago

PR has been merged and backported