open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

ofi_create_recv_tag mask hides ssend's "ack" bit #8051

Open hkuno opened 3 years ago

hkuno commented 3 years ago

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

master branch top of tree: commit eca00a7a3b179

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

git clone

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

    $ git submodule status
    +299d2a489aa53546e1320eb3fd7e8d726f16b251 opal/mca/hwloc/hwloc2/hwloc (dev-3067-g299d2a4)
    +ee72a2b65b1b6480753fc12d500c51ebe4fc23aa opal/mca/pmix/pmix4x/openpmix (v1.1.3-2505-gee72a2b)
    +545863e6dc055233456116da6dc85be2b307f8e2 prrte (dev-30707-g545863e)

Please describe the system on which you are running

N/A


Details of the problem

The upper 2 bits of an ompi tag encode the synchronous send and synchronous send ack flags. Because mtl_ofi_create_recv_tag_CQD and mtl_ofi_create_recv_tag both use ompi_mtl_ofi.sync_proto_mask instead of ompi_mtl_ofi.sync_send when generating their "ignore" masks, the receive tag-matching logic disregards the ack bit, so a receive may match a tag that has the ack bit set.

This is a problem because ssend is implemented internally as a send plus a receive. If a user has posted a receive that is still outstanding when the ssend is issued, that user's receive may consume the internal message (the ack) intended for the ssend's internal receive.

Updating mtl_ofi_create_recv_tag_CQD and mtl_ofi_create_recv_tag functions to use ompi_mtl_ofi.sync_send fixes this.

For example, consider the following:

    /* MPI_CALL is a wrapper macro from the reporter's test; its definition is not shown. */
    if (my_rank == 0) {
        /* Post a wildcard receive before the ssend; it should match rank 1's tag-2 message. */
        MPI_CALL(MPI_Irecv, NULL, 0, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &rx_req);
        MPI_CALL(MPI_Ssend, NULL, 0, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
        MPI_CALL(MPI_Wait, &rx_req, &rx_status);
        if (rx_status.MPI_TAG != 2)
            fprintf(stderr, "%s,%d:Expected tag %d\n", __func__, __LINE__, 2);
    } else {
        /* Receive rank 0's tag-1 ssend, then ssend back with tag 2. */
        MPI_CALL(MPI_Recv, NULL, 0, MPI_CHAR, 0, 1, MPI_COMM_WORLD, &rx_status);
        if (rx_status.MPI_TAG != 1)
            fprintf(stderr, "%s,%d:Expected tag %d\n", __func__, __LINE__, 1);
        MPI_CALL(MPI_Ssend, NULL, 0, MPI_CHAR, 0, 2, MPI_COMM_WORLD);
    }
    ret = 0;

If run with a debug build, that code will produce the following failed assertion:

ssend_test: mtl_ofi.h:688: ompi_mtl_ofi_recv_callback: Assertion `!(ompi_mtl_ofi.sync_send_ack == (ompi_mtl_ofi.sync_proto_mask & wc->tag))' failed.

Changing both mtl_ofi_create_recv_tag_CQD and mtl_ofi_create_recv_tag to initialize the mask from ompi_mtl_ofi.sync_send fixes this:

    *mask_bits = ompi_mtl_ofi.sync_send;

acgoldma commented 3 years ago

I see this issue in 4.0.5 and 4.1.0. Can we backport this as well to v4.1.x?