rookiehpc / rookiehpc.github.io

A website covering major HPC technologies, designed to welcome contributions.
GNU Affero General Public License v3.0

About something I discovered about `MPI_Scatterv()` #319

Closed Ross-Li closed 1 year ago

Ross-Li commented 1 year ago

This is something I discovered about what you can do with MPI_Scatterv() that is only slightly different from your existing example program. I am not sure whether it is worth adding as another example program to better demonstrate the functionality of MPI_Scatterv(), or whether everyone already knows about this and I am just late to figure it out. I would really appreciate it if you could share your opinion about it.

My goal

What I was trying to do is basically:

  1. "scatter" a array from root process to all slave processes BUT NOT to itself: image
  2. and DON'T create any other needless variables, neither in root process (e.g. a variable to store the slice send to itself) nor in slave processes.

This should be a simple and straightforward thing for MPI to do, but the regular MPI_Scatter() always sends a piece from the sender to itself, which is truly a bit annoying.

My method (still not perfect)

I finally achieved this with MPI_Scatterv(), though it is still not perfect. The core code snippet is:

if (my_rank == root_rank) {
        //* This variable is solely for filling the `MPI_Scatterv()` function's 
        //* `recvbuf` argument, otherwise it will produce an error.
        char useless = 'U';

        // Initialize `to_send`
        int to_send[4] = {1, 2, 3, 4};

        // Initialize `counts`
        int counts[3] = {0, 2, 2};

        // Initialize `displacements`
        int displacements[3] = {0, 0, 2};

        printf("Values in the to_send of root process:");
        for (int i = 0; i < (sizeof(to_send) / sizeof(to_send[0])); i++) {
            printf(" %d", to_send[i]);
        }
        printf("\n");

        MPI_Scatterv(
            to_send, counts, displacements, MPI_INT, 
            &useless, 1, MPI_CHAR, 
            root_rank, MPI_COMM_WORLD
        );

        printf("Process %d received value (this is actually useless) %c.\n", my_rank, useless);
    } else {
        // Declare `my_values`
        int my_values[2];

        MPI_Scatterv(
            NULL, NULL, NULL, MPI_INT, 
            &my_values, 2, MPI_INT, 
            root_rank, MPI_COMM_WORLD
        );

        printf("Process %d received values %d, %d \n", my_rank, my_values[0], my_values[1]);
    }

Sample output is:

Values in the to_send of root process: 1 2 3 4
Process 0 received value (this is actually useless) U.
Process 1 received values 1, 2 
Process 2 received values 3, 4 

The imperfect aspect, as noted in the code snippet, is that I still have to create a useless variable to fill the recvbuf argument of MPI_Scatterv(), otherwise it produces an error. Apart from that, it does exactly what I wanted.
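For context, the snippet above is only the core of the program; it assumes scaffolding roughly like the following sketch (only my_rank and root_rank come from the snippet, the rest is assumed), typically built with mpicc and launched with mpirun -np 3:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    // Assumption: rank 0 acts as the root and the program runs with 3 processes.
    int root_rank = 0;

    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    // ... the if/else block shown above goes here ...

    MPI_Finalize();
    return 0;
}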

Contribution

The key differences between this code and the existing example code are:

  1. the length of the to_send buffer equals number_of_slaves, which is 3, instead of number_of_all_processes, which is 4.
  2. the length of the counts array equals number_of_slaves, which is 3, instead of number_of_all_processes, which is 4.
  3. the length of the displacements array equals number_of_slaves, which is 3, instead of number_of_all_processes, which is 4.

In short, my point is that the lengths of the to_send buffer, the counts array and the displacements array DON'T HAVE TO BE equal to number_of_all_processes. You can customize not only the contents of these variables BUT ALSO their lengths to suit your needs. This took me a while to realize.

If you want to integrate this idea...

If you would like to integrate this idea into your website, I think it would be sufficient to note, in the parameter description section, that the lengths of these 3 variables do not necessarily have to equal the number of all processes.

Improvement idea

I would also really appreciate it if you could provide suggestions on how to do exactly what I am trying to do, especially the "not creating useless variables" part.
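One possible way to avoid the dummy variable is the MPI_IN_PLACE option that the MPI standard defines for rooted collectives: when the root passes MPI_IN_PLACE as its receive buffer, the receive count and type are ignored and the root sends no data to itself. A minimal sketch (only the root branch changes compared to the snippet above):

if (my_rank == root_rank) {
    int to_send[4] = {1, 2, 3, 4};

    // One entry per rank of the communicator (3 processes assumed here);
    // the root's own count is 0 so it keeps none of the data.
    int counts[3]        = {0, 2, 2};
    int displacements[3] = {0, 0, 2};

    // MPI_IN_PLACE as recvbuf at the root: recvcount and recvtype are ignored
    // and the root does not send data to itself, so no dummy variable is needed.
    MPI_Scatterv(
        to_send, counts, displacements, MPI_INT,
        MPI_IN_PLACE, 0, MPI_INT,
        root_rank, MPI_COMM_WORLD
    );
}

The non-root branch stays exactly as in the snippet above.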

rookiehpc commented 1 year ago

Hi @Ross-Li,

There might have been some confusion about how MPI_Scatterv works.

According to the MPI standard (version 4.0) available at https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf, in Section 6.6, on page 208, for MPI_Scatterv it states that sendcounts is a "non-negative integer array (of length group size) specifying the number of elements to send to each rank (significant only at root)", similarly for the displacements.

In other words, a precondition of MPI_Scatterv is that the two variables mentioned above must have a number of elements equal to the size of the group from which the communicator used in the collective operation is built. Not respecting this precondition therefore results in undefined behaviour, which in this instance happens not to be a crash.

As a result, if the communicator that you pass to MPI_Scatterv is built from a group of 4 MPI processes, your send buffer, send counts and displacements must all be arrays of 4 elements.
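For illustration, a minimal sketch of a root-side call that satisfies this precondition with a communicator of 4 MPI processes; the data layout here is assumed, and MPI_IN_PLACE is used so that the root needs no receive buffer:

if (my_rank == root_rank) {
    // Assumed layout: 4 MPI processes in MPI_COMM_WORLD, rank 0 is the root.
    int to_send[4]       = {1, 2, 3, 4};
    int counts[4]        = {0, 2, 1, 1}; // one entry per rank; the root receives nothing
    int displacements[4] = {0, 0, 2, 3}; // one entry per rank; the root's offset is unused

    MPI_Scatterv(
        to_send, counts, displacements, MPI_INT,
        MPI_IN_PLACE, 0, MPI_INT,
        root_rank, MPI_COMM_WORLD
    );
}

The non-root ranks would then post matching receive counts: 2 elements for rank 1, and 1 element each for ranks 2 and 3.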

Ross-Li commented 1 year ago

Ah, I should have read the official documentation first and told you the specific version of MPI I was using. If the documentation specifies that these arrays should be of group/communicator size, then indeed my program produces undefined behavior and it was just luck that it didn't crash. Thanks for your explanation!!!