mpi-forum / mpi-issues

Tickets for the MPI Forum
http://www.mpi-forum.org/
66 stars 7 forks source link

UB for passing uninitialized variables to "significant only at root" arguments #854

Open jprotze opened 3 weeks ago

jprotze commented 3 weeks ago

Problem

The wording about "significant only at root" is a bit sloppy and at least one example is broken. Although the parameters will not be read by the function on non-root ranks, passing uninitialized variables to the function is UB according to the C standard. Example 6.3 does not initialize rbuf on non-root ranks and passes this variable to MPI_Gather, which is UB. As a result, we observed the if (myrank == root) to be optimized away by clang 14 or newer and icx 2022.1 or newer.

Proposal

Changes to the Text

Change

int root, myrank, *rbuf;

to

int root, myrank, *rbuf=NULL;

And possibly explain why this initialization is important?

In the introduction of Chapter 6:

Some arguments in the collective functions are specified as “significant only at root,” and are ignored for all participants except the root.

Add:

The C standard nevertheless requires variables passed to these arguments to be initialized.

Impact on Implementations

None

Impact on Users

Avoid running into non-obvious UB by misunderstanding “significant only at root”

References and Pull Requests

devreal commented 3 weeks ago

Curious whether the compiler removed the branch because rbuf is uninitialized of because root is uninitialized in this example? Agreed that both are UB and we should be careful not to promote that.

jprotze commented 3 weeks ago

In our concrete code, we had root fixed to 0 and more code executed next to the malloc. A printf would even print rank as 0 for all ranks:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
  if (argc != 2) {
    printf("No valid size given\n");
    return 1;
  }

  int n = atoi(argv[1]);

  MPI_Init(&argc, &argv);
  int rank, wsize;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &wsize);
  n = n - n % wsize;
  int n_per_process = n / wsize;

  float *x;
  float *x_root;

  if (rank == 0) {
    printf("From Rank %d: Initializing vectors...\n", rank);
    fflush(stdout);
    x_root = (float *)malloc(n * sizeof(float));
    memset(x_root, 0, n * sizeof(float));
  }

  x = (float *)malloc(n_per_process * sizeof(float));
  memset(x, 0, n_per_process * sizeof(float));
  MPI_Scatter(x_root, n_per_process, MPI_FLOAT, x, n_per_process, MPI_FLOAT, 0,
              MPI_COMM_WORLD);

  MPI_Finalize();
  free(x);
  return 0;
}
jeffhammond commented 2 weeks ago

I don't believe any of the examples are intended to be complete working code. Do they even include all the necessary headers to compile?

wgropp commented 2 weeks ago

That is correct, most examples are code snippets. The code check uses LaTeX comments to describe things like headers and declarations needed to see if the code could be compiled - this test has caught many errors, such as misspellings, out of order arguments, and missing semicolons, in the code snippets.

However, we do want the code snippets to be valid and compileable with the additional info, so things like ensure rbuf is initialized is reasonable for the examples.

mahermanns commented 2 weeks ago

Discussion in Tools WG on Sep 30, 2024:

mahermanns commented 3 days ago

Discussion on Tools WG on Oct 14, 2024:

jprotze commented 3 days ago

Here is a semantically equivalent reproducer without MPI to look at the compiler optimization observed above on godbolt. For clang the optimization seems to be introduced with version 14:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int test(int n, char *c) {
  if (n == 0)
    return 0;
  return atoi(c);
}

int main(int argc, char **argv) {
  char *buf;
  if (argc > 1) {
    buf = strdup(argv[1]);
  }
  int res = test(argc - 1, buf);
  return res;
}