Open miEsMar opened 2 days ago
Small reproducer:
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>
#define STACK_SIZE 1024
static volatile int state = 0;
static int rank;
static char *stack_ptr = NULL;
static int cb(void *arg)
{
    fprintf(stdout, "Hello, from rank %d child process!\n", rank);
    MPI_Finalize();
    state = 0;
    return 0;
}
int main(void)
{
    MPI_Comm comm = MPI_COMM_WORLD;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(comm, &rank);
    fprintf(stdout, "Hello, from rank %d parent process!\n", rank);
    MPI_Barrier(comm);
    stack_ptr = malloc(STACK_SIZE);
    if (NULL == stack_ptr) return 1;
    stack_ptr = stack_ptr + STACK_SIZE;
    const uint64_t clone_flags =
        CLONE_FILES | CLONE_FS | CLONE_SIGHAND | CLONE_VM;
    state = 1;
    if ((long)-1 == clone(&cb, stack_ptr, clone_flags, NULL)) {
        fprintf(stderr, "Failed to clone process!\n");
        exit(EXIT_FAILURE);
    }
    while (1 == state);
    return 0;
}
According to MPI standard v4.1:
11.6.2 Clarifications
Initialization and Completion. When using the World Model, the call to MPI_FINALIZE
should occur on the same thread that initialized MPI. We call this thread the main thread.
The call should occur only after all process threads have completed their MPI calls, and
have no pending communication or I/O operations.
Rationale. This constraint simplifies implementation. (End of rationale.)
Also, if you want to use multiple threads with MPI, it should be initialized with thread support (using MPI_Init_thread).
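One defensive way to respect the constraint quoted above is to remember which task initialized MPI and check the caller before finalizing. The following is only a sketch under assumptions: it is Linux-specific, the names init_owner, capture_owner and is_owner are hypothetical helpers invented for illustration, and the MPI calls themselves appear only in comments.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: identifies one kernel task on Linux. */
struct init_owner {
    pid_t tgid; /* thread group ID, the value getpid() reports */
    pid_t tid;  /* thread ID, the value gettid() reports */
};

/* Query the kernel directly, so the answer is also correct when this
 * runs inside a raw clone() child that shares memory with its parent. */
struct init_owner capture_owner(void)
{
    struct init_owner id;
    id.tgid = (pid_t)syscall(SYS_getpid);
    id.tid  = (pid_t)syscall(SYS_gettid);
    return id;
}

/* Nonzero only if the caller is the same thread, in the same thread
 * group, as the task recorded in `owner`. */
int is_owner(struct init_owner owner)
{
    struct init_owner now = capture_owner();
    return owner.tgid == now.tgid && owner.tid == now.tid;
}

/* Intended use (assumption, not taken from the report):
 *     MPI_Init(NULL, NULL);
 *     struct init_owner owner = capture_owner();
 *     ...
 *     if (is_owner(owner)) MPI_Finalize();
 */
```

With such a guard in place, the cb() function of the reproducer would skip MPI_Finalize(), because both its thread ID and its thread group ID differ from those of the task that called MPI_Init().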
Thanks @brminich.
I see.
However, the same does not happen if, instead of using clone(), one uses pthread_create(), which is close to keeping the clone() method but setting the CLONE_THREAD flag bit ON.
That's because, that way, the child process/thread is put into the same thread group, while still remaining a different thread from the one that called the initialization.
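To make the thread-group difference concrete, here is a minimal Linux-only sketch that is not part of the original report. It runs a small function in a clone() child, once with CLONE_THREAD set and once without, and records the thread group ID and thread ID that the child sees. The names task_ids, record_ids and probe_clone are hypothetical helpers invented for this illustration, and the code makes no MPI or UCX calls.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (256 * 1024)

/* IDs observed from inside the child (hypothetical helper type). */
struct task_ids {
    volatile pid_t tgid; /* thread group ID, the value getpid() reports */
    volatile pid_t tid;  /* thread ID, the value gettid() reports */
};

/* Child entry point. It shares the parent's memory and TLS, so it
 * sticks to raw system calls instead of higher-level libc functions. */
static int record_ids(void *arg)
{
    struct task_ids *ids = arg;
    ids->tgid = (pid_t)syscall(SYS_getpid);
    ids->tid  = (pid_t)syscall(SYS_gettid);
    return 0;
}

/* Run record_ids() in a clone() child. Pass CLONE_THREAD in extra_flags
 * to join the caller's thread group, or 0 to create a new one.
 * Returns 0 on success and -1 on failure. */
int probe_clone(int extra_flags, struct task_ids *out)
{
    /* With CLONE_CHILD_CLEARTID the kernel zeroes this when the child exits. */
    volatile pid_t exit_marker = 1;
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    int flags = CLONE_VM | CLONE_FILES | CLONE_FS | CLONE_SIGHAND |
                CLONE_CHILD_CLEARTID | extra_flags;
    /* The stack grows downwards, so pass the top of the allocation. */
    int rc = clone(record_ids, stack + CHILD_STACK_SIZE, flags, out,
                   NULL, NULL, (pid_t *)&exit_marker);
    if (rc == -1) {
        free(stack);
        return -1;
    }

    /* Wait until the child has really exited before freeing its stack. */
    while (exit_marker != 0)
        sched_yield();

    /* A child outside our thread group is a separate process: reap it. */
    if (!(extra_flags & CLONE_THREAD))
        waitpid(rc, NULL, __WALL);

    free(stack);
    return 0;
}
```

Called from the main thread, probe_clone(CLONE_THREAD, &ids) should report an ids.tgid equal to the caller's getpid() together with a different ids.tid, whereas probe_clone(0, &ids) should report a tgid that differs from the caller's getpid(). The second case matches the reproducer, where MPI_Finalize() runs in a different thread group than MPI_Init().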
Describe the bug
When using UCX as OpenMPI's pml component, UCX fails in the assertion in uct_posix_mem_free if the calling thread is in a different thread group than the one that created the UCX context.
This happens when each MPI process creates a clone of itself, with the CLONE_THREAD bit OFF in the clone flags, and MPI_Finalize() is called from within the child process.
I assume that assertion is there for a reason, of which I don't know the details.
Steps to Reproduce
mpirun ... --mca pml ucx ... (OpenMPI version 4.1.7a1) with an MPI application where each process calls clone() with the CLONE_THREAD flag bit OFF.
UCX version 1.15.0 + UCX configure flags
Setup and versions