openucx / ucx

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
http://www.openucx.org

Thread group ID mismatch assertion failure in `uct_posix_mem_free` #10334

Open miEsMar opened 2 days ago

miEsMar commented 2 days ago

Describe the bug

When using UCX as OpenMPI's pml component, UCX fails in the assertion

        ucs_assert(dummy_pid == getpid());

in uct_posix_mem_free if the calling thread is in a different thread group than the one that created the UCX context.

This happens when each MPI process creates a clone of itself with the CLONE_THREAD bit unset in the clone flags, and MPI_Finalize() is called from within the child process.

I assume that assertion is there for a reason, but I don't know the details.

Steps to Reproduce

Setup and versions

miEsMar commented 2 days ago

Small reproducer:

#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

#include <mpi.h>

#define STACK_SIZE (1024 * 1024) /* 1 KiB is too small for a child that calls fprintf() and MPI_Finalize() */

static volatile int state = 0;
static int rank;
static char *stack_ptr = NULL;

static int cb(void *arg)
{
    fprintf(stdout, "Hello, from rank %d child  process!\n", rank);
    MPI_Finalize();
    state = 0;
    return 0;
}

int main(void)
{
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(comm, &rank);

    fprintf(stdout, "Hello, from rank %d parent process!\n", rank);
    MPI_Barrier(comm);

    stack_ptr = malloc(STACK_SIZE);
    if (NULL == stack_ptr) return 1;
    stack_ptr = stack_ptr + STACK_SIZE;

    const uint64_t clone_flags = 
        CLONE_FILES | CLONE_FS | CLONE_SIGHAND | CLONE_VM;

    state = 1;
    if ((long)-1 == clone(&cb, stack_ptr, clone_flags, NULL)) {
        fprintf(stderr, "Failed to clone process!\n");
        exit(EXIT_FAILURE);
    }

    while (1 == state);

    return 0;
}
brminich commented 2 days ago

According to MPI standard v4.1:

11.6.2 Clarifications
Initialization and Completion. When using the World Model, the call to MPI_FINALIZE
should occur on the same thread that initialized MPI. We call this thread the main thread.
The call should occur only after all process threads have completed their MPI calls, and
have no pending communication or I/O operations.
Rationale. This constraint simplifies implementation. (End of rationale.)

Also, if you want to use multiple threads with MPI, it should be initialized with thread support (using MPI_Init_thread).

miEsMar commented 2 days ago

Thanks @brminich.

I see. However, the same failure does not occur if one uses pthread_create() instead of clone(), which is roughly equivalent to keeping the clone() call but setting the CLONE_THREAD flag. In that case the child is placed in the same thread group as its creator, so getpid() returns the same value in both, even though the child is still a different thread from the one that called the initialization.
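
To illustrate the difference without involving MPI or UCX, here is a minimal sketch that records what getpid() returns in a clone() child created without CLONE_THREAD and in a pthread_create() thread. It assumes Linux with glibc. The helper names (pid_seen_by_clone_child, pid_seen_by_pthread) and DEMO_STACK_SIZE are hypothetical and only serve this illustration; they are not part of UCX or Open MPI.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stack size for the clone() child, chosen generously. */
#define DEMO_STACK_SIZE (1024 * 1024)

/* Entry point for the clone() child: store the PID it observes. */
static int record_pid_clone(void *arg)
{
    *(pid_t *)arg = getpid();
    return 0;
}

/* Entry point for the pthread: store the PID it observes. */
static void *record_pid_thread(void *arg)
{
    *(pid_t *)arg = getpid();
    return NULL;
}

/*
 * Returns getpid() as seen by a clone() child created WITHOUT CLONE_THREAD,
 * or -1 on error. CLONE_VM shares the address space so the child can write
 * the result; SIGCHLD as the exit signal lets waitpid() reap the child.
 */
pid_t pid_seen_by_clone_child(void)
{
    pid_t seen = -1;
    char *stack = malloc(DEMO_STACK_SIZE);
    if (stack == NULL) {
        return -1;
    }

    /* The stack grows downward on common platforms, so pass its top. */
    pid_t child = clone(record_pid_clone, stack + DEMO_STACK_SIZE,
                        CLONE_VM | SIGCHLD, &seen);
    if (child == -1) {
        free(stack);
        return -1;
    }

    waitpid(child, NULL, 0);
    free(stack);
    return seen;
}

/*
 * Returns getpid() as seen by a thread created with pthread_create(),
 * or -1 on error. Such a thread joins the caller's thread group.
 */
pid_t pid_seen_by_pthread(void)
{
    pid_t seen = -1;
    pthread_t thread;
    if (pthread_create(&thread, NULL, record_pid_thread, &seen) != 0) {
        return -1;
    }
    pthread_join(thread, NULL);
    return seen;
}
```

Calling both helpers from main() and comparing the results with getpid() in the parent should show that the pthread sees the parent's PID while the clone() child sees a different one, which is the same kind of comparison the assertion makes. On older glibc versions this may need to be built with -pthread.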