hpc / mpifileutils

File utilities designed for scalability and performance.
https://hpc.github.io/mpifileutils
BSD 3-Clause "New" or "Revised" License

dcp: MPI permissions issue with `--uid` and `--gid` #585

Closed by bdevcich 1 month ago

bdevcich commented 1 month ago

Running dcp with --uid and --gid frequently produces the message shown below at the end of the run. This is with Open MPI v4.1.0, running mpirun as root. I understand that running as root is frowned upon, but our current design forces our hand so that we can become any user to perform data movement inside of containers.

A full command typically looks something like this:

mpirun --allow-run-as-root -hostfile <hostfile> dcp --xattrs none --progress 1 --uid 1060 --gid 100 <src> <dest>

And here is the message that appears frequently after the dcp output:

# --------------------------------------------------------------------------
# A system call failed during shared memory initialization that should
# not have.  It is likely that your MPI job will now either abort or
# experience performance degradation.
#
#   Local host:  nnf-dm-worker-xj45m
#   System call: unlink(2) /dev/shm/vader_segment.nnf-dm-worker-xj45m.8d600001.7
#   Error:       Operation not permitted (errno 1)
# --------------------------------------------------------------------------

Most of the time, this message appears to be harmless: dcp completes successfully, as does mpirun. However, there are some cases where it can segfault.

I tried to get at the root of the message and, as part of that journey, upgraded to Open MPI v4.1.6 to see if anything changes. It does: these messages now appear with every single invocation of dcp:

    [[16880,0],0] ORTE_ERROR_LOG: Data unpack failed in file util/show_help.c at line 501
    [nnf-dm-controller-manager-67cbcc74c5-s8bg6:00394] [[16880,0],0] ORTE_ERROR_LOG: Data unpack failed in file util/show_help.c at line 501
    [nnf-dm-controller-manager-67cbcc74c5-s8bg6:00394] [[16880,0],0] ORTE_ERROR_LOG: Data unpack failed in file util/show_help.c at line 501
    [nnf-dm-controller-manager-67cbcc74c5-s8bg6:00394] [[16880,0],0] ORTE_ERROR_LOG: Data unpack failed in file util/show_help.c at line 501
    [nnf-dm-controller-manager-67cbcc74c5-s8bg6:00394] [[16880,0],0] ORTE_ERROR_LOG: Data unpack failed in file util/show_help.c at line 501
    [nnf-dm-controller-manager-67cbcc74c5-s8bg6:00394] [[16880,0],0] ORTE_ERROR_LOG: Data unpack failed in file util/show_help.c at line 501
    [nnf-dm-controller-manager-67cbcc74c5-s8bg6:00394] [[16880,0],0] ORTE_ERROR_LOG: Data unpack failed in file util/show_help.c at line 501
    [nnf-dm-controller-manager-67cbcc74c5-s8bg6:00394] [[16880,0],0] ORTE_ERROR_LOG: Data unpack failed in file util/show_help.c at line 501

I noticed that if you drop --uid and --gid, the messages go away. That leads me to believe the problem lies in the combination of running mpirun as root and having dcp do things as non-root.

In lieu of using dcp --uid/--gid, I tried to use setpriv before the dcp command:

mpirun --allow-run-as-root -hostfile <hostfile> setpriv --euid 1060 --egid 100 --clear-groups dcp --xattrs none --progress  <src> <dest>

But this results in an error:

   [nnf-dm-worker-jzl9t:00111] PMIX ERROR: UNREACHABLE in file ptl_tcp_component.c at line 1849
   [nnf-dm-worker-jzl9t:00121] PMIX ERROR: ERROR in file gds_ds12_lock_pthread.c at line 168
   [nnf-dm-worker-jzl9t:00121] OPAL ERROR: Unreachable in file pmix3x_client.c at line 111
   *** An error occurred in MPI_Init
   *** on a NULL communicator
   *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
   ***    and potentially your MPI job)
   [nnf-dm-worker-jzl9t:00121] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
   --------------------------------------------------------------------------
   mpirun detected that one or more processes exited with non-zero status, thus causing
   the job to be terminated. The first process to do so was:

     Process name: [[22121,1],3]
     Exit code:    1
   --------------------------------------------------------------------------

I think --uid and --gid should apply only to the file operations and not to MPI functionality. Is that possible?
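A rough sketch of that idea, not taken from the dcp source: switch only the effective IDs around the file operations and switch them back afterwards, so that MPI initialization and teardown (including the shared-memory cleanup in /dev/shm) still run with the original credentials. The names `saved_ids_t`, `ids_enter`, and `ids_leave` are hypothetical. Supplementary groups (`setgroups`) are left out for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: the effective IDs in force before the switch. */
typedef struct {
    uid_t euid;
    gid_t egid;
} saved_ids_t;

/* Switch the effective IDs to (uid, gid) and remember the previous ones.
 * The group is changed first: once the effective uid is non-root, the
 * process may no longer be allowed to change its group.
 * Returns 0 on success, -1 on failure. */
static int ids_enter(uid_t uid, gid_t gid, saved_ids_t *saved)
{
    assert(saved != NULL);
    saved->euid = geteuid();
    saved->egid = getegid();

    if (setegid(gid) != 0) {
        perror("setegid");
        return -1;
    }
    if (seteuid(uid) != 0) {
        perror("seteuid");
        (void) setegid(saved->egid); /* roll back the group change */
        return -1;
    }
    return 0;
}

/* Restore the effective IDs recorded by ids_enter().
 * The uid is restored first: regaining the original (root) effective uid
 * is what permits changing the group back.
 * Returns 0 on success, -1 on failure. */
static int ids_leave(const saved_ids_t *saved)
{
    assert(saved != NULL);

    if (seteuid(saved->euid) != 0) {
        perror("seteuid");
        return -1;
    }
    if (setegid(saved->egid) != 0) {
        perror("setegid");
        return -1;
    }
    return 0;
}
```

In an MPI program the intent would be to call `ids_enter(1060, 100, &saved)` after `MPI_Init`, do the copy, and call `ids_leave(&saved)` before `MPI_Finalize`. One caveat: effective IDs are process-wide, so any MPI activity that overlaps the window would still run as the target user. Whether this fully avoids the `unlink(2)` failure above is an assumption I have not verified.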