open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

The simple hello-world.c MPI program prints: shmem: mmap: an error occurred while determining whether or not /tmp/ompi.yv.1001/jf.0/3074883584/sm_segment.yv.1001.b7470000.0 could be created #12784

Open: yurivict opened this issue 2 months ago

yurivict commented 2 months ago

See the program below.

$ ./hello-world-1 
[xx.xx.xx:12584] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.yv.1001/jf.0/3074883584/sm_segment.yv.1001.b7470000.0 could be created.

---program---

$ cat hello-world-1.c 
#include <mpi.h>
#include <stdio.h>
#include <unistd.h> // for getpid() and sleep()

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("> Hello world from processor %s, rank %d out of %d processors (pid=%d)\n",
           processor_name, world_rank, world_size, getpid());
    sleep(1);
    printf("< Hello world from processor %s, rank %d out of %d processors (pid=%d)\n",
           processor_name, world_rank, world_size, getpid());

    // Finalize the MPI environment.
    MPI_Finalize();

    return 0;
}
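
For reference, the program would typically be built and launched along these lines (a sketch; mpicc and mpirun are the wrapper and launcher shipped with the Open MPI installation):

$ mpicc hello-world-1.c -o hello-world-1
$ mpirun -np 2 ./hello-world-1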

Version: openmpi-5.0.5_1
How Open MPI was installed: FreeBSD package
Computer hardware: Intel CPU
Network type: Ethernet/IP (irrelevant)
Available space in /tmp: 64 GB
Operating system: FreeBSD 14.1

jsquyres commented 2 months ago

Please provide all the information from the debug issue template; thanks!

https://github.com/open-mpi/ompi/blob/main/.github/ISSUE_TEMPLATE/bug_report.md

yurivict commented 2 months ago

I added missing bits of information.

ggouaillardet commented 2 months ago

The root cause could be insufficient available space in /tmp (unlikely, per your description), or something going wrong when the size is checked.

Try running

env OMPI_MCA_shmem_base_verbose=100 ./hello-world-1

and check the output (useful messages might have been compiled out, though).

If there is nothing useful, you can run

strace -o hw.strace -s 512 ./hello-world-1

then compress hw.strace and upload it.

yurivict commented 2 months ago

env OMPI_MCA_shmem_base_verbose=100 ./hello-world-1

This didn't produce anything relevant.

strace -o hw.strace -s 512 ./hello-world-1

BSDs have ktrace instead. Here is the ktrace dump: https://freebsd.org/~yuri/openmpi-kernel-dump.txt
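
For reference, a dump like this can be produced along these lines (a sketch; the exact invocation isn't shown here):

$ ktrace -f hw.ktrace ./hello-world-1
$ kdump -f hw.ktrace > openmpi-kernel-dump.txt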

ggouaillardet commented 2 months ago

51253 hello-world-1 CALL  fstatat(AT_FDCWD,0x1b0135402080,0x4c316d20,0)
 51253 hello-world-1 NAMI  "/tmp/ompi.yv.0/jf.0/2909405184"
 51253 hello-world-1 RET   fstatat -1 errno 2 No such file or directory
 51253 hello-world-1 CALL  open(0x1b0135402080,0x120004<O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC>)
 51253 hello-world-1 NAMI  "/tmp/ompi.yv.0/jf.0/2909405184"
 51253 hello-world-1 RET   open -1 errno 2 No such file or directory

It looks like some directories were not created. What if you run mpirun -np 1 ./hello-world-1 instead?

yurivict commented 2 months ago

sudo mpirun -np 1 ./hello-world-1 prints the same error message:

It appears as if there is not enough space for /dev/shm/sm_segment.yv.0.9f060000.0 (the shared-memory backing
file). It is likely that your MPI job will now either abort or experience
performance degradation.

The log doesn't have any mkdir operations, so /tmp/ompi.yv.0 was never created.

ggouaillardet commented 2 months ago

Well, this is a different message than the one reported when this issue was opened, and this one is self-explanatory.

Anyway, what if you

env OMPI_MCA_shmem_mmap_backing_file_base_dir=/tmp ./hello-world-1

Or you can simply increase the size of /dev/shm.
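
The same MCA parameter can also be set on the mpirun command line, e.g. (a sketch):

$ mpirun --mca shmem_mmap_backing_file_base_dir /tmp -np 1 ./hello-world-1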

yurivict commented 2 months ago

sudo OMPI_MCA_shmem_mmap_backing_file_base_dir=/tmp ./hello-world-1 produces the same error messages.

This message is for a regular user:

$ OMPI_MCA_shmem_mmap_backing_file_base_dir=/tmp ./hello-world-1
[yv.noip.me:88431] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.yv.1001/jf.0/1653407744/sm_segment.yv.1001.628d0000.0 could be created.
> Hello world from processor yv.noip.me, rank 0 out of 1 processors (pid=88431)
< Hello world from processor yv.noip.me, rank 0 out of 1 processors (pid=88431)

This message is for root:

# OMPI_MCA_shmem_mmap_backing_file_base_dir=/tmp ./hello-world-1
--------------------------------------------------------------------------
It appears as if there is not enough space for /dev/shm/sm_segment.yv.0.ee540000.0 (the shared-memory backing
file). It is likely that your MPI job will now either abort or experience
performance degradation.

  Local host:  yv
  Space Requested: 16777216 B
  Space Available: 1024 B
--------------------------------------------------------------------------
> Hello world from processor yv.noip.me, rank 0 out of 1 processors (pid=88929)
< Hello world from processor yv.noip.me, rank 0 out of 1 processors (pid=88929)

ggouaillardet commented 2 months ago

I see.

Try adding OMPI_MCA_btl_sm_backing_directory=/tmp and see how it works.
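
That is, something like:

$ env OMPI_MCA_btl_sm_backing_directory=/tmp ./hello-world-1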

yurivict commented 2 months ago

The error messages disappear when OMPI_MCA_btl_sm_backing_directory=/tmp is used.

rhc54 commented 2 months ago

We have seen and responded to this problem many times - I believe it is included in the docs somewhere. The problem is that BSD (mostly as seen on Mac) has created a default TMPDIR that is incredibly long. So when we add our tmpdir prefix (to avoid stepping on other people's tmp), the result is longer than the path length limits.

Solution: set TMPDIR in your environment to point to some shorter path, typically something like $HOME/tmp.
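
For example (a sketch):

$ mkdir -p $HOME/tmp
$ export TMPDIR=$HOME/tmp
$ ./hello-world-1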

yurivict commented 2 months ago

[...] a default TMPDIR that is incredibly long [...]

What do you mean by TMPDIR? In our case TMPDIR is just /tmp.

ggouaillardet commented 2 months ago

Indeed, it seems the root cause is something fishy related to /dev/shm.

What does df -h /dev/shm show, both as a regular user and as root?

yurivict commented 2 months ago

$ df -h /dev/shm
Filesystem    Size    Used   Avail Capacity  Mounted on
devfs         1.0K      0B    1.0K     0%    /dev
# df -h /dev/shm
Filesystem    Size    Used   Avail Capacity  Mounted on
devfs         1.0K      0B    1.0K     0%    /dev

ggouaillardet commented 2 months ago

That's indeed a small /dev/shm.

I still do not understand why running as a user does not get you the user-friendly message you get when running as root.

Can you run ktrace as a non-root user so we can figure out where the failure occurs?

yurivict commented 2 months ago

Here is the ktrace dump for a regular user.

ggouaillardet commented 2 months ago

It seems regular users do not have write access to the (small) /dev/shm, and we do not display a friendly error message about it.

45163 hello-world-1 CALL  access(0x4e3d8d33,0x2<W_OK>)
 45163 hello-world-1 NAMI  "/dev/shm"
 45163 hello-world-1 RET   access -1 errno 13 Permission denied

Unless you change that, your best bet is probably to add

btl_sm_backing_directory=/tmp

to your $PREFIX/etc/openmpi-mca-params.conf
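
For example, it can be appended from the shell (a sketch; substitute your actual installation prefix for $PREFIX):

$ echo 'btl_sm_backing_directory = /tmp' >> $PREFIX/etc/openmpi-mca-params.conf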

yurivict commented 2 months ago

Is direct access to /dev/shm new in Open MPI? It used to work fine on FreeBSD.

How does this work on Linux? Is everybody allowed write access to /dev/shm there?
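
(For context: on a typical Linux system, /dev/shm is a world-writable tmpfs with the sticky bit set, so every user may create backing files there. A quick check:)

$ ls -ld /dev/shm   # typically drwxrwxrwt, i.e. writable by all users
$ df -h /dev/shm    # typically a tmpfs sized at a fraction of RAM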

yurivict commented 2 months ago

Access to /dev/shm has a fallback in ompi, like here.

Why doesn't this fallback work then? Is it accidentally missing in some cases?
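
The ktrace output quoted above suggests the failing check is a plain access(2) test for W_OK on /dev/shm; it can be reproduced from the shell (a sketch):

$ test -w /dev/shm && echo writable || echo not writable   # for regular users here this prints "not writable", matching the errno 13 in the ktrace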

jabowery commented 3 weeks ago

I believe I've tried everything suggested (and then some) as evidenced by the following interactions:

(ioniser) jabowery@jaboweryML:~/devel/ioniser$ printenv |grep BulkData|grep tmp
OMPI_MCA_shmem_mmap_backing_file_base_dir=/mnt/BulkData/home/jabowery/tmp
btl_sm_backing_directory=/mnt/BulkData/home/jabowery/tmp
TMPDIR=/mnt/BulkData/home/jabowery/tmp
(ioniser) jabowery@jaboweryML:~/devel/ioniser$ tail /home/jabowery/mambaforge/envs/ioniser/etc/openmpi-mca-params.conf

# See "ompi_info --param all all --level 9" for a full listing of Open
# MPI MCA parameters available and their default values.
pml = ^ucx
osc = ^ucx
coll_ucc_enable = 0
mca_base_component_show_load_errors = 0
opal_warn_on_missing_libcuda = 0
opal_cuda_support = 0
btl_sm_backing_directory=/mnt/BulkData/home/jabowery/tmp
(ioniser) jabowery@jaboweryML:~/devel/ioniser$ tail /etc/openmpi/openmpi-mca-params.conf 
btl_base_warn_component_unused=0
# Avoid openib in case applications use fork: see https://github.com/ofiwg/libfabric/issues/6332
# If you wish to use openib and know your application is safe, remove the following:
# Similarly for UCX: https://github.com/open-mpi/ompi/issues/8367
mtl = ^ofi
btl = ^uct,openib,ofi
pml = ^ucx
osc = ^ucx,pt2pt
btl_sm_backing_directory=/mnt/BulkData/home/jabowery/tmp
(ioniser) jabowery@jaboweryML:~/devel/ioniser$ !p
p ioniser.py
[jaboweryML:34571] shmem: mmap: an error occurred while determining whether or not /mnt/BulkData/home/jabowery/tmp/ompi.jaboweryML.1000/jf.0/121765888/shared_mem_cuda_pool.jaboweryML could be created.
[jaboweryML:34571] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728 
(ioniser) jabowery@jaboweryML:~/devel/ioniser$ whoami
jabowery
(ioniser) jabowery@jaboweryML:~/devel/ioniser$ touch /mnt/BulkData/home/jabowery/tmp/accesstest.txt
(ioniser) jabowery@jaboweryML:~/devel/ioniser$ ls -altr /mnt/BulkData/home/jabowery/tmp/accesstest.txt
-rw-rw-r-- 1 jabowery jabowery 0 Nov  1 10:51 /mnt/BulkData/home/jabowery/tmp/accesstest.txt
(ioniser) jabowery@jaboweryML:~/devel/ioniser$ df /mnt/BulkData/home/jabowery/tmp
Filesystem      1K-blocks      Used  Available Use% Mounted on
/dev/nvme1n1   1921725720 692366840 1131666768  38% /mnt/BulkData
(ioniser) jabowery@jaboweryML:~/devel/ioniser$