open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

cannot MPI_File_open a one-character filename, deletes external file anyways #12619

Closed jeffhammond closed 1 day ago

jeffhammond commented 3 weeks ago

I have a trivial MPI_F08 program that opens, closes and deletes a file. When the filename is "a", this fails. When the filename is "aa", it succeeds. When I create files "a" and "aa" using touch, the program still fails, but Open MPI deletes both files, despite saying that MPI_File_delete on "a" has failed.

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

                 Package: Open MPI jehammond@oppenheimer Distribution
                Open MPI: 5.1.0a1
  Open MPI repo revision: v2.x-dev-11448-g55c0bda957
   Open MPI release date: Unreleased developer copy
                 MPI API: 3.1.0
            Ident string: 5.1.0a1
                  Prefix: /opt/ompi/llvm
 Configured architecture: x86_64-pc-linux-gnu
           Configured by: jehammond
           Configured on: Fri Jun 14 06:34:47 UTC 2024
          Configure host: oppenheimer
  Configure command line: 'FC=/opt/llvm/latest/bin/flang-new'
                          'CC=/opt/llvm/latest/bin/clang'
                          'CXX=/opt/llvm/latest/bin/clang++'
                          '--enable-fortran=all' '--prefix=/opt/ompi/llvm'
                          'CPPFLAGS=-I/opt/llvm/latest/include/flang'
                Built by: jehammond
                Built on: pe 14.6.2024 06.48.03 +0000
              Built host: oppenheimer
              C bindings: yes
             Fort mpif.h: yes (all)
            Fort use mpi: yes (full: ignore TKR)
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: yes
 Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
                          limitations in the /opt/llvm/latest/bin/flang-new
                          compiler and/or Open MPI, does not support the
                          following: array subsections, direct passthru
                          (where possible) to underlying Open MPI's C
                          functionality
  Fort mpi_f08 subarrays: no
           Java bindings: no
  Wrapper compiler rpath: runpath
              C compiler: /opt/llvm/latest/bin/clang
     C compiler absolute: /opt/llvm/latest/bin/clang
  C compiler family name: CLANG
      C compiler version: 19.0.0git (https://github.com/llvm/llvm-project.git
                          f2d215f572affc9ad73da07763ce1831de7f2d4d)
            C++ compiler: /opt/llvm/latest/bin/clang++
   C++ compiler absolute: /opt/llvm/latest/bin/clang++
           Fort compiler: /opt/llvm/latest/bin/flang-new
       Fort compiler abs: /opt/llvm/latest/bin/flang-new
         Fort ignore TKR: yes (!DIR$ IGNORE_TKR)
   Fort 08 assumed shape: yes
      Fort optional args: yes
          Fort INTERFACE: yes
    Fort ISO_FORTRAN_ENV: yes
       Fort STORAGE_SIZE: yes
      Fort BIND(C) (all): yes
      Fort ISO_C_BINDING: yes
 Fort SUBROUTINE BIND(C): yes
       Fort TYPE,BIND(C): yes
 Fort T,BIND(C,name="a"): yes
            Fort PRIVATE: yes
           Fort ABSTRACT: yes
       Fort ASYNCHRONOUS: yes
          Fort PROCEDURE: yes
         Fort USE...ONLY: yes
           Fort C_FUNLOC: yes
 Fort f08 using wrappers: yes
         Fort MPI_SIZEOF: yes
             C profiling: yes
   Fort mpif.h profiling: yes
  Fort use mpi profiling: yes
   Fort use mpi_f08 prof: yes
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                          OMPI progress: no, Event lib: yes)
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
              dl support: yes
   Heterogeneous support: no
       MPI_WTIME support: native
     Symbol vis. support: yes
   Host topology support: yes
            IPv6 support: no
          MPI extensions: affinity, cuda, ftmpi, rocm, shortfloat
 Fault Tolerance support: yes
          FT MPI support: yes
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
         MCA accelerator: null (MCA v2.1.0, API v1.0.0, Component v5.1.0)
           MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v5.1.0)
           MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v5.1.0)
           MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v5.1.0)
                 MCA btl: self (MCA v2.1.0, API v3.3.0, Component v5.1.0)
                 MCA btl: sm (MCA v2.1.0, API v3.3.0, Component v5.1.0)
                 MCA btl: tcp (MCA v2.1.0, API v3.3.0, Component v5.1.0)
                 MCA btl: smcuda (MCA v2.1.0, API v3.3.0, Component v5.1.0)
                  MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v5.1.0)
                  MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
                          v5.1.0)
                  MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
                          v5.1.0)
         MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v5.1.0)
         MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v5.1.0)
              MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v5.1.0)
               MCA mpool: hugepage (MCA v2.1.0, API v3.1.0, Component v5.1.0)
             MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
                          v5.1.0)
              MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v5.1.0)
              MCA rcache: gpusm (MCA v2.1.0, API v3.3.0, Component v5.1.0)
              MCA rcache: rgpusm (MCA v2.1.0, API v3.3.0, Component v5.1.0)
           MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v5.1.0)
               MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v5.1.0)
               MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v5.1.0)
               MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v5.1.0)
                MCA smsc: cma (MCA v2.1.0, API v1.0.0, Component v5.1.0)
             MCA threads: pthreads (MCA v2.1.0, API v1.0.0, Component v5.1.0)
               MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v5.1.0)
                 MCA bml: r2 (MCA v2.1.0, API v2.1.0, Component v5.1.0)
                MCA coll: accelerator (MCA v2.1.0, API v3.0.0, Component
                          v5.1.0)
                MCA coll: adapt (MCA v2.1.0, API v3.0.0, Component v5.1.0)
                MCA coll: basic (MCA v2.1.0, API v3.0.0, Component v5.1.0)
                MCA coll: han (MCA v2.1.0, API v3.0.0, Component v5.1.0)
                MCA coll: inter (MCA v2.1.0, API v3.0.0, Component v5.1.0)
                MCA coll: libnbc (MCA v2.1.0, API v3.0.0, Component v5.1.0)
                MCA coll: self (MCA v2.1.0, API v3.0.0, Component v5.1.0)
                MCA coll: sync (MCA v2.1.0, API v3.0.0, Component v5.1.0)
                MCA coll: tuned (MCA v2.1.0, API v3.0.0, Component v5.1.0)
                MCA coll: xhc (MCA v2.1.0, API v3.0.0, Component v5.1.0)
                MCA coll: ftagree (MCA v2.1.0, API v3.0.0, Component v5.1.0)
                MCA coll: monitoring (MCA v2.1.0, API v3.0.0, Component
                          v5.1.0)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v5.1.0)
               MCA fcoll: dynamic (MCA v2.1.0, API v3.0.0, Component v5.1.0)
               MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v3.0.0, Component
                          v5.1.0)
               MCA fcoll: individual (MCA v2.1.0, API v3.0.0, Component
                          v5.1.0)
               MCA fcoll: vulcan (MCA v2.1.0, API v3.0.0, Component v5.1.0)
                  MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v5.1.0)
                MCA hook: comm_method (MCA v2.1.0, API v1.0.0, Component
                          v5.1.0)
                  MCA io: ompio (MCA v2.1.0, API v3.0.0, Component v5.1.0)
                  MCA io: romio341 (MCA v2.1.0, API v3.0.0, Component v5.1.0)
                  MCA op: avx (MCA v2.1.0, API v1.0.0, Component v5.1.0)
                 MCA osc: sm (MCA v2.1.0, API v4.0.0, Component v5.1.0)
                 MCA osc: monitoring (MCA v2.1.0, API v4.0.0, Component
                          v5.1.0)
                 MCA osc: rdma (MCA v2.1.0, API v4.0.0, Component v5.1.0)
                MCA part: persist (MCA v2.1.0, API v4.0.0, Component v5.1.0)
                 MCA pml: cm (MCA v2.1.0, API v2.1.0, Component v5.1.0)
                 MCA pml: monitoring (MCA v2.1.0, API v2.1.0, Component
                          v5.1.0)
                 MCA pml: ob1 (MCA v2.1.0, API v2.1.0, Component v5.1.0)
                 MCA pml: v (MCA v2.1.0, API v2.1.0, Component v5.1.0)
            MCA sharedfp: individual (MCA v2.1.0, API v3.0.0, Component
                          v5.1.0)
            MCA sharedfp: lockedfile (MCA v2.1.0, API v3.0.0, Component
                          v5.1.0)
            MCA sharedfp: sm (MCA v2.1.0, API v3.0.0, Component v5.1.0)
                MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v5.1.0)
                MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
                          v5.1.0)
           MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
                          v5.1.0)

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

See above.

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

 e32e0179bc6bd1637f92690511ce6091719fa046 3rd-party/openpmix (v1.1.3-4036-ge32e0179)
 1d867e84981077bffda9ad9d44ff415a3f6d91c4 3rd-party/prrte (psrvr-v2.0.0rc1-4783-g1d867e8498)
 dfff67569fb72dbf8d73a1dcf74d091dad93f71b config/oac (remotes/origin/HEAD)

Please describe the system on which you are running

Linux oppenheimer 6.5.0-35-generic #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May  7 09:00:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
AMD Ryzen 9 7950X 16-Core Processor

Details of the problem

$ make test_file.x && ls a aa ; echo "====" ; /opt/ompi/llvm/bin/mpirun -n 1 ./test_file.x ; echo "====" ; touch a aa ; ls a aa ; echo "====" ; /opt/ompi/llvm/bin/mpirun -n 1 ./test_file.x ; echo "====" ; ls a aa
make: 'test_file.x' is up to date.
ls: cannot access 'a': No such file or directory
ls: cannot access 'aa': No such file or directory
====
 I am  0  of  1  of WORLD
 filename = "a        "
 open failed
 why? MPI_ERR_OTHER: known error not in list

 close failed
 why? MPI_ERR_FILE: invalid file

 delete failed
 why? MPI_ERR_FILE: invalid file

 filename = "aa       "
 done
====
a  aa
====
 I am  0  of  1  of WORLD
 filename = "a        "
 open failed
 why? MPI_ERR_OTHER: known error not in list

 close failed
 why? MPI_ERR_FILE: invalid file

 delete failed
 why? MPI_ERR_FILE: invalid file

 filename = "aa       "
 done
====
ls: cannot access 'a': No such file or directory
ls: cannot access 'aa': No such file or directory

program main
    use mpi_f08
    implicit none
    integer :: ierror, slen
    integer :: me, np
    integer :: amode
    type(MPI_File) :: f
    character(len=9) :: filename
    character(len=MPI_MAX_ERROR_STRING) :: string

    call MPI_Init(ierror)

    call MPI_Comm_rank(MPI_COMM_WORLD,me)
    call MPI_Comm_size(MPI_COMM_WORLD,np)
    print*,'I am ',me,' of ',np,' of WORLD'

    filename = "a"
    print*,'filename = "',filename,'"'
    amode = IOR( MPI_MODE_CREATE, MPI_MODE_RDWR )

    call MPI_File_open(MPI_COMM_SELF,filename,amode,MPI_INFO_NULL,f,ierror)
    if (ierror.ne.MPI_SUCCESS) then
        print*,'open failed'
        call MPI_Error_string(ierror, string, slen)
        print*,'why? ',trim(string)
    endif

    call MPI_File_close(f,ierror)
    if (ierror.ne.MPI_SUCCESS) then
        print*,'close failed'
        call MPI_Error_string(ierror, string, slen)
        print*,'why? ',trim(string)
    endif

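    ! ierror is not passed to MPI_File_delete (it is optional in mpi_f08), so the
    ! check below re-tests the value left by MPI_File_close
    call MPI_File_delete(filename,MPI_INFO_NULL)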
    if (ierror.ne.MPI_SUCCESS) then
        print*,'delete failed'
        call MPI_Error_string(ierror, string, slen)
        print*,'why? ',trim(string)
    endif

    filename = "aa"
    print*,'filename = "',filename,'"'
    amode = IOR( MPI_MODE_CREATE, MPI_MODE_RDWR )

    call MPI_File_open(MPI_COMM_SELF,filename,amode,MPI_INFO_NULL,f,ierror)
    if (ierror.ne.MPI_SUCCESS) then
        print*,'open failed'
        call MPI_Error_string(ierror, string, slen)
        print*,'why? ',trim(string)
    endif

    call MPI_File_close(f,ierror)
    if (ierror.ne.MPI_SUCCESS) then
        print*,'close failed'
        call MPI_Error_string(ierror, string, slen)
        print*,'why? ',trim(string)
    endif

    ! ierror is not passed here either; the check below re-tests the MPI_File_close result
    call MPI_File_delete(filename,MPI_INFO_NULL)
    if (ierror.ne.MPI_SUCCESS) then
        print*,'delete failed'
        call MPI_Error_string(ierror, string, slen)
        print*,'why? ',trim(string)
    endif

    print*,'done'
    call MPI_Finalize(ierror)

end program main

ggouaillardet commented 3 weeks ago

Thanks Jeff for the report, I will have a look.

I am able to reproduce the issue (with GNU compilers, FWIW). Fun fact: there is no error if I run in singleton mode and/or use ROMIO.

$ mpirun -np 1 ./d
 I am            0  of            1  of WORLD
 filename = "a        "
 open failed
 why? MPI_ERR_OTHER: known error not in list
 close failed
 why? MPI_ERR_FILE: invalid file
 delete failed
 why? MPI_ERR_FILE: invalid file
 filename = "aa       "
 done

$ mpirun -np 1 --mca io ^ompio ./d
 I am            0  of            1  of WORLD
 filename = "a        "
 filename = "aa       "
 done

$ ./d
 I am            0  of            1  of WORLD
 filename = "a        "
 filename = "aa       "
 done

ggouaillardet commented 3 weeks ago

Singleton vs. mpirun was fun but unrelated to the root cause.

Here is a patch: opal_basename() does not correctly handle a single-character filename. I will issue a PR later.

diff --git a/opal/util/basename.c b/opal/util/basename.c
index 0a57b07078..ad873f2c7c 100644
--- a/opal/util/basename.c
+++ b/opal/util/basename.c
@@ -77,16 +77,18 @@ char *opal_basename(const char *filename)

     /* Remove trailing sep's (note that we already know that strlen > 0) */
     tmp = strdup(filename);
-    for (i = strlen(tmp) - 1; i > 0; --i) {
-        if (sep == tmp[i]) {
-            tmp[i] = '\0';
-        } else {
-            break;
+    if (1 < strlen(tmp)) {
+        for (i = strlen(tmp) - 1; i > 0; --i) {
+            if (sep == tmp[i]) {
+                tmp[i] = '\0';
+            } else {
+                break;
+            }
+        }
+        if (0 == i) {
+            tmp[0] = sep;
+            return tmp;
         }
-    }
-    if (0 == i) {
-        tmp[0] = sep;
-        return tmp;
     }

     /* Look for the final sep */
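
For readers following the patch, here is a minimal standalone sketch of the trailing-separator step only, not the full opal_basename(). The function names, the copy_string() helper (standing in for strdup) and the explicit sep parameter are hypothetical and chosen for illustration; the loop bodies mirror the removed and added lines of the diff. It assumes a non-empty filename, as the comment in the diff states.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper standing in for strdup(): returns a malloc'ed copy. */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (NULL != p) {
        memcpy(p, s, n);
    }
    return p;
}

/* Pre-patch behaviour. Precondition: filename is non-empty. */
static char *strip_trailing_seps_before(const char *filename, char sep)
{
    size_t i;
    char *tmp = copy_string(filename);
    if (NULL == tmp) {
        return NULL;
    }
    for (i = strlen(tmp) - 1; i > 0; --i) {
        if (sep == tmp[i]) {
            tmp[i] = '\0';
        } else {
            break;
        }
    }
    /* For a one-character name the loop body never runs, i is still 0,
     * and the name is overwritten with the separator. */
    if (0 == i) {
        tmp[0] = sep;
    }
    return tmp;
}

/* Post-patch behaviour: one-character names skip the loop and the
 * "i reached 0" check, so they are returned unchanged. */
static char *strip_trailing_seps_after(const char *filename, char sep)
{
    size_t i;
    char *tmp = copy_string(filename);
    if (NULL == tmp) {
        return NULL;
    }
    if (1 < strlen(tmp)) {
        for (i = strlen(tmp) - 1; i > 0; --i) {
            if (sep == tmp[i]) {
                tmp[i] = '\0';
            } else {
                break;
            }
        }
        if (0 == i) {
            tmp[0] = sep;
        }
    }
    return tmp;
}
```

Under these assumptions, calling strip_trailing_seps_before("a", '/') yields "/" while strip_trailing_seps_after("a", '/') yields "a"; both leave "aa" untouched, which matches the one-character vs. two-character behaviour reported above. The caller frees the returned string.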

edgargabriel commented 2 weeks ago

@ggouaillardet thank you for identifying the issue. Can you file a PR with the fix?

ggouaillardet commented 2 weeks ago

Sorry for the delay, I just issued #12632

wenduwan commented 1 day ago

This should be fixed in 5.0.4, scheduled for 7/2024.