When I use parallel I/O with MPI (MPI_File_open) after a process failure, the execution terminates with the following error:
[xxxxxxxxx:02050] mca_sharedfp_sm_file_open: Error, unable to open file for mmap: /tmp/ompi.xxxxxxxxx.1001/pid.2033/1/file2.mpi_cid-3-2050.sm
I have observed that the job session directory ‘ompi_process_info.job_session_dir’ no longer exists, and because of that the file cannot be created (in ompi/mca/sharedfp/sm/sharedfp_sm_file_open.c).
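The failure mode observed here can be illustrated without MPI. open() with O_CREAT creates only the file itself, not any missing parent directory, so if the session directory has been removed the call fails with ENOENT. The following is a minimal standalone sketch, not Open MPI code. It assumes the backing-file layout `<session_dir>/<basename>_cid-<cid>-<pid>.sm`, which is inferred from the error message. The helper names `try_create_backing_file` and `run_demo` and the scratch directory `/tmp/sessiondemoXXXXXX` are hypothetical, and the values `file2.mpi`, 3 and 2050 are simply read off that message.

```c
#define _POSIX_C_SOURCE 200809L /* expose mkdtemp() */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not Open MPI code): build a path shaped like the one
 * in the error message, <session_dir>/<base>_cid-<cid>-<pid>.sm, and try to
 * create it. Returns 0 on success, otherwise the errno set by open(). */
static int try_create_backing_file(const char *session_dir, const char *base,
                                   int cid, int pid)
{
    char path[4096];
    int fd;

    snprintf(path, sizeof path, "%s/%s_cid-%d-%d.sm", session_dir, base, cid, pid);
    /* O_CREAT creates only the last path component, never missing parents. */
    fd = open(path, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
    if (fd == -1) {
        return errno;
    }
    close(fd);
    unlink(path); /* remove the demo file again */
    return 0;
}

/* Hypothetical demo: attempt the create while the "session directory"
 * <root>/1 does not exist, then create that directory and retry.
 * Stores both results in *err_missing / *err_present; returns 0 if the
 * scratch directories could be set up, -1 otherwise. */
static int run_demo(int *err_missing, int *err_present)
{
    char tmpl[] = "/tmp/sessiondemoXXXXXX";
    char session_dir[4096];
    char *root = mkdtemp(tmpl);

    if (root == NULL) {
        return -1;
    }
    snprintf(session_dir, sizeof session_dir, "%s/1", root);

    /* Session directory absent: open() is expected to fail with ENOENT. */
    *err_missing = try_create_backing_file(session_dir, "file2.mpi", 3, 2050);

    if (mkdir(session_dir, 0700) != 0) {
        rmdir(root);
        return -1;
    }
    /* Session directory present: the same call is expected to succeed. */
    *err_present = try_create_backing_file(session_dir, "file2.mpi", 3, 2050);

    rmdir(session_dir);
    rmdir(root);
    return 0;
}
```

Calling `run_demo` from a `main` should report ENOENT for the first attempt and success for the second. This suggests that the shared-memory file itself is fine and that only the missing session directory prevents it from being created.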
I have attached a simple example that hopefully reproduces the error.
Original report by Kai Keller (Bitbucket: kellekai, GitHub: kellekai).
I configured ULFM as follows:
./configure --with-ft --prefix=/xxxx/opt/ULFM/2.1 --enable-mpi-cxx --enable-cxx-exceptions=yes --enable-debug --enable-mpi-fortran=no
I am on commit: 6c76e287178d42d7dfd1e50e6be4ba18a86a06a1