dqwu opened this issue 2 years ago
@dqwu thank you for the bug report. There is a chance that this issue is already resolved; there is a pending PR to fix a problem with removing lockfiles at the end. I hope it makes it into the v4.1.2 release:
https://github.com/open-mpi/ompi/pull/10006
The ROMIO environment variable will not have any impact on the OMPIO components; those are two separate implementations of the MPI I/O operations in Open MPI. The sharedfp/lockedfile component is part of the OMPIO set of frameworks that implement MPI I/O.
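For context, the sharedfp framework backs MPI's shared-file-pointer operations (MPI_File_write_shared and friends), and the lockedfile component coordinates that shared pointer through an on-disk lock file; since the component is set up when the file is opened, a lock file can appear even for read-only workloads, which is consistent with the report above. A minimal sketch, not from this report; the output file name and the write are illustrative:

```c
/* Minimal sketch of a shared-file-pointer write, the operation that the
 * OMPIO sharedfp framework (and its lockedfile component) implements.
 * The file name "shared.out" is illustrative. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "shared.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    char line[32];
    int len = snprintf(line, sizeof(line), "hello from rank %d\n", rank);
    /* Each rank appends at the shared file pointer; with the lockedfile
     * component this pointer is coordinated via a lock file on disk. */
    MPI_File_write_shared(fh, line, len, MPI_CHAR, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```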
Update: We still have the same issue after upgrading to OpenMPI 4.1.2
@jayeshkrishna yes, the fix didn't make it into v4.1.2, but it is part of v4.1.3, which will probably be released later this week.
@edgargabriel It looks like openmpi/4.1.3 did not fully fix the lock-file issue during reads on the E3SM machine Chrysalis. E3SM developers deleted all lock files under inputdata yesterday; this morning they were back.
It seems that the lock files created for a read are not deleted if the file-open call fails:
ierr = pio_openfile(pio_subsystem, file, pio_iotype, fname, mode) // This calls the PnetCDF open-file API, which in turn calls MPI-IO APIs
Do you know of possible workarounds to avoid these lock files even when some Open MPI calls fail?
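For illustration, a hedged workaround sketch in application code, not an official fix: after a failed open, delete the stale lock file directly. The ".lock" suffix, the helper name, and the rank-0 gating are assumptions based on the file names reported in this issue.

```c
/* Hypothetical cleanup helper. Assumption: OMPIO's lockedfile component
 * names its lock file by appending ".lock" to the data file's path, as
 * the stray files observed in this report suggest. Call this after
 * pio_openfile/MPI_File_open returns an error. */
#include <mpi.h>
#include <stdio.h>   /* snprintf, remove */

static void cleanup_stale_lock(const char *path, MPI_Comm comm) {
    int rank;
    MPI_Comm_rank(comm, &rank);
    if (rank != 0)    /* let a single rank attempt the unlink */
        return;
    char lockpath[4096];
    snprintf(lockpath, sizeof(lockpath), "%s.lock", path);
    remove(lockpath); /* best effort; a missing file is harmless */
}
```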
FYI, below is how OpenMPI 4.1.3 is configured on Chrysalis.
$ /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/openmpi-4.1.3-pin4k7o/bin/ompi_info
Package: Open MPI svcbuilder@chrlogin1.lcrc.anl.gov Distribution
Open MPI: 4.1.3
Open MPI repo revision: v4.1.3
Open MPI release date: Mar 31, 2022
Open RTE: 4.1.3
Open RTE repo revision: v4.1.3
Open RTE release date: Mar 31, 2022
OPAL: 4.1.3
OPAL repo revision: v4.1.3
OPAL release date: Mar 31, 2022
MPI API: 3.1.0
Ident string: 4.1.3
Prefix: /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/openmpi-4.1.3-pin4k7o
Configured architecture: x86_64-pc-linux-gnu
Configure host: chrlogin1.lcrc.anl.gov
Configured by: svcbuilder
Configured on: Thu Apr 14 19:50:43 UTC 2022
Configure host: chrlogin1.lcrc.anl.gov
Configure command line: '--prefix=/gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/openmpi-4.1.3-pin4k7o'
'--enable-shared' '--disable-silent-rules'
'--enable-mpi1-compatibility'
'--with-platform=contrib/platform/mellanox/optimized'
'--disable-builtin-atomics' '--with-pmi=/usr'
'--enable-static'
'--with-zlib=/gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/zlib-1.2.11-dudhhig'
'--enable-mpi1-compatibility' '--without-psm'
'--without-fca' '--without-cma'
'--with-knem=/opt/knem-1.1.4.90mlnx1'
'--without-mxm' '--without-ofi' '--without-psm2'
'--with-hcoll=/opt/mellanox/hcoll'
'--without-xpmem' '--without-verbs'
'--with-ucx=/usr' '--with-slurm' '--without-lsf'
'--without-alps' '--without-loadleveler'
'--without-sge' '--without-tm'
'--disable-memchecker'
'--with-hwloc=/gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/hwloc-2.4.1-22xfxgi'
'--disable-java' '--disable-mpi-java'
'--without-cuda' '--enable-wrapper-rpath'
'--disable-wrapper-runpath' '--enable-mpi-cxx'
'--disable-cxx-exceptions'
'--with-wrapper-ldflags=-Wl,-rpath,/gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/gcc-9.3.0/intel-20.0.4-kodw73g/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin'
Note '--with-platform=contrib/platform/mellanox/optimized'. Our Mellanox HPCX version is v2.8.0.
You could try to set --mca sharedfp ^lockedfile.
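For example, on the mpirun command line (the executable name and task count here are placeholders):
mpirun --mca sharedfp ^lockedfile -np 4 ./your_app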
Yes, if something goes wrong (e.g. the code crashes), the lock files will not be cleaned up, and I am not aware of an easy solution for this. I will try to think about it.
@edgargabriel
Thanks for the suggestion. Setting the following environment variable should also work, right?
export OMPI_MCA_sharedfp=^lockedfile
I have tested the above setting with the test case mentioned in #10297. No .lock files were generated, but there were hundreds of files named xxx.nc.data.xxx and xxx.nc.metadata.xxx generated when the write failed. Is this the expected result?
Update: I changed that test case to replay with fewer variables so that it passes. The temporary xxx.nc.data.xxx and xxx.nc.metadata.xxx files were still generated, but they were all deleted by Open MPI after the file was closed (the write did not fail).
Is there a similar option for Open MPI to disable these data and metadata files?
@dqwu yes, the environment variable is equivalent to the runtime parameter. Hm, I did not expect the individual component to kick in in this case, but it looks like it has. Try to exclude both the lockedfile and the individual components, e.g.
export OMPI_MCA_sharedfp=^lockedfile,individual
@edgargabriel "export OMPI_MCA_sharedfp=^lockedfile,individual" seems to work, thanks.
What version of Open MPI are you using?
v4.1.1
Describe how Open MPI was installed
spack installation
Please describe the system on which you are running
Details of the problem
This issue occurs on a machine used by E3SM (e3sm.org): https://e3sm.org/model/running-e3sm/supported-machines/chrysalis-anl
The file system is GPFS. Multiple .lock files associated with the same NetCDF input file were generated by different users within a 12-minute window.
We also saw some .locktest files generated, such as cami_mam3_Linoz_ne30np4_L72_c160214.nc.locktest.0. This is most likely a race condition, as the issue is not always reproducible.
More information
Modules used: intel/20.0.4-kodw73g intel-mkl/2020.4.304-g2qaxzf openmpi/4.1.1-qiqkjbu parallel-netcdf/1.11.0-go65een
The tests were run with 1792 MPI tasks on 28 nodes (64 tasks per node). The parallel read code calls the ncmpi_begin_indep_data() API of the PnetCDF library, which calls the MPI_File_open() API of the Open MPI library; an error code was returned:
1536: MPI error (MPI_File_open) : MPI_ERR_OTHER: known error not in list
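The generic "known error not in list" text can be decoded a bit further with the standard MPI error-reporting calls. A minimal sketch, assuming the default file error handler is set to MPI_ERRORS_RETURN so the code comes back to the caller; the path below is a deliberate stand-in that fails to open, whereas in the real run the code came back through the PnetCDF call chain:

```c
/* Minimal sketch: decode the return code of a failed MPI I/O call into
 * its error class (e.g. MPI_ERR_OTHER) and message text. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    /* Make file operations return error codes (explicitly, for clarity). */
    MPI_File_set_errhandler(MPI_FILE_NULL, MPI_ERRORS_RETURN);

    MPI_File fh;
    int rc = MPI_File_open(MPI_COMM_WORLD, "/no/such/input.nc",
                           MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    if (rc != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len, eclass;
        MPI_Error_class(rc, &eclass);   /* numeric error class */
        MPI_Error_string(rc, msg, &len);/* human-readable text  */
        fprintf(stderr, "MPI error class %d: %s\n", eclass, msg);
    } else {
        MPI_File_close(&fh);
    }
    MPI_Finalize();
    return 0;
}
```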
It has been confirmed that these lock files are created by Open MPI code.
As a workaround, E3SM developers have set the input directory /lcrc/group/e3sm/data/inputdata/atm/cam/inic/homme to be read-only. However, a similar issue occurred in another directory (/lcrc/group/e3sm/data/inputdata/atm/cam/topo), which is still writable.
Questions
Do you have any suggestions for this issue? Since the file system is GPFS, do you think setting the ROMIO_GPFS_FREE_LOCKS environment variable would work?