open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

fs/ufs: change default locking protocol -v5.0.x #12759

Closed edgargabriel closed 1 month ago

edgargabriel commented 1 month ago

The fs/ufs component by default disabled all file locking before read/write operations (except on NFS file systems). This was based on the assumption that the OS itself performs the required locking and hence we do not have to add our own locking on top of it.

This assumption is incorrect when data sieving is used. With data sieving, the code 'ignores' small gaps when writing to a file and instead performs a read-modify-write sequence itself for performance reasons. The problem, however, is that even within a collective operation not all aggregators may choose to use data sieving. Enabling locking only inside the data-sieving routines is therefore insufficient; all processes have to perform the locking. This leaves us with two options: a) disable write data sieving by default, or b) enable range-locking by default.
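To make the failure mode concrete, here is a minimal sketch (illustration only, not the actual OMPIO code) of a data-sieving style write protected by a POSIX byte-range lock. If a process writing into the same region skips the lock, its bytes can be silently overwritten by the read-modify-write below, which is why every process has to lock, not only the aggregators that sieve:

```c
/* Illustration only: a data-sieving style write of a file region that
 * contains gaps we do not own.  The aggregator reads the whole region,
 * patches the bytes it owns, and writes the region back.  Without the
 * byte-range lock being taken by *all* writers, a concurrent plain write
 * into one of the gaps can be clobbered by this sequence. */
#include <fcntl.h>
#include <unistd.h>

static int lock_range(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {
        .l_type   = type,      /* F_WRLCK to acquire, F_UNLCK to release */
        .l_whence = SEEK_SET,
        .l_start  = start,
        .l_len    = len,
    };
    return fcntl(fd, F_SETLKW, &fl);   /* blocking byte-range lock */
}

/* Assumes the patch lies inside [start, start+len) and len <= sizeof(buf). */
static void sieved_write(int fd, off_t start, size_t len,
                         const char *patch, off_t patch_off, size_t patch_len)
{
    char buf[65536];

    lock_range(fd, F_WRLCK, start, (off_t)len);   /* everyone must lock    */
    pread(fd, buf, len, start);                   /* read existing bytes   */
    for (size_t i = 0; i < patch_len; i++)        /* modify only our bytes */
        buf[patch_off - start + i] = patch[i];
    pwrite(fd, buf, len, start);                  /* write region back     */
    lock_range(fd, F_UNLCK, start, (off_t)len);
}
```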

After some testing, I think enabling range-locking by default is the safer and better approach. It does not seem to have any significant performance impact on my test systems.

Note that on Lustre file systems we can keep the default of no locking as far as I can see, since the collective algorithm used by the Lustre component is unlikely to produce this access pattern. I did, however, add an MCA parameter that allows us to control the locking algorithm used by the Lustre component as well, in case we need to change it for a particular use case or platform.
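For reference, such MCA parameters are normally registered through OMPI's standard variable-registration call. The sketch below follows that pattern; the parameter name, the meaning of the integer values, and the component handle are illustrative assumptions for this comment, not identifiers taken from this PR:

```c
/* Sketch only: modeled on how OMPIO components usually register MCA
 * parameters.  The variable name, the value encoding, and the component
 * handle are illustrative assumptions, not code from this PR.
 * Builds only inside the Open MPI source tree. */
#include "opal/mca/mca.h"
#include "opal/mca/base/mca_base_var.h"

static int mca_fs_example_lock_algorithm = 0;   /* 0 assumed to mean "auto" */

static void register_lock_algorithm_param(const mca_base_component_t *component)
{
    (void) mca_base_component_var_register(component,
                                           "lock_algorithm",
                                           "Locking strategy used before read/write operations "
                                           "(illustrative values: 0 auto, 1 never lock, "
                                           "2 lock entire file, 3 lock byte ranges)",
                                           MCA_BASE_VAR_TYPE_INT, NULL, 0, 0,
                                           OPAL_INFO_LVL_9,
                                           MCA_BASE_VAR_SCOPE_READONLY,
                                           &mca_fs_example_lock_algorithm);
}
```

Once registered, such a value can be overridden at run time through the usual `--mca` mechanism instead of rebuilding.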

Fixes Issue #12718

Signed-off-by: Edgar Gabriel Edgar.Gabriel@amd.com (cherry picked from commit c697f28d2ca027238016f62da8786b24038578c0)

wenduwan commented 1 month ago

@edgargabriel I would like to run some tests on this PR. Do you recommend a specific setup?

edgargabriel commented 1 month ago

@wenduwan any MPI I/O test on a regular local (or non-Lustre) file system should work
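A minimal test of that kind could look like the sketch below (file name and sizes are arbitrary picks for this example, and whether write data sieving actually kicks in depends on the collective algorithm selected at run time). Each rank writes an interleaved, non-contiguous pattern through a collective call:

```c
/* Small MPI I/O test: each rank writes a strided (non-contiguous) pattern
 * through a collective write, the kind of access that can trigger write
 * data sieving.  File name and sizes are arbitrary choices. */
#include <mpi.h>
#include <stdio.h>

#define BLOCKS 64   /* blocks per rank */
#define BLEN   16   /* ints per block  */

int main(int argc, char **argv)
{
    int rank, size;
    int buf[BLOCKS * BLEN];
    MPI_File fh;
    MPI_Datatype filetype;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 0; i < BLOCKS * BLEN; i++)
        buf[i] = rank;

    /* Interleave the ranks' blocks in the file: rank r owns every size-th
     * block, leaving gaps between the pieces each rank writes. */
    MPI_Type_vector(BLOCKS, BLEN, BLEN * size, MPI_INT, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File_open(MPI_COMM_WORLD, "ompio_test.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, (MPI_Offset)rank * BLEN * sizeof(int),
                      MPI_INT, filetype, "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, buf, BLOCKS * BLEN, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Type_free(&filetype);
    if (rank == 0)
        printf("wrote %zu bytes total\n",
               (size_t)size * BLOCKS * BLEN * sizeof(int));
    MPI_Finalize();
    return 0;
}
```

Compile with mpicc and run with a few ranks on a local file system; checking the interleaved per-rank values in the output file is then straightforward.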

wenduwan commented 1 month ago

@edgargabriel Thanks. Unfortunately we don't have an MPI I/O test suite 😞 but let me run our normal tests.