Closed edgargabriel closed 1 month ago
@edgargabriel I would like to run some tests on this PR. Do you recommend a specific setup?
@wenduwan any MPI I/O test on a regular local (or non-Lustre) file system should work
@edgargabriel Thanks. Unfortunately we don't have an MPI I/O test suite 😞 But let me run our normal tests.
The fs/ufs component by default disabled all file locking before read/write operations (except on NFS file systems). This was based on the assumption that the OS itself performs the required locking, and hence we don't have to add to it.
This assumption is incorrect when using data sieving. With data sieving, the code ignores small gaps when writing to a file and instead performs a read-modify-write sequence itself, for performance reasons. The problem, however, is that even within a collective operation not all aggregators may want to use data sieving. Hence, enabling locking only inside the data-sieving routines is insufficient; all processes have to perform the locking. Therefore, our two options are: a) disable write data-sieving by default, or b) enable range-locking by default.
After some testing, I think enabling range-locking by default is the safer and better approach. It doesn't seem to show any significant performance impact on my test systems.
Note that on Lustre file systems we can keep the default of no locking as far as I can see, since the collective algorithm used by the Lustre component is unlikely to produce this pattern. I did, however, add an MCA parameter that allows controlling the locking algorithm used by the Lustre component as well, in case we need to change it for a particular use case or platform.
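For anyone testing this, overriding the new behavior would look something like the following. The description above does not name the MCA parameter, so the name and value below are assumptions for illustration only, not the verified interface.

```shell
# Hypothetical sketch: parameter name and value are NOT taken from this PR.
# Adjust to the actual fs/lustre MCA parameter once confirmed with
# `ompi_info --param fs lustre --level 9`.
mpirun --mca fs_lustre_lock_algorithm 3 -np 8 ./my_mpiio_test
```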
Fixes Issue #12718
Signed-off-by: Edgar Gabriel Edgar.Gabriel@amd.com (cherry picked from commit c697f28d2ca027238016f62da8786b24038578c0)