Open cniethammer opened 7 years ago
Christoph,
thank you for your bug report and analysis. I think the second solution is already implemented, at least for blocking operations, it is just not active by default. Could you try to rerun your test case and set the mca parameter
mca_io_ompio_cycle_buffer_size
to something (e.g. 1 GB or similar, value has to be given in bytes however), and see whether this fixes the problem? If this solution works, we might just have to change the default value for this mca parameter.
THis would not solve the issue for non-blocking operations. We do have the logic for splitting the write sequences into multiple chunks for the aio_write function calls in the posix fbtl already in the code, just in a slightly different context. I will look into that part as well.
Also as a side note, I am fairly confident that the problem should not arise for collective write operations, since they are broken down into cycles of e.g. 32MB anyway.
Thanks
Edgar
On 11/11/2016 3:58 AM, cniethammer wrote:
There is a bug with MPI_File_write* operations in Open MPI which truncates files or writes incorrect data to files. I broke the problem down to the POSIX writev call used in ompi/mca/fbtl/posix/fbtl_posix_pwritev.c that will not write more than 2,147,479,552 bytes to disk. From man 2 write NOTES:
On Linux, write() (and similar system calls) will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes actually transferred. (This is true on both 32-bit and 64-bit systems.) We do not catch the case when the iov contains elements with iov_len > 2,147,479,552 at the moment which causes this issues with large I/O blocks and files.
My suggestions: Either (1) check for writes completing with the correct number of written bytes and adding another write or (2) modify the convertor used in mca_io_ompio_build_io_array/ompi_io_ompio_decode_datatype to build an iov struct with elements not larger than 2,147,479,552 bytes when using POSIX I/O on Linux systems.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/open-mpi/ompi/issues/2399, or mute the thread https://github.com/notifications/unsubscribe-auth/AH22HibitjylZCy1K-zKXowC12PmD0dVks5q9Dw7gaJpZM4KvmAC.
Hello Edgar,
Thanks for pointing out this option. I can confirm that setting -mca mca_io_ompio_cycle_buffer_size 1073741824 results in writing the correct amount of data to disk. I welcome setting a default which is safe for the user - e.g. to the Linux write limit - accepting a possible performance loss, which is obviously better than losing data at this point.
Best Christoph
Christoph, I am open to any reasonable value, can you make a suggestion? Thanks Edgar
@edgargabriel @cniethammer If you guys want this in v2.0.2, please come to a decision ASAP (i.e., today/tomorrow). Otherwise we're going to have to push this to v2.0.3.
this is the bug fix that was merged a couple of days back with the default value for the parameter. So it is done for now, I will work on the second solution that we discussed over the break.
Ah! Ok, my bad -- I should have read closer / remembered better. 😊
@edgargabriel @cniethammer It's been about 2 months since the last update -- any progress? The door for v2.1.0 is just about closed. I'm going to pre-emptively move this to v2.1.1; feel free to move to v3.0.0 if you think that is more realistic.
feel free to move it to 3.0, I will not get to it. The fix that we committed for the 2.0 series (and 2.1, but will double check that), should do the trick for now.
@edgargabriel Done; thanks.
@edgargabriel should we just move this to future or do you think you can get a fix in to 3.0.1?
@hppritcha it is on my to do list for this summer, but I don't think it will make it for 3.0.1, I hope to have it ready for 3.1 (or 4.0 whatever the next release is).
There is a bug with MPI_File_write* operations in Open MPI which truncates files or writes incorrect data to files. I broke the problem down to the POSIX writev call used in ompi/mca/fbtl/posix/fbtl_posix_pwritev.c that will not write more than 2,147,479,552 bytes to disk. From man 2 write NOTES:
My suggestions: Either (1) check for writes completing with the correct number of written bytes and adding another write or (2) modify the convertor used in mca_io_ompio_build_io_array/ompi_io_ompio_decode_datatype to build an iov struct with elements not larger than 2,147,479,552 bytes when using POSIX I/O on Linux systems.