mpi-forum / mpi-forum-historic

Migration of old MPI Forum Trac Tickets to GitHub. New issues belong on mpi-forum/mpi-issues.
http://www.mpi-forum.org

Add Nonblocking File Manipulation Routines #285

Open mpiforumbot opened 8 years ago

mpiforumbot commented 8 years ago

Originally by chaarawi on 2011-07-20 11:39:59 -0500


This proposal adds routines for nonblocking file manipulation to the standard. These functions are all considered expensive, and it is valuable for HPC applications to be able to run them in the background.

Extended Scope

N/A

History

This proposal was done in conjunction with the nonblocking collective (read/write) I/O ticket.

https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/273

There was some discussion on how this would be implemented given the nonexistence of POSIX routines for nonblocking file manipulation. Thread support will be needed.

Proposed Solution

We propose to extend the MPI standard with the following nonblocking file manipulation routines:

MPI_File_iopen (MPI_Comm comm, char *filename, int amode, MPI_Info info, MPI_File *fh, MPI_Request *req);

MPI_File_iclose (MPI_File fh, MPI_Request *req);

MPI_File_isync (MPI_File fh, MPI_Request *req);

MPI_File_iset_view (MPI_File fh, MPI_Offset disp, MPI_Datatype etype, MPI_Datatype filetype, char *datarep, MPI_Info info, MPI_Request *req);

MPI_File_iset_size (MPI_File fh, MPI_Offset size, MPI_Request *req);

MPI_File_iset_info (MPI_File fh, MPI_Info info, MPI_Request *req);

MPI_File_ipreallocate (MPI_File fh, MPI_Offset size, MPI_Request *req);

Each interface is the same as its blocking counterpart, with a request added as an extra output parameter.

Impact on Implementations

Given the nonexistence of POSIX asynchronous file manipulation routines, implementing these functions will be complicated. Implementers might have to assume the presence of thread support in order to progress these functions in the background.

Impact on Applications / Users

This is extra functionality, so there is no impact on users who do not use it. The benefits of using it can be very substantial with a good implementation.

Alternative Solutions

Entry for the Change Log

Added MPI_File_iopen
Added MPI_File_iclose
Added MPI_File_isync
Added MPI_File_iset_view
Added MPI_File_iset_size
Added MPI_File_iset_info
Added MPI_File_ipreallocate

mpiforumbot commented 8 years ago

Originally by rsthakur on 2011-10-10 15:39:03 -0500


MPI_File_iset_view needs to specify what the file view is between the call and the test/wait. Does the file view remain the same as the old one until a successful test/wait? What file view is used by nonblocking read/write calls that are initiated after MPI_File_iset_view but before the test/wait that completes iset_view, and which complete after the iset_view completes?

MPI_File_iset_view
MPI_File_iwrite_all
MPI_Wait /* completes iset_view */
...
MPI_Wait /* completes iwrite */

Also, MPI_File_iget_size needs a motivation.

mpiforumbot commented 8 years ago

Originally by chaarawi on 2011-10-10 16:28:11 -0500


Thanks for pointing that out. We can go two ways here:

1) As you mentioned, use the old file view.
2) Point out that this would be undefined behavior; the user is responsible for any file corruption in that case.

I would tend to go with option 2, because if we look at nonblocking communication, say MPI_Irecv, the user should not access the buffer before the operation completes (test/wait). This doesn't mean that they can't actually do it.

I don't see why a user would call a nonblocking set view followed by a read/write without waiting for the set view to complete, similar to how a user would (well, should) not access a buffer that is used in an MPI_Irecv before the receive completes.

Makes sense?

As for MPI_File_iget_size, the motivation is that this operation involves stat'ing the file or lseek'ing to the end of the file to get the file size, which is expensive. We don't really have an application use case for this operation, but at the last Chicago meeting there was a suggestion to add it to the list. Let me think more about it.

mpiforumbot commented 8 years ago

Originally by dries on 2011-10-11 18:31:40 -0500


Note that the POSIX.1b realtime extensions describe asynchronous I/O functionality (the aio_xxxx functions).

mpiforumbot commented 8 years ago

Originally by rsthakur on 2011-10-12 07:44:09 -0500


The POSIX aio functions are for reads and writes, right? Do any file systems support the nonblocking file manipulation functions?

mpiforumbot commented 8 years ago

Originally by chaarawi on 2011-10-12 08:15:02 -0500


There is no nonblocking open/set_view/etc. in the POSIX standard, just reads and writes whose behavior is similar to MPI's individual nonblocking read/write operations.

I'm not aware of any file systems that provide them at the moment. For a prototype implementation, I would guess threads are the only way to go.

mpiforumbot commented 8 years ago

Originally by chaarawi on 2011-10-12 14:14:25 -0500


Attachment added: draft_nbfile.pdf (441.8 KiB) Draft for formal reading

mpiforumbot commented 8 years ago

Originally by gropp on 2011-10-18 09:16:18 -0500


The original reasoning behind not including nonblocking routines in MPI 2 was that it is easy to create a nonblocking routine by creating a thread and putting the routine within the thread; the MPI rules on threads ensure that this works. For all but performance-critical routines (e.g., MPI_Isend or, with less justification, MPI_File_iread), there's little reason to add routines for this purpose. The argument that some current systems do not support multiple threads per core is increasingly weak.

mpiforumbot commented 8 years ago

Originally by chaarawi on 2011-10-18 10:20:09 -0500


Replying to gropp:

The original reasoning behind not including nonblocking routines in MPI 2 was that it is easy to create a nonblocking routine by creating a thread and putting the routine within the thread; the MPI rules on threads ensure that this works. For all but performance-critical routines (e.g., MPI_Isend or, with less justification, MPI_File_iread), there's little reason to add routines for this purpose. The argument that some current systems do not support multiple threads per core is increasingly weak.

I agree with that, but wouldn't those routines be good for starting to push for some real nonblocking I/O behavior underneath? The purpose is not just to use threads to implement those routines, even though that is the only way it can be done now. Having those routines would start pushing file systems to implement, for example, a nonblocking open or sync at the file system level.

But for now, wouldn't it be much easier for users to rely on MPI to accomplish this with threads underneath? With complete nonblocking I/O functionality, the MPI library would record dependencies between operations (directed graphs, or schedules) and progress them accordingly. This brings us to another form of consistency semantics we would like to propose and get opinions on: atomic and "ordered" access to a file, where nonblocking I/O operations are performed in the order they are issued. This extra "ordered" functionality needs to be addressed in another ticket, though.

mpiforumbot commented 8 years ago

Originally by dries on 2011-10-24 12:31:47 -0500


Note that non-blocking (in MPI terms) has a very specific meaning which might not be what you think it is.

Are these routines going to be usable for you if they 'block' for a significant (but finite) amount of time due to the filesystem (but not due to the collective nature of the functions)?

For an implementation without threads, calling any of these functions might end up 'blocking' the caller for the same amount of time the blocking version of the routine would have.

Also, please add the slides describing the use case for these routines to this ticket for those of us not present at the forum meeting...

mpiforumbot commented 8 years ago

Originally by chaarawi on 2011-10-24 13:43:42 -0500


Replying to dries:

Note that non-blocking (in MPI terms) has a very specific meaning which might not be what you think it is.

I understand what you are implying. Probably what we are looking for here is asynchronous.

Are these routines going to be usable for you if they 'block' for a significant (but finite) amount of time due to the filesystem (but not due to the collective nature of the functions)?

If I understood your comment, by "due to the filesystem" you meant that the operation blocks, for example, until the underlying open call returns. In that case, this is not entirely what we want, but it is the case without thread support, because there is no underlying asynchronous open at the file system level.

It is of course useful if the collective overhead of those functions can be reduced.

For an implementation without threads, calling any of these functions might end up 'blocking' the caller for the same amount of time the blocking version of the routine would have.

Yup, that is expected for now, but maybe some time in the future there could be file systems with asynchronous I/O functions (open, close, sync, ...)? Another question, to you in particular: maybe I/O forwarding could be helpful in this case?

In the end, we are trying to achieve a somewhat complete asynchronous I/O sequence (open, access, close), where we post all the required I/O operations, return to computation, and let the I/O happen in the background.

Also, please add the slides describing the use case for these routines to this ticket for those of us not present at the forum meeting...

Yes, working on it.

mpiforumbot commented 8 years ago

Originally by chaarawi on 2011-10-25 15:37:29 -0500


Attachment added: draft_nbfile_v2.pdf (445.8 KiB) Draft for formal reading

mpiforumbot commented 8 years ago

Originally by chaarawi on 2011-10-27 12:15:39 -0500


Attachment added: MPIForum_NBFM.pptx (50.4 KiB) ppt presented at october 11 meeting