Open wgropp opened 7 months ago
Are we talking about all requests or only persistent requests?
I would agree that Page 80, lines 42-43 in combination with Page 515, lines 20-22 should work for persistent requests but should not enable concurrent testing of non-persistent requests. The moment one thread completes a non-persistent request the other thread holds a stale request handle which should not be valid to test. It would require implementations to remember all request handles ever created to avoid invalid memory accesses.
That's a good point about persistent and non-persistent. The non-persistent case is the one that I was thinking about when I raised the INOUT nature of that argument - and while handling that isn't trivial, there are ways to implement this that are thread safe and don't require remembering all requests.
We should distinguish two orthogonal levels of thread safety involved in multi threaded MPI execution. One level is the general internal state of the MPI implementation, including message queues and other data structures. There we have, for example , the notion of random message ordering for concurrent receives.
The other level is explicit concurrent access to specific MPI resources represented by MPI handles. How is a race on a concurrently used receive buffer different from a race on concurrently passed opaque objects with write semantics?
I agree with @devreal that stale handles from different threads (if the pass a reference to their local copy of the handle) will be a real issue.
To be clear, I can implement MPI_TEST to be thread safe, even when called with the same (pointer to) request, but at the cost of some critical sections and thread-safe flags. We can talk about weaker forms of thread safety, but that would be a change.
I don't question that the case of passing pointers to the same handle can be implemented thread-safe. But
The MPI standard has no restriction that handles may not be copied. Concurrently passing a handle to MPI calls does not require to pass pointers to the same handle copy. If you argue that passing the same pointer should be allowed, I will argue that passing pointers to different instances would also be allowed with pre 4.1 wording. And that's the case where @devreal makes the case for a hard to implement feature. It would basically require the implementation to not release handles until it can prove that no thread might eventually access the handle.
I'm not disagreeing that it's non-trivial to implement - that was the reason for my original comment on the INOUT parameter (which is one of several things that has to be handled carefully). The point is that the MPI standard, before 4.1, could be read as requiring that this be implemented for MPI_TEST (and other nonblocking completion routine), and 4.1 changed that. Passing the same pointer (which does have a well-defined and thread-safe behavior) was permitted and now is not, and is inconsistent with the general thread safety text.
The outcome of two concurrent calls to MPI_TEST with the same handle can be that either both fail or one succeeds and one fails. These semantics are well-defined. Admittedly, this requires that the request handle be accessed with atomic loads and stores—which is the case anyhow, in practice.
These semantics are useful for implementing a multithreaded server. Multiple threads poll the same request and post a new request when the test succeeds. Current semantics will require the use of an additional lock -- the MPI library could use atomic operations instead.
Whatever the forum's preferred solution, the current text is misleading. The text in section 11.6.3 is misleading: It starts with "A program in which two threads block, waiting on the same request, is erroneous" and then proceeds to possibly prohibit concurrent nonblocking tests. The text claiming that MPI calls are thread-safe needs to be elaborated and clarified: Most users would assume that two "thread-safe" functions can be called concurrently.
I think it would be extremely educational for future readers of this discussion to have a pointer to such an algorithm, one that would not require tracking "possibly still referenced" requests, be it non-trivial. One that encompass some quantifiers for efficiency would be even better.
A request being a reusable object, multiple threads using the same pointer would not know if they access the "same" request (aka. the request for the same communication) or two different communications reusing the same request pointer. We would need to add a mean to ensure uniqueness of these pointers, an overhead that is hardly justifiable by any gains on the usability from a user perspective.
- The outcome of two concurrent calls to MPI_TEST with the same handle can be that either both fail or one succeeds and one fails. These semantics are well-defined. Admittedly, this requires that the request handle be accessed with atomic loads and stores—which is the case anyhow, in practice.
Please explain how an atomic write to one memory location has an impact to an atomic read from another memory location. The semantics of MPI test is defined at MPI handle level, not based on addresses of MPI handles.
Multiple threads completing the same request. A program in which two threads block, waiting on the same request, is erroneous. Similarly, the same request cannot appear in the array of requests of two concurrent MPI_{WAIT|TEST}{ANY|SOME|ALL} calls. In MPI, a request can only be completed once. Any combination of wait or test that violates this rule is erroneous.
This seems to have been introduced in MPI 2.0 so it has been the law of the land for 25 years. And the "all MPI calls are thread-safe" statement does not render the above constraint invalid (you can still call MPI_Test
concurrently, just not on the same request).
Another way of looking at it is whether implementations have supported this (and thus users have been able to get away with breaking the above rule). For Open MPI, it's safe to say that this was never safe because request handles are pointers to objects that are either freed or reused, so the behavior of two threads testing the same request has always been undefined.
Assuming that we could safely implement this for nonblocking requests: is it worth the cost? Every application today that calls MPI_Test
will pay that cost for a small fraction of users that may want to concurrently test the same request.
I am not entirely opposed to allowing this on persistent requests, with the caveat that the outcome of test and wait is potentially undefined if a thread concurrently completes the request (think MPI_Testany
returning MPI_UNDEFINED
instead of a valid index) and may lead to deadlocks (if the completing thread restarts the operation). There is some careful wording needed but at least it could be safe. I doubt that this can be made safe for for nonblocking requests within acceptable bounds.
For Open MPI, it's safe to say that this was never safe because request handles are pointers to objects that are either freed or reused, so the behavior of two threads testing the same request has always been undefined.
The question is whether in thread-multiple mode, openmpi implements completion of the requests and setting the handle to the null-handle as an atomic operation. In that case, the implementation code on the other thread would only see the null-handle, if the function argument points to the same handle instance. In such case, the application might get away with such code pattern.
@jprotze how the MPI implementation does the completion is irrelevant if the user make copies of the MPI_Request
handle.
It is unclear to me what exactly are we trying to address here. Are we trying to allow this code
extern MPI_Request req; /* assume pointing to a nonblocking communication */
#pragma omp parallel
{
MPI_Test(&req, MPI_STATUS_IGNORE);
}
or this code ?
extern MPI_Request req; /* assume pointing to a nonblocking communication */
#pragma omp parallel
{
MPI_Request my_req = req;
MPI_Test(&my_req, MPI_STATUS_IGNORE);
}
What about the following code ? Should it be allowed or not ?
extern MPI_Request req; /* assume pointing to a nonblocking communication */
/* single thread */
MPI_Request req2 = req;
MPI_Test(&req, MPI_STATUS_IGNORE);
MPI_Test(&req2, MPI_STATUS_IGNORE);
Slightly more complex, to give the MPI implementation the opportunity to reuse the request. What should the outcome of this be ?
extern MPI_Request req; /* assume initialized */
/* single thread */
MPI_Request req2 = req;
MPI_Test(&req, MPI_STATUS_IGNORE);
MPI_Irecv(..., &req);
MPI_Test(&req2, MPI_STATUS_IGNORE); /* what request are we completing here ? */
From my perspective, all these are incorrect usages of MPI, trying to complete multiple times the same request.
Answer to George B: f two threads call MPI_TEST simultaneously, they are not required to test the same MPI object. Indeed, there is no point in having two threads poll with the same handle if only one operation can be completed with this handle. The only requirement is that the outcome is what some serial execution of the concurrent calls would achieve. So, if an isend is posted and next there are three concurrent calls with the same handle, a test, a test, and an isend, then one test could match the first send, and the second test could match the second send. Or only one test could succeed, or none. This is trivially achieved if a lock protects MPI calls that access a handle. However, a lock-free protocol can be used with atomic operations. E.g., one can use atomic exchanges to update the handle.
Whatever definition of "thread-safe" the forum wishes to use, the standard text needs to be clarified.
@snirmarc the MPI standard already allows concurrent calls to MPI_Test
and/or MPI_Wait
for as long as they apply on different MPI requests.
Irrespective of the protection mechanism, locks or atomic operations, MPI requests might be released at their completion, so they cannot be protected except if they are forever kept around and never reused by the MPI implementation.
The MPI standard seems already clear enough when it bluntly states that
... a request can only be completed once. Any combination of wait or test that violates this rule is erroneous.
This statement is however far from perfect. I can see two trivial issues:
MPI_TEST*
and MPI_WAIT*
) and non-destructive calls, aka.MPI_REQUEST_GET_STATUS*
. As MPI_REQUEST_GET_STATUS*
does not really complete the request, the statement does not apply.MPI_TEST
should be allowed.So, if an isend is posted and next there are three concurrent calls with the same handle, a test, a test, and an isend, then one test could match the first send, and the second test could match the second send.
This effectively breaks the use of statuses in nonblocking operations. Consider a fourth thread that posts an irecv and wants to inspect the status upon completion (because it used MPI_ANY_SOURCE
for example). SInce we're reusing request handles the irecv may be assigned a request handle that came from the first isend after one of the concurrent test calls completed it. Now the second test from your example will race against the fourth thread waiting for the irecv to complete because they both use the same request object. The fourth thread may get the status it is looking for or the empty status because its operation was completed by another thread.
More generally, we're losing the connection between operations and requests. That's bad. And there is no way for the implementation to tell what is a legitimate reuse and what is not. I'd much rather impose the existing constraint than make MPI unrealiable.
".. a request can only be completed once. Any combination of wait or test that violates this rule is erroneous." This statement is meaningless: Requests are not completed by the user but by the MPI library.
I agree that MPI should complete a request only once. I also agree that bugs can occur if two threads use distinct handles to access an MPI object. This is true no matter what MPI specifies, even if no concurrent communication calls exist. Once you have pointers in the language, you can have dangling pointers. If threads use the same handle and the implementation ensures that a handle update and object update appear to be one atomic operation, then there should be no problem.
Again, my main point is that the standard is misleading.
I copy a definition of thread-safe from the web: "Threadsafe: yes This classification indicates that you can safely call the API simultaneously in multiple threads without restrictions. This classification also indicates that all functions called by this API are threadsafe."
There may be good reasons for MPI not to be thread-safe or for the standard to define "thread-safe" differently. The standard should clarify what it means. George seems inclined to forbid two MPI calls to be concurrent if they explicitly access the same MPI object, and one may update it. Or perhaps forbid it if one of the calls may update the handle. (These are two different choices.) Say it.
@snirmarc
- The outcome of two concurrent calls to MPI_TEST with the same handle can be that either both fail or one succeeds and one fails.
The list is not complete: A possible outcome is that both succeed, if the first Test completes the communication and the second Test "sees" MPI_REQUEST_NULL. The test with MPI_REQUEST_NULL will also result in flag=true.
- These semantics are useful for implementing a multithreaded server. Multiple threads poll the same request and post a new request when the test succeeds. Current semantics will require the use of an additional lock -- the MPI library could use atomic operations instead.
In the case of two successful tests, which thread should create the new request? For send requests you will not even be able to detect which of the two tests actually completed the request and which one succeeds just for MPI_REQUEST_NULL.
I copy a definition of thread-safe from the web: "Threadsafe: yes This classification indicates that you can safely call the API simultaneously in multiple threads without restrictions. This classification also indicates that all functions called by this API are threadsafe."
This definition makes a lot of assumptions about the kind of API. Threadsafe does not necessarily mean that the API protects the arguments for data race. Especially if the API writes to buffers provided to the function, even with protection for data race, the result might be meaningless due to the remaining race condition.
@devreal
Multiple threads completing the same request. A program in which two threads block, waiting on the same request, is erroneous. Similarly, the same request cannot appear in the array of requests of two concurrent MPI_{WAIT|TEST}{ANY|SOME|ALL} calls. In MPI, a request can only be completed once. Any combination of wait or test that violates this rule is erroneous.
This seems to have been introduced in MPI 2.0 so it has been the law of the land for 25 years. And the "all MPI calls are thread-safe" statement does not render the above constraint invalid (you can still call
MPI_Test
concurrently, just not on the same request).
Your quote does not restrict concurrent use of the same request with MPI_Test.
How much more clear than "In MPI, a request can only be completed once. Any combination of wait or test that violates this rule is erroneous." should the standard be ?
Starting with a valid thread-serialized program:
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
int main() {
int provided, rbuf, sbuf = 23, rank, size;
MPI_Init_thread(NULL, NULL, MPI_THREAD_SERIALIZED, &provided);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Request req;
MPI_Irecv(&rbuf, 1, MPI_INT, size - rank - 1, 23, MPI_COMM_WORLD, &req);
#pragma omp parallel
{
#pragma omp master
MPI_Send(&sbuf, 1, MPI_INT, size - rank - 1, 23, MPI_COMM_WORLD);
int flag = 0;
while (!flag) {
#pragma omp critical
MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
printf("Thread %i test flag=%i\n", omp_get_thread_num(), flag);
}
}
MPI_Finalize();
}
With the text cited above, the code should still be valid, and successfully terminate, when I set MPI_THREAD_MULTIPLE
and remove the critical
:
Page 515, lines 20-22:
All MPI calls are thread-safe, i.e., two concurrently running threads may make MPI calls and the outcome will be as if the calls executed in some order, even if their execution is interleaved.
The will be as if the calls executed in some order describes the observed execution behavior as if there still was the critical. The request is only completed once and other concurrent tests will only see MPI_REQUEST_NULL
.
A quick test with OpenMPI and IntelMPI showed that both implementations allow the application to successfully terminate with the expected number (procs * threads) of printed flag=1
lines.
As valid as any concurrent, potentially destructive, access to a shared object.
Once you remove the critical section around the manipulation of req
, all threads will execute the statement MPI_Test(&req)
on a shared variable. For the sake of the argument, let's strip down MPI_Test
to a bare, basically calling free on the object pointed by your req
(which is what MPI_Test would do if it returns with the flag set). Would you still maintain that the code is valid ?
A thread-safe implementation of MPI_Test with your requirement would look like:
typedef void* MPI_Request;
#define MPI_REQUEST_NULL NULL
int MPI_Test(MPI_Request* req, int *flag, MPI_Status status){
#pragma omp critical (MPI_Request)
{
if(*req != MPI_REQUEST_NULL)
free(*req);
*req = MPI_REQUEST_NULL;
}
*flag=1;
}
I simply used critical for the purpose of highlighting the atomicity requirement imposed by the MPI standard.
It is not thread-safe if the user made a copy of the request. Second, this is a performance nightmare, one that nobody should propose in 2024.
Let's revisit the request completion sentence, and while do we that widen its scope.
Most MPI calls are thread-safe. However, a program in which threads are calling any combination of MPI procedures, where at least one of them is a freeing procedure, is erroneous. This apply to all MPI objects, including requests where multiple threads completing the same request is illegal, and the same request cannot appear in the array of requests of two concurrent MPI_{WAIT|TEST}{ANY|SOME|ALL} calls.
Problem
MPI 4.1 has text on the same request being completed by multiple threads. This text includes MPI_TEST, which is inconsistent with the original description of "thread safety" for MPI routines. See below for details.
Proposal
Marc Snir sent me the following, which I've included here with his permission:
Version 4.1 section 11.6.3 line 38
The first sentence speaks of two threads blocking on the same request. The second one speaks of MPI_TEST. This seems to imply that two threads cannot call concurrently MPI_TEST on the same request, i.e., that MPI_TEST is not thread-safe. The rationale does not explain why concurrent calls to MPI_TEST must be restricted.
Also, the clarification does not clarify whether an MPI_WAIT and an MPI_TEST on the same request can occur concurrently.
There are three cases:
IMHO, a. You may want to prohibit 3 but could instead say that one call will succeed, and the other will return an error (as if the request pointer was NULL). b. You should not prohibit 1. c. You could prohibit 2, but could have an implementation where the MPI_TEST always fail if there is a thread blocked on the same request.
(End of email from Marc)
Note that Marc's case 1 was permitted before the text from the standard that Marc mentions. Specifically, the standard includes these statements:
Page 80, lines 42-43:
Page 515, lines 20-22:
Based on these, Marc and I argue that calling MPI_TEST on the same request is two different threads is permitted, correct, and has well-defined behavior. The only potential issue is the combination of "even if their execution is interleaved" and the INOUT request argument.
Changes to the Text
TBD. Options include restrictions on the thread safety of routines with INOUT arguments or restriction of the text on completion calls on the same request to blocking routines. See Marc's three cases and suggestions.
Impact on Implementations
Depends on choice. May require no changes or may require some changes to ensure thread safety.
Impact on Users
Depends on choice. Some change is required in standard for consistency. Impact will mostly be clarity of standard for the users working with THREAD_MULTIPLE
References and Pull Requests