pmix / pmix-standard


when is `PMIx_Fence` required? #511

Open thomasgillis opened 1 month ago

thomasgillis commented 1 month ago

Hi all, I am reaching out with a question about the requirements around PMIx_Fence.

Context

For our application, we are looking at the case where processes generate globally unique keys. They are posted to other processes using PMIx_Put. On the other side, the processes requesting the value associated with the globally unique key have no knowledge of where the key is located, so they rely on PMIX_RANK_UNDEF.

Clarification

For that specific case, the API documentation mentions a few things about the need for PMIx_Fence, but the final answer is not clear to me. Here is the list of sections I have identified as relating to this use case, with some associated questions:

It's clear that PMIx_Get will block until the key is available (or until it times out). But it's unclear to me how I can guarantee that the data appears on the server.

Here, I presume that "must be globally exchanged prior to retrieval" refers to the "global" method?

Global, collective exchange of the information prior to retrieval. This is accomplished by executing a barrier operation that includes collection and exchange of the data provided by each process such that each process has access to the full set of data from all participants once the operation has completed. PMIx provides the PMIx_Fence function (or its non-blocking equivalent) for this purpose, accompanied by the PMIX_COLLECT_DATA qualifier.

If so, does it mean that to have the data on the server, we are semantically required to call PMIx_Fence in our case?

Thanks for your help in clarifying the semantics :-)

rhc54 commented 1 month ago

> It's clear that PMIx_Get will block until the key is available (or until it times out). But it's unclear to me how I can guarantee that the data appears on the server.

I'm afraid you can't, which is why we recommend that you either (a) use a fence to ensure it gets there, or (b) include a PMIX_TIMEOUT so you can gracefully fail if it never arrives.

> Here, I presume that "must be globally exchanged prior to retrieval" refers to the "global" method?

Not sure I understand your last use of "global", but the text is advising that you utilize a fence.

> If so, does it mean that to have the data on the server, we are semantically required to call PMIx_Fence in our case?

Not required, but advised.

Let me try to provide a little more of an explanation. There are two ways you can retrieve data that was "put":

(a) you can use a fence operation to circulate the data. This is the most common method and is generally supported by all runtimes.

(b) you can avoid the fence operation and rely on the "direct modex" (DM) method for retrieving the data. In DM, the data is left on the local server where your app "put" it. When another process requests the data, the local server must determine the identity of the server that has the data (i.e., the server that is hosting the client process that "put" the data) and then request the data from that server. Since there is no sync'ing fence, that server must "hold" the request until the data has been committed to it before responding (or else you'd just get a "not found" right away).
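The two retrieval paths can be pictured with a small toy model. The C sketch below is not PMIx code and calls no PMIx functions: every `toy_` name, the two-server layout, and the round-robin placement of ranks are assumptions made purely for illustration. It only shows where data sits after a put and commit, what a collecting fence changes, and which server a direct-modex request is sent to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NSERVERS 2   /* assumed: two nodes, one server each */
#define TOY_MAXKV    16  /* assumed capacity per server */

typedef enum { TOY_OK = 0, TOY_NOT_FOUND = -1 } toy_status_t;

/* One committed key/value pair, tagged with the rank that put it. */
typedef struct {
    int  owner;
    char key[32];
    int  value;
} toy_kv_t;

/* A toy server: holds what its local clients committed, plus anything
 * a collecting fence copied in from the other servers. */
typedef struct {
    toy_kv_t kv[TOY_MAXKV];
    size_t   n;
} toy_server_t;

static toy_server_t toy_servers[TOY_NSERVERS];

/* Assumed placement: ranks are dealt round-robin across servers. */
static int toy_server_of(int rank)
{
    assert(rank >= 0);
    return rank % TOY_NSERVERS;
}

static const toy_kv_t *toy_find(const toy_server_t *s, int owner, const char *key)
{
    for (size_t i = 0; i < s->n; i++) {
        if (s->kv[i].owner == owner && strcmp(s->kv[i].key, key) == 0) {
            return &s->kv[i];
        }
    }
    return NULL;
}

/* Models put + commit: the pair lands only on the caller's own server. */
bool toy_put_commit(int rank, const char *key, int value)
{
    toy_server_t *s = &toy_servers[toy_server_of(rank)];
    if (s->n == TOY_MAXKV || strlen(key) >= sizeof(s->kv[0].key)) {
        return false;
    }
    toy_kv_t *slot = &s->kv[s->n++];
    slot->owner = rank;
    strcpy(slot->key, key);
    slot->value = value;
    return true;
}

/* Models a fence that collects data: every server ends up with every pair. */
void toy_fence_collect(void)
{
    size_t before[TOY_NSERVERS];
    for (int s = 0; s < TOY_NSERVERS; s++) {
        before[s] = toy_servers[s].n;   /* snapshot: copy only original data */
    }
    for (int dst = 0; dst < TOY_NSERVERS; dst++) {
        for (int src = 0; src < TOY_NSERVERS; src++) {
            for (size_t i = 0; src != dst && i < before[src]; i++) {
                const toy_kv_t *kv = &toy_servers[src].kv[i];
                toy_server_t *d = &toy_servers[dst];
                if (d->n < TOY_MAXKV && toy_find(d, kv->owner, kv->key) == NULL) {
                    d->kv[d->n++] = *kv;
                }
            }
        }
    }
}

/* Models a get for (owner, key) issued by `requester`. */
toy_status_t toy_get(int requester, int owner, const char *key,
                     bool dm_supported, int *out)
{
    const toy_kv_t *hit =
        toy_find(&toy_servers[toy_server_of(requester)], owner, key);
    if (hit == NULL && dm_supported) {
        /* Direct modex: ask the server hosting the owner. A real server
         * would hold the request until the data is committed; this
         * model simply reports "not found" instead. */
        hit = toy_find(&toy_servers[toy_server_of(owner)], owner, key);
    }
    if (hit == NULL) {
        return TOY_NOT_FOUND;
    }
    *out = hit->value;
    return TOY_OK;
}
```

After `toy_put_commit(1, "addr", 42)`, a call to `toy_get(0, 1, "addr", false, &v)` returns `TOY_NOT_FOUND`. The same call with `dm_supported` set to `true`, or any call made after `toy_fence_collect()`, returns `TOY_OK` with `v == 42`. Note that the direct-modex path in this model needs the owner's rank in order to pick a server.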

DM may not be implemented by all runtimes, so it may not work in your environment. It is known to be available in PRRTE and OMPI's runtime, and on Slurm for srun. If it isn't available, PMIx_Get without a prior fence should return an error. The error code may vary - could be "not supported" or "not found".

Being a point-to-point method, DM can scale poorly in certain applications - e.g., if every process needs the info from every other process. It tends to work well for sparsely connected applications where each proc only needs info from a small number of its peers, and also in apps that can largely execute asynchronously.
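A rough count makes the scaling argument concrete. The helper below is a hypothetical back-of-the-envelope calculation, not a measurement: it assumes one DM request per (requester, peer) pair and ignores any caching on the local server as well as processes that share a node.

```c
#include <assert.h>

/* Number of point-to-point requests if each of `nprocs` processes
 * fetches data from `peers_per_proc` distinct peers, assuming one
 * request per (requester, peer) pair. */
unsigned long long toy_dm_requests(unsigned long long nprocs,
                                   unsigned long long peers_per_proc)
{
    /* A process is not its own peer. */
    assert(nprocs == 0 || peers_per_proc < nprocs);
    return nprocs * peers_per_proc;
}
```

Under these assumptions, 1000 processes that each need data from all 999 others issue 999000 requests, whereas the same job with 4 peers per process issues 4000.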

HTH

thomasgillis commented 1 month ago

@rhc54 thanks for the detailed answer :-)

> Here, I presume that "must be globally exchanged prior to retrieval" refers to the "global" method?

The last "global" refers to the global mode (as opposed to DM). My question here was whether, in this case, the use of a fence is mandatory or only recommended.

Assuming that DM is supported, I still have a question about that mode: what triggers the local server (that is holding the ongoing request) to get the information on which server has the right key? From your answer I presume a fence does that, but is there any other function that would do the same, like PMIx_Commit with a specific mode to ensure that the committing process broadcasts that key to all the other servers?

rhc54 commented 1 month ago

> The last "global" refers to the global mode (as opposed to DM). My question here was whether, in this case, the use of a fence is mandatory or only recommended.

Best answer is: it depends on the runtime you are operating under. If it doesn't support DM, then the fence is mandatory. If it does support DM, then it really is up to you to decide based on the needs and behavior of your app.

> What triggers the local server (that is holding the ongoing request) to get the information on which server has the right key? From your answer I presume a fence does that, but is there any other function that would do the same, like PMIx_Commit with a specific mode to ensure that the committing process broadcasts that key to all the other servers?

Yeah, I forgot for the moment when writing the DM description that you are using generic keys. DM won't work with those as there is no way to know who is going to publish them. So if you use generic keys, you are kinda forced to do a fence to share them.

Sorry about the confusion - we don't usually encounter generic keys any more. I suppose someone could try to implement a broadcast-based DM method, but I don't know of anyone doing so - seems like it would be awfully inefficient, but I haven't given it a lot of thought.
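The routing problem with generic keys can be sketched in a few lines. This is a hypothetical illustration, not how any PMIx server is implemented: `toy_dm_targets`, `TOY_RANK_UNDEF`, and the block placement of ranks on servers are all assumptions. It shows that a known publisher rank maps to exactly one server, while an unknown publisher leaves nothing to narrow the search, so a broadcast-style DM would have to ask every server.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for "publisher unknown"; this is not the PMIx constant. */
#define TOY_RANK_UNDEF (-1)

/* Fills `targets` with the servers a DM request would have to contact
 * and returns how many there are. `targets` must have room for
 * `nservers` entries.
 * Assumed placement: rank r is hosted by server r / procs_per_server. */
size_t toy_dm_targets(int owner_rank, size_t nservers,
                      size_t procs_per_server, size_t targets[])
{
    assert(nservers > 0 && procs_per_server > 0);
    if (owner_rank != TOY_RANK_UNDEF) {
        /* Known publisher: exactly one server can hold the data. */
        assert(owner_rank >= 0);
        targets[0] = (size_t)owner_rank / procs_per_server;
        assert(targets[0] < nservers);
        return 1;
    }
    /* Unknown publisher: a hypothetical broadcast-based DM would have
     * to query every server. */
    for (size_t s = 0; s < nservers; s++) {
        targets[s] = s;
    }
    return nservers;
}
```

With 4 servers hosting 2 processes each, a request for rank 5 goes to server 2 only, while a request with `TOY_RANK_UNDEF` fans out to all 4 servers.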

If your job isn't too large, you could do a PMIx_Publish and PMIx_Lookup instead. Doesn't scale very well as it involves looking up values in some central location, but it would avoid doing a fence and might not be that much different from DM. Biggest difference is that DM returns all values "put" by the source proc under the assumption that if you ask for one, you are probably going to want something else from that proc. So it can provide a little optimization that isn't available with the "publish/lookup" method should there be multiple values involved.

Of course, you have to be in a runtime that supports those functions - PRRTE and OMPI's mpirun do, but I don't know about others (might, just don't know about it).
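To see why a publish/lookup scheme sidesteps the routing question, consider a toy central table keyed by name only. The sketch below is a hypothetical model and does not use the PMIx API: the `toy_` functions, the fixed-size table, and the choice to reject duplicate keys are assumptions for illustration. The point is that a lookup needs only the key, never the publisher's rank, at the cost of every request going to one place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PUBLISHED 16  /* assumed capacity of the central table */

typedef struct {
    char key[32];
    int  value;
} toy_entry_t;

/* The single central store that every process talks to. */
static toy_entry_t toy_table[TOY_MAX_PUBLISHED];
static size_t      toy_count;

/* Looks a key up by name alone; no publisher rank is involved. */
bool toy_lookup(const char *key, int *out)
{
    assert(key != NULL && out != NULL);
    for (size_t i = 0; i < toy_count; i++) {
        if (strcmp(toy_table[i].key, key) == 0) {
            *out = toy_table[i].value;
            return true;
        }
    }
    return false;
}

/* Adds a key to the central table. Rejecting duplicates is a choice
 * made for this toy model so that globally unique keys stay unique. */
bool toy_publish(const char *key, int value)
{
    int ignored;
    assert(key != NULL);
    if (toy_count == TOY_MAX_PUBLISHED ||
        strlen(key) >= sizeof(toy_table[0].key) ||
        toy_lookup(key, &ignored)) {
        return false;
    }
    strcpy(toy_table[toy_count].key, key);
    toy_table[toy_count].value = value;
    toy_count++;
    return true;
}
```

Here any process can call `toy_lookup("k-7f3a", &v)` after some other process has called `toy_publish("k-7f3a", 17)`, without knowing which process that was.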