openshmem-org / specification

OpenSHMEM Application Programming Interface
http://www.openshmem.org

Proposal for sessions/bundles with a start/stop API and simple example #481

Closed (davidozog closed 1 year ago)

davidozog commented 2 years ago

This PR addresses issue #189.

As always I appreciate feedback, thanks!

naveen-rn commented 2 years ago

One more item from #189 is not yet addressed:

  1. Defining the relationship between memory ordering routines and bundles. What happens when a memory ordering routine (explicit, or implicit from collectives) is called within a bundle?

davidozog commented 2 years ago

Thanks very much for the review @naveen-rn.

  1. Defining the relationship between memory ordering routines and bundles. What happens when a memory ordering routine (explicit, or implicit from collectives) is called within a bundle?

I tried to capture that with:

[bundling routines] do not affect the completion or ordering semantics of any OpenSHMEM routines in the program.

For example, if a quiet occurs within a bundle, then the runtime must ensure remote completions on the calling PE as normal. Also:

routines, such as ... the memory ordering routines might require the library to enforce remote completion, reducing the potential benefit of bundling optimizations.

So while it's not strictly prohibited in this draft, it is discouraged to include memory ordering within a chain, because it may require the implementation to "break" the chaining optimization.
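
To make the cost concrete, here is a toy model in plain C (no OpenSHMEM calls). It is only a sketch under assumptions: every name (`toy_bundle_t`, `toy_put`, `toy_quiet`, and so on) is hypothetical and not part of the draft, and it assumes an implementation that queues operations inside a bundle and hands them to the network in one batch. A real library may bundle quite differently.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names come from OpenSHMEM or the PR draft. */
typedef struct {
    size_t pending;  /* operations queued but not yet handed to the network */
    size_t flushes;  /* how many times the queue had to be drained */
    int active;      /* nonzero between toy_bundle_start and toy_bundle_stop */
} toy_bundle_t;

/* Drain the queue; an empty queue does not count as a flush. */
static void toy_flush(toy_bundle_t *b) {
    if (b->pending > 0) {
        b->pending = 0;
        b->flushes++;
    }
}

void toy_bundle_start(toy_bundle_t *b) {
    assert(b != NULL);
    b->pending = 0;
    b->flushes = 0;
    b->active = 1;
}

/* Inside a bundle the put is only queued (assumed behavior). */
void toy_put(toy_bundle_t *b) {
    assert(b != NULL);
    b->pending++;
    if (!b->active) toy_flush(b);  /* outside a bundle, issue immediately */
}

/* A quiet must guarantee completion, so the queue cannot stay deferred. */
void toy_quiet(toy_bundle_t *b) {
    assert(b != NULL);
    toy_flush(b);
}

void toy_bundle_stop(toy_bundle_t *b) {
    assert(b != NULL);
    toy_flush(b);
    b->active = 0;
}

/* Issue nputs puts in one bundle, optionally with a quiet halfway through.
 * Returns the number of flushes the bundle needed. */
size_t toy_run(size_t nputs, int quiet_in_middle) {
    toy_bundle_t b;
    toy_bundle_start(&b);
    for (size_t i = 0; i < nputs; i++) {
        toy_put(&b);
        if (quiet_in_middle && i + 1 == nputs / 2) toy_quiet(&b);
    }
    toy_bundle_stop(&b);
    return b.flushes;
}
```

In this model, eight puts in one bundle drain once, while the same eight puts with a quiet in the middle drain twice. The quiet is still honored, it just splits the batch.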

Does that make sense? Do you think memory ordering should be prohibited within a chain?

nspark commented 2 years ago

I agree with @jdinan's comment that this API may be a better fit for the "Communication Management Routines."

In that case, maybe the naming should be slightly revised; e.g., shmem_ctx_bundle_start → shmem_ctx_start_bundle. But, this leaves shmem_bundle_start as something of an oddball.

nspark commented 2 years ago

@naveen-rn suggested making bundling a property that could be enabled/disabled dynamically (i.e., not only at context creation), which would generalize the API for potential future dynamic properties on contexts.

nspark commented 2 years ago

@davidozog One case of interest to me is combining bundling with put-with-signal. I would like to think there's a good bit of room for optimization internal to an OpenSHMEM library.

For example, we often think of put-with-signal as roughly equivalent to:

shmem_putmem(..., pe);
shmem_fence();
shmem_set(&sigword, signal, pe);

(Note, we lack, and need, shmem_signal_set; see #382.)

In my mind, bundling + put-with-signal could effectively transform:

for (int i = 0; i < bound; i++)
  shmem_put_signal_nbi(..., &sigword, SHMEM_SIGNAL_SET, pe_targets[i]);

Into:

for (int i = 0; i < bound; i++)
  shmem_put_nbi(..., pe_targets[i]);
shmem_fence();
for (int i = 0; i < bound; i++)
  shmem_signal_set(&sigword, signal, pe_targets[i]);

Not every library would necessarily want to make such a transformation. But, I imagine some might find it advantageous.
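
To illustrate why the transformed form might be attractive, here is a small plain-C sketch (no OpenSHMEM calls) that builds both operation schedules and compares them. Everything in it (`op_t`, `schedule_unbundled`, `schedule_bundled`, `signals_ordered`) is a hypothetical name invented for illustration, and it assumes each put-with-signal is modeled as put; fence; signal, as in the rough equivalence above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these types and functions are not part of OpenSHMEM. */
typedef enum { OP_PUT, OP_FENCE, OP_SIGNAL } op_kind_t;
typedef struct { op_kind_t kind; int pe; } op_t;  /* pe is -1 for OP_FENCE */

/* One put-with-signal per target, expanded as put; fence; signal.
 * Writes 3*n entries to out and returns the count. */
size_t schedule_unbundled(const int *pes, size_t n, op_t *out) {
    assert(pes != NULL && out != NULL);
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        out[k++] = (op_t){ OP_PUT, pes[i] };
        out[k++] = (op_t){ OP_FENCE, -1 };
        out[k++] = (op_t){ OP_SIGNAL, pes[i] };
    }
    return k;
}

/* Bundled form: all puts, a single fence, then all signals.
 * Writes 2*n + 1 entries for n > 0 and returns the count. */
size_t schedule_bundled(const int *pes, size_t n, op_t *out) {
    assert(pes != NULL && out != NULL);
    size_t k = 0;
    if (n == 0) return 0;
    for (size_t i = 0; i < n; i++) out[k++] = (op_t){ OP_PUT, pes[i] };
    out[k++] = (op_t){ OP_FENCE, -1 };
    for (size_t i = 0; i < n; i++) out[k++] = (op_t){ OP_SIGNAL, pes[i] };
    return k;
}

size_t count_fences(const op_t *ops, size_t len) {
    size_t c = 0;
    for (size_t i = 0; i < len; i++)
        if (ops[i].kind == OP_FENCE) c++;
    return c;
}

/* Returns 1 if, for every signal, the most recent earlier put to the same PE
 * is followed by a fence before that signal; returns 0 otherwise. */
int signals_ordered(const op_t *ops, size_t len) {
    for (size_t s = 0; s < len; s++) {
        if (ops[s].kind != OP_SIGNAL) continue;
        int seen_put = 0, fenced = 0;
        for (size_t j = 0; j < s; j++) {
            if (ops[j].kind == OP_PUT && ops[j].pe == ops[s].pe) {
                seen_put = 1;
                fenced = 0;  /* a newer put needs its own fence */
            } else if (ops[j].kind == OP_FENCE && seen_put) {
                fenced = 1;
            }
        }
        if (!(seen_put && fenced)) return 0;
    }
    return 1;
}
```

For three targets, the unbundled schedule has nine operations and three fences, while the bundled one has seven operations and one fence, and both keep every signal behind a fenced put to the same PE. Whether fewer fences actually pays off depends on the library and the network.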