mpiwg-rma / rma-issues

Repository to discuss internal RMA working group issues

Different values of alloc_shared_noncontig across process group? #12

Closed: devreal closed this issue 2 years ago

devreal commented 4 years ago

Dan Holmes (perhaps unintentionally) raised the point that the standard does not explicitly require the value of the info key alloc_shared_noncontig to be the same on all processes participating in the allocation of a shared memory window [1].

In the description of MPI_WIN_CREATE the text explicitly states

The various processes in the group of comm may specify completely different target
windows, in location, size, displacement units, and info arguments.

I don't see text that restricts alloc_shared_noncontig to be the same on each process. Is that intentional? I fail to imagine the semantics of different alloc_shared_noncontig values at different processes in the group creating the window. After all, the info key controls the layout of memory shared across processes and is not just a local optimization.

Thus my question: Should the text be amended (potentially as an erratum) to clarify that alloc_shared_noncontig should have the same value at each process?

[1] https://github.com/mpi-forum/mpi-standard/commit/92ae64bfd1943ac5fa3659af76c815ff21ce1fdc#r34773896

jeffhammond commented 4 years ago

Good find. We should correct this, both by explicitly stating that noncontig must be uniform across processes and by removing info arguments from the list of things that can vary arbitrarily. Unless there is a good reason, we should probably require homogeneous info arguments. The justification for allowing them to vary is pretty weak.

jdinan commented 4 years ago

Isn't it possible that an implementation could map the same shared segment contiguously at one process and noncontiguously at another? It's not hard to imagine weird scenarios like this being useful when combining CPU and GPU processes.

jeffhammond commented 4 years ago

Tell me what it means to mix contiguous and noncontiguous. Is the allocation contiguous or not? Where is pointer arithmetic across process boundaries valid and where is it undefined behavior in C?

jdinan commented 4 years ago

One process could map the pages into a contiguous virtual address range (i.e. contiguous sequence of pages) while another does not.

jeffhammond commented 4 years ago

But from a C perspective, is it or is it not from the same allocation such that pointer arithmetic is legal?

jdinan commented 4 years ago

I don't understand your question. IIUC, we are talking about different processes mapping the same set of physical pages into their address spaces using different mappings.

jeffhammond commented 4 years ago

I think what you’re proposing needs to be a separate ticket with its own motivation. The semantics you suggest don’t make any sense to me. Contiguous means you can do pointer arithmetic across process boundaries. Noncontiguous means you can’t. The latter might be correlated with physically disjoint pages for NUMA but it doesn’t have to be.

jdinan commented 4 years ago

One process says noncontig, the others don't. MPI uses this info to map the pages differently in the corresponding processes. Yes, this implies different pointer usage across processes. Unless I'm mistaken, it's not a new proposal; it is allowed under the current text. This ticket suggests removing the possibility of the above usage model (which is optional for implementations since it's requested via info). What is the case for removal?

jeffhammond commented 4 years ago

What info input causes the implementation to allocate physically noncontiguous memory, e.g., to optimize for NUMA?

devreal commented 4 years ago

As long as one process in the group does not request noncontiguous memory, the memory has to be allocated physically contiguously, so any placement optimizations are off limits. The other processes could, theoretically, map the memory noncontiguously into their virtual address space, but what is the benefit of that? It doesn't change the alignment of the physical memory chunks (to avoid NUMA issues and get alignment right, for example) and it disallows pointer arithmetic to access the memory of other processes, combining the disadvantages of both worlds.
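
To make the layout question concrete, here is a minimal sketch in plain C (no MPI calls) of where each rank's segment would start under the two layouts. It is only a model: the function names `round_up` and `segment_offsets` are made up for illustration, and padding every segment to an alignment boundary is just one assumed way an implementation might honor alloc_shared_noncontig, not something the standard prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (hypothetical helper). */
size_t round_up(size_t n, size_t align)
{
    assert(align > 0);
    return (n + align - 1) / align * align;
}

/*
 * Fill offsets[i] with the byte offset of rank i's segment from the start
 * of the shared region and return the total region size.
 *
 * align == 1 models the contiguous layout: segment i starts exactly where
 * segment i-1 ends, so pointer arithmetic past the end of one segment
 * lands in the next one.
 *
 * align > 1 models one possible noncontiguous layout (an assumption):
 * each segment is padded to an alignment boundary, so the start of
 * segment i can no longer be derived from the segment sizes alone.
 */
size_t segment_offsets(const size_t *sizes, size_t nranks,
                       size_t align, size_t *offsets)
{
    size_t next = 0;
    for (size_t i = 0; i < nranks; i++) {
        offsets[i] = next;
        next += round_up(sizes[i], align);
    }
    return next;
}
```

With segment sizes of 100, 200, and 50 bytes, the contiguous model places the segments back to back, while a 64-byte alignment moves the second and third segments to padded offsets. In the padded case a process would have to ask for each neighbor's base address (in MPI, via MPI_Win_shared_query) instead of computing it from the sizes.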

I am also in favor of disallowing different values for the other info keys. This could enable some optimizations in the allocation process, e.g., reducing collective communication. Since it has been explicitly allowed so far, I am not sure we can fix that without breaking backwards compatibility though.

jdinan commented 4 years ago

I should have mentioned that I am also assuming the user is careful to make each chunk size a multiple of the page size to allow differences in mapping. FWIW, backward compatibility may not be much of an issue since this is info.

We have had many of these discussions about uniform info for windows and communicators and have erred on the side of allowing nonuniform values. This allows divergent behavior of processes and may enable optimizations we did not foresee. Given that processes already synchronize when creating these objects, the cost of a reduction to check info values has been seen as acceptable.

jeffhammond commented 4 years ago

I agree with @devreal's assessment of why heterogeneous inputs are not particularly useful. The only thing that matters for performance is the physical contiguity of the allocation.

I agree with @jdinan that the check here isn't a bottleneck. Shared memory windows have limited reach on most systems and the boolean nature of this info means we can use a reduction rather than a gather.
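
As a rough illustration of why a reduction suffices for a boolean key, the sketch below folds per-rank flags with AND and OR in plain C. This is a model under stated assumptions, not MPI code: the array `flags` stands in for the values each rank would contribute, the loop stands in for logical-AND and logical-OR reductions, and the function name `noncontig_is_uniform` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * flags[i] is the alloc_shared_noncontig value supplied by rank i.
 * The values are uniform exactly when "all ranks set it" (AND) and
 * "some rank set it" (OR) agree, so two one-bit results are enough
 * and nobody needs to gather every rank's value.
 */
bool noncontig_is_uniform(const bool *flags, size_t nranks)
{
    assert(nranks > 0);
    bool all = true;   /* running logical AND */
    bool any = false;  /* running logical OR  */
    for (size_t i = 0; i < nranks; i++) {
        all = all && flags[i];
        any = any || flags[i];
    }
    return all == any;
}
```

In a real implementation the two folds could be combined into a single collective reduction over a two-element vector; that detail is left out here to keep the model small.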

devreal commented 4 years ago

Proposal from the MPI forum meeting on this topic: introduce a key along the lines of mpi_assert_same_info to have the users assert that all keys are equal on all participating processes. This retains the freedom for weird edge cases while giving users and implementations a path to reducing the overhead if homogeneous info key values are guaranteed.

jdinan commented 4 years ago

@devreal For your reference: https://github.com/mpi-forum/mpi-issues/issues/76. This was proposed earlier and the Forum didn't go for it. There are several other ideas on the ticket, none of which were particularly appealing. Best of luck to you. :neckbeard:

devreal commented 4 years ago

@jdinan Thanks for pointing this out to me, I wasn't aware of your previous efforts. By now it's probably not relevant for 4.0 anyway but it might be worth bringing this up again at some later point.

jdinan commented 4 years ago

@devreal Apologies if my last message was not terribly encouraging. I do still think this is a good idea. Sometimes it takes the Forum a little while to warm up to a proposal. I'd certainly give this another try. Especially with info keys, these tend to get picked up quickly by implementations, so I wouldn't let the 4.0 cutoff stop you.

devreal commented 2 years ago

Duplicate of https://github.com/mpi-forum/mpi-issues/issues/76