OpenSHMEM Queues for Aggregation

Problem:

Aggregating communication and data is known to improve the performance of PGAS/OpenSHMEM applications [1][2]. However, the OpenSHMEM programming model lacks abstractions that applications can use to express aggregation intent, or that implementers can use to optimize OpenSHMEM implementations for aggregation.

[1] Jason Devinney's Conveyors keynote
[2] Brad Chamberlain's Chapel keynote

Proposal:

Introduce OpenSHMEM queues as an abstraction to aggregate data and communication.

Details are in the document PDF.

(Caution: The document requires work to make it a specification-compliant document.)

Impact on Users:

This provides an ability to aggregate communication and data.

Impact on implementation:

Implementations will have to implement the new interfaces described in the pdf.

@manjugv Questions on data queues, and miscellaneous questions on the endgame/support for user-defined op-types (which are essential for the targeted use cases), are slated for another comment. Let me first try to understand the basic communication queues.

Please clarify the following:

The difference between communication queues and contexts is minimal. It almost seems that both SHMEM objects perform the same operation.
- Assume an EXCLUSIVE queue. I suppose it is used by a single thread? How does it differ from a PRIVATE context?
- AFAIU, there is no mandatory requirement that a queue be tied to a target process? If so, there is no difference from a context on the source side either. It looks like, irrespective of queues or contexts, sorting is required on the source side to aggregate the messages?
- In general, it would be beneficial to understand why sorting on a queue would be more effective than sorting on a context.
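To make the "sorting on the source side" point concrete, here is a minimal sketch of per-target bucketing in plain C. It is only an illustration for discussion, not the interface from the PDF: `aggregator_t`, `agg_push`, `agg_flush`, `NUM_PES`, and `BUCKET_CAP` are all hypothetical names, and the flush is a stand-in that only counts how many aggregated transfers would be issued. The same bucketing would be needed whether the small operations arrive through a queue or through a context.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_PES    4   /* hypothetical number of target PEs */
#define BUCKET_CAP 8   /* hypothetical per-target buffer capacity */

/* One small operation destined for a single target PE. */
typedef struct {
    int  target_pe;
    long value;
} small_put_t;

/* Staging buffer holding the payloads bound for one target PE. */
typedef struct {
    long   items[BUCKET_CAP];
    size_t count;
} bucket_t;

typedef struct {
    bucket_t buckets[NUM_PES];
    size_t   flushes;  /* number of aggregated transfers issued so far */
} aggregator_t;

static void agg_init(aggregator_t *a) { memset(a, 0, sizeof *a); }

/* Stand-in for one large transfer to `pe`; a real implementation
 * might issue a single bulk put of the bucket contents here. */
static void agg_flush(aggregator_t *a, int pe) {
    bucket_t *b = &a->buckets[pe];
    if (b->count == 0) return;   /* nothing staged for this target */
    a->flushes++;
    b->count = 0;
}

/* "Sort" an operation into its target's bucket, flushing first if full. */
static void agg_push(aggregator_t *a, small_put_t op) {
    assert(op.target_pe >= 0 && op.target_pe < NUM_PES);
    bucket_t *b = &a->buckets[op.target_pe];
    if (b->count == BUCKET_CAP) agg_flush(a, op.target_pe);
    b->items[b->count++] = op.value;
}

static void agg_flush_all(aggregator_t *a) {
    for (int pe = 0; pe < NUM_PES; pe++) agg_flush(a, pe);
}

/* 20 small operations spread round-robin over 4 targets:
 * returns the number of aggregated transfers that result. */
static size_t agg_demo(void) {
    aggregator_t a;
    agg_init(&a);
    for (int i = 0; i < 20; i++) {
        small_put_t op = { i % NUM_PES, (long)i };
        agg_push(&a, op);
    }
    agg_flush_all(&a);
    return a.flushes;
}
```

In this toy setup, 20 small operations collapse into one transfer per target. The open question above is whether a queue lets the implementation do this bucketing more cheaply than it could behind a context.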
With respect to communication queues, the push operation semantics are not clear.
- Does the return from a push operation guarantee immediate progress? As the use of queue_progress doesn't seem mandatory, it looks like the return from a push operation guarantees immediate progress?
- There is no guarantee that a new push operation will follow later, so once an op is pushed into the queue, the progress engine (either a host thread or a thread on the smartNIC) would immediately pick up the pushed event. If so, how do we chain the pushed operations?
What is the need for the query_size operation? Why does the user need to know about the pending operations in the queue?
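For discussion, here is one possible reading of the push/progress split, sketched as a toy queue in plain C. This is an assumption about the semantics, not what the PDF specifies: `toy_queue_t`, `toy_push`, `toy_size`, and `toy_progress` are hypothetical names that only mimic the roles of push, query_size, and queue_progress. Under this reading, push merely enqueues, nothing moves until progress is called, and the size query reports how many operations are still pending.

```c
#include <assert.h>
#include <stddef.h>

#define Q_CAP 16   /* hypothetical queue capacity */

typedef struct {
    int  target_pe;
    long value;
} q_op_t;

/* Toy ring buffer of pending operations. */
typedef struct {
    q_op_t ops[Q_CAP];
    size_t head;       /* index of the oldest pending op */
    size_t count;      /* number of pending ops */
    size_t completed;  /* ops drained by progress so far */
} toy_queue_t;

/* Deferred semantics: push only records the op; returns -1 if full. */
static int toy_push(toy_queue_t *q, q_op_t op) {
    if (q->count == Q_CAP) return -1;
    q->ops[(q->head + q->count) % Q_CAP] = op;
    q->count++;
    return 0;
}

/* Plays the role of a size query: number of ops still pending. */
static size_t toy_size(const toy_queue_t *q) { return q->count; }

/* Drains everything pending as one batch and returns how many ops
 * were in it. A real engine could coalesce the batch per target here. */
static size_t toy_progress(toy_queue_t *q) {
    size_t drained = q->count;
    q->completed += drained;
    q->head = (q->head + drained) % Q_CAP;
    q->count = 0;
    return drained;
}

/* Push three ops, observe them pending, then drain them together.
 * Encodes the result as pending * 10 + drained. */
static size_t toy_demo(void) {
    toy_queue_t q = {0};
    for (int i = 0; i < 3; i++) {
        q_op_t op = { 1, (long)i };
        toy_push(&q, op);
    }
    size_t pending = toy_size(&q);
    size_t drained = toy_progress(&q);
    assert(toy_size(&q) == 0);
    return pending * 10 + drained;
}
```

If push instead guaranteed immediate progress, each op would leave the queue before the next one arrived, the batch in `toy_progress` would always have size one, and the size query would have little to report. That is the ambiguity the questions above are trying to resolve.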
Is the communication queue an object to set up SHMEM users for future data aggregation operations with data queues? If so, it makes sense. But I don't see a real benefit in introducing a new SHMEM object that duplicates all existing operations to suit the new object.
