ofiwg / libfabric

Open Fabric Interfaces
http://libfabric.org/

libfabric-2.0: Tagged message enhancements #9020

Open shefty opened 1 year ago

shefty commented 1 year ago

Define a new concept called 'communication flows'. A communication flow is identified by a comm key (ID). Comm keys identify separate virtual flows over the same endpoint and provide a semantic mapping for MPI communicators. From the perspective of libfabric, each key simply defines a virtual communication flow. Comm key values are user defined and are used for receive-side buffer matching. Providers would report a comm_key_cnt attribute to indicate support and the number of active flows that they can distinguish between. A comm key is specified as part of the address using a macro such as fi_addr_key(comm_key, av_index) = (comm_key << 32) | av_index. This works assuming AV table support only. The comm_key at the sender and receiver must match, and it is the app's responsibility to synchronize the key values. The provider can carry the key using any mechanism available (e.g. embedding it into a tag). Comm keys are expected to be used with both tagged and untagged messages, e.g. when MPI implements a collective call using pt2pt semantics.
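To make the address encoding concrete, here is a minimal sketch of the fi_addr_key idea, assuming a 64-bit address with the comm key in the upper 32 bits and the AV index in the lower 32 bits, as in the formula above. All `example_` names are hypothetical and are not part of libfabric; `example_addr_t` only stands in for a 64-bit address type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit fabric address; not a libfabric type. */
typedef uint64_t example_addr_t;

/* Assumed layout: [63:32] comm key, [31:0] AV table index. */
#define EXAMPLE_AV_INDEX_BITS 32
#define EXAMPLE_AV_INDEX_MASK ((UINT64_C(1) << EXAMPLE_AV_INDEX_BITS) - 1)

/* Pack a comm key and an AV index into one address value. */
static inline example_addr_t example_addr_key(uint32_t comm_key,
                                              uint32_t av_index)
{
    /* Widen before shifting so the key is not truncated. */
    return ((example_addr_t)comm_key << EXAMPLE_AV_INDEX_BITS) | av_index;
}

/* Recover the comm key from a packed address. */
static inline uint32_t example_addr_comm_key(example_addr_t addr)
{
    return (uint32_t)(addr >> EXAMPLE_AV_INDEX_BITS);
}

/* Recover the AV index from a packed address. */
static inline uint32_t example_addr_av_index(example_addr_t addr)
{
    return (uint32_t)(addr & EXAMPLE_AV_INDEX_MASK);
}

/* Round-trip check: what goes in must come back out unchanged. */
static inline void example_addr_key_selfcheck(void)
{
    example_addr_t addr = example_addr_key(3, 7);

    assert(example_addr_comm_key(addr) == 3);
    assert(example_addr_av_index(addr) == 7);
}
```

The widening cast matters: shifting a 32-bit value left by 32 is undefined in C, which is one reason a real macro would need to be careful about operand types. The 32-bit index limit is also why this approach assumes AV table support only.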

Define a new concept called 'message flow'. Message flows provide a semantic mapping to convey higher-level operations, such as MPI barrier vs allreduce vs ... Message flows are conceptually similar to comm keys. The distinction is that message flow values are expected to consume a much smaller range of values and may be hard-coded by the application. Message flow IDs are also used for receive-side buffer matching, and are usable with tagged and untagged messages. Specifying a message flow requires new fields in struct fi_msg and struct fi_msg_tagged. Alternatively, for tagged messages, a macro similar to fi_addr_key, fi_tag_flow(msg_flow, tag), can be used to embed the msg_flow ID into the tag prior to giving it to the provider.
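A rough sketch of what fi_tag_flow could look like follows. The proposal does not define a bit layout, so this assumes the flow ID occupies the upper 32 bits of a 64-bit tag and the user tag the lower 32 bits. The `example_` names and the flow ID values are hypothetical, chosen only to illustrate hard-coded flow IDs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: [63:32] message flow ID, [31:0] user tag. */
#define EXAMPLE_TAG_BITS 32
#define EXAMPLE_TAG_MASK ((UINT64_C(1) << EXAMPLE_TAG_BITS) - 1)

/* Hypothetical hard-coded flow IDs an MPI library might pick. */
enum example_msg_flow {
    EXAMPLE_FLOW_PT2PT     = 0,
    EXAMPLE_FLOW_BARRIER   = 1,
    EXAMPLE_FLOW_ALLREDUCE = 2
};

/* Embed a flow ID into a tag; user tag bits above bit 31 are dropped. */
static inline uint64_t example_tag_flow(uint32_t msg_flow, uint64_t tag)
{
    return ((uint64_t)msg_flow << EXAMPLE_TAG_BITS) | (tag & EXAMPLE_TAG_MASK);
}

/* Extract the flow ID from a combined tag. */
static inline uint32_t example_tag_get_flow(uint64_t combined)
{
    return (uint32_t)(combined >> EXAMPLE_TAG_BITS);
}

/* Extract the user tag from a combined tag. */
static inline uint32_t example_tag_get_user(uint64_t combined)
{
    return (uint32_t)(combined & EXAMPLE_TAG_MASK);
}

/* Round-trip check for one flow/tag pair. */
static inline void example_tag_flow_selfcheck(void)
{
    uint64_t combined = example_tag_flow(EXAMPLE_FLOW_BARRIER, 0x1234);

    assert(example_tag_get_flow(combined) == EXAMPLE_FLOW_BARRIER);
    assert(example_tag_get_user(combined) == 0x1234);
}
```

Because the flow ID sits in fixed bits, a receiver that wants a specific flow can match those bits exactly and would not need ignore bits for them.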

The benefit of the above changes is that they give the provider insight into the semantics the application wants, enabling more efficient algorithms for receive-side buffer matching. It may be possible to extend either of the above with additional attributes and ordering semantics. For example, ordering between communication and/or message flows may or may not be needed, and priority could be given to transfers of one flow over another. When both comm keys and message flows are enabled, the ignore bits in the tagged API may not be needed, and the supported range of tag values can be reduced to 32 bits for the purpose of MPI.

a-szegel commented 1 year ago

https://github.com/ofiwg/libfabric/issues/9022

shefty commented 1 year ago

Note: The existing API behavior and use of the tagged APIs are still supported as-is for 2.0. The above features would be exposed through some sort of opt-in extension. The exception may be updates for FI_DIRECTED_RECV (i.e. always enabled for tagged, not defined for untagged).