ofiwg / libfabric

Open Fabric Interfaces
http://libfabric.org/

API: expand collective attributes #8271

Open shefty opened 1 year ago

shefty commented 1 year ago

This is indirectly related to PR #8264.

There is currently no mechanism for an app to determine how a provider implements collectives. For example, are the collective calls implemented in software or offloaded to a switch? Nor can an app learn which algorithm a collective implementation uses. A software implementation could have a dozen options available, and although hardware may support only one algorithm today, that may not always be the case.

The request is to expose more details on the collective algorithms or protocols that a provider supports. Paired with that would be the ability for an application to control which algorithm or protocol a specific collective call uses.

shefty commented 1 year ago

The first step is to report the property of the collective implementation that matters most to the app: whether it is offloaded or done in software. A future step might be to report tunable values.

Are flags sufficient to report algorithm / protocol?
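To make the flags question concrete, here is a minimal sketch of what a flag-based report might look like. Every name in it (`COLL_IMPL_*`, `COLL_ALG_*`, `coll_pick_alg`, `coll_is_offloaded`) is hypothetical and invented for illustration; none of them exist in libfabric today, and the three algorithms are just placeholders.

```c
#include <stdint.h>

/* Hypothetical capability bits; these are NOT part of the libfabric API. */
#define COLL_IMPL_SOFTWARE (1ULL << 0)   /* collectives run on the host CPU */
#define COLL_IMPL_OFFLOAD  (1ULL << 1)   /* collectives offloaded to NIC/switch */
#define COLL_ALG_RING      (1ULL << 8)   /* placeholder algorithm bits */
#define COLL_ALG_TREE      (1ULL << 9)
#define COLL_ALG_RECDBL    (1ULL << 10)
#define COLL_ALG_MASK      (0xffULL << 8)

/* Nonzero if the provider reports an offloaded implementation. */
static int coll_is_offloaded(uint64_t caps)
{
    return (caps & COLL_IMPL_OFFLOAD) != 0;
}

/*
 * Pick one algorithm bit: a preferred algorithm if the provider reports it,
 * otherwise the lowest-numbered algorithm it does report. Returns 0 when the
 * provider reports no algorithms at all.
 */
static uint64_t coll_pick_alg(uint64_t caps, uint64_t preferred)
{
    uint64_t algs = caps & COLL_ALG_MASK;
    uint64_t match = algs & preferred;

    if (!match)
        match = algs;
    return match & (~match + 1);   /* isolate the lowest set bit */
}
```

Under these assumptions, flags can answer "which implementation" and "which algorithms", but a single bit has no room for per-algorithm tunables, so those would likely need a separate structure.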

We also need to consider the link provider, which may present multiple providers as 'one', even though only one of them may have collective acceleration.
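A rough sketch of why the combined case is awkward, assuming each underlying provider reports a capability bitmask. The `sub_provider` struct, the `SUB_COLL_OFFLOAD` bit, and both helper functions are hypothetical and are not taken from any existing provider code.

```c
#include <stddef.h>
#include <stdint.h>

#define SUB_COLL_OFFLOAD (1ULL << 0)   /* hypothetical capability bit */

/* Hypothetical view of one provider underneath a combining provider. */
struct sub_provider {
    const char *name;
    uint64_t coll_caps;
};

/* Union: a bit is set if ANY sub-provider reports it. */
static uint64_t link_caps_any(const struct sub_provider *subs, size_t n)
{
    uint64_t caps = 0;

    for (size_t i = 0; i < n; i++)
        caps |= subs[i].coll_caps;
    return caps;
}

/* Intersection: a bit is set only if EVERY sub-provider reports it. */
static uint64_t link_caps_all(const struct sub_provider *subs, size_t n)
{
    uint64_t caps = n ? ~0ULL : 0;

    for (size_t i = 0; i < n; i++)
        caps &= subs[i].coll_caps;
    return caps;
}
```

With one accelerated and one non-accelerated sub-provider, the union claims offload for the whole combined provider and the intersection claims none, so neither single value tells the app which path a given collective would actually take.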