a-szegel closed this issue 8 months ago.
@a-szegel Is the inject size mismatch the primary reason for wanting to toggle the offload on and off? Whether a provider uses a peer provider internally is a decision within that provider, and it doesn't make sense to me to expose it as a capability.
Agree. If a provider decides to expose peer provider usage as an option to the user, the best way is to use different provider names.
Inject size mismatch is the reason I want to change it at fi_info time. Otherwise, it would be fine to set it at fi_domain creation time, but I don't want to use an environment variable, so I need some way to programmatically pass what I want into the provider (which brings us back to fi_info time).
I understand the inject_size mismatch can be solved by making the peer's inject size configurable too.
Until it is configurable, would it be possible for efa to make a dummy fi_getinfo(shm) call during efa's getinfo to query shm's inject size, and select shm only if that inject size works? Otherwise, I think it's OK for efa to hard-code that internally.
Migrating some offline conversation here:
Sean Hefty: The use of peer providers should be hidden from the app. But if provider control is desired, the way to indicate that would be through the prov_name attribute, which would allow greater flexibility in how providers are selected.
Shi Jin: But if the application doesn't know about the existence of peer providers, how could it know what prov_name to add beyond the owner provider it is using?
Sean Hefty: I said "should be hidden". Setting FI_NO_PEER_PROVIDER indicates that the app knows of the peer provider architecture, but also has insight into which providers pair with which others. The app then somehow decides to disable peers for some reason. I don't know how that gets done without using some environment variable. If the app somehow already has implicit knowledge of how providers are constructed and whether the peer APIs are being used for that, versus a provider like EFA simply building the shared memory support in directly (such that it's not a peer), then prov_name would allow for more explicit provider composition. For example, use exactly these 3 providers as peers, or layer provider X over Y and use Z as a peer with X.
@j-xiong
Agree. If a provider decides to expose peer provider usage as an option to the user, the best way is to use different provider names.
That's the tricky part: we never expose peer provider usage to the application, and the application only uses efa as the provider name. But some applications assume that with this configuration all transmission (including local communication) goes through the NIC, which is not true if efa implicitly uses shm as a peer provider.
I am not even sure whether such an assumption is legal, but I am looking for ways to help them in this edge case without setting an environment variable.
@shijin-aws I think the suggestion regarding provider names is to have behavior similar to what we do with FI_HMEM or rxm, where you have multiple fi_info entries for efa, with and without shm, whose provider names differ, i.e. efa and efa;shm. An efa application could then explicitly avoid shm by selecting the efa-only fi_info, but you could order them efa;shm, efa so that by default the shm offload is selected (if your default for shm is on). Would that address the case you're considering?
@aingerson The suggestion is feasible, but rxm is not a core provider, so the case is not the same: both efa and shm are core providers, and shm can act as an offload for intra-node communication in some situations. I think the real question is: if an application only has efa in prov_name, is it required that only the efa provider is used? I think the answer is yes, and currently the efa provider does not obey that requirement.
@shijin-aws I think if the application just sets efa, you could still return both options. The proposal in this issue is to add an option for the user to select efa with or without shm. Returning two different fi_infos, one with and one without shm, still allows them to select efa with or without shm. The difference is that instead of adding a capability passed into the fi_getinfo call (as suggested), you would return both options and let the application select the one it needs. The old case would still work as before (the user requests efa and expects efa+shm, which is the first fi_info). The new case, disabling shm, requires application modification either way (with the other suggested method, it would need to pass in the capability bit): the application would skip the efa;shm info and use the efa-only info. Another option, though it feels hacky in my opinion, would be to have two fi_infos: efa (which uses shm) and efa^shm (with shm disabled); the application could then request prov_name="efa^shm" specifically, which would not return the regular efa+shm. Whichever solution you choose, you're going to need something at the application level to trigger the selection (setting hints->caps, setting the provider name to "efa^shm", or filtering the list to skip the efa+shm option). But I think it needs to be something efa-specific, since the internal shm usage is an efa-specific option.
We plan to add a new enum value, FI_OPT_SHARED_MEMORY_PERMITTED, to the endpoint setopt options, alongside existing ones such as FI_OPT_FI_HMEM_P2P. We also plan to move all SHM initialization inside the efa provider to EP creation.
Addressed by #9750.
Inject size is set at fi_info time. The EFA provider has an 8k inject size, and the SHM provider has a 4k inject size (we plan to make this configurable in the future). I want a way to know at fi_info time whether the user wants the peer provider to be used, and I don't want to use an environment variable: https://github.com/ofiwg/libfabric/issues/9450.
Proposal
I think a primary capability modifier best fits what we need. I propose adding a new primary modifier such as FI_NO_PEER_PROVIDER.
Capabilities are defined by the libfabric API here:
Describe alternatives you've considered