openshmem-org / specification

OpenSHMEM Application Programming Interface
http://www.openshmem.org

Team creation and resource utilization #492

Open nspark opened 2 years ago

nspark commented 2 years ago

The current (1.5) shmem_team_config_t structure only specifies a num_contexts field. However, there is some concern about the costs of team creation and associated communication resources. This issue will attempt to capture use-cases for expanding the team configuration structure or context creation options.

nspark commented 2 years ago

Team-based contexts are "only" used for team-relative operations

In OpenSHMEM, the only way to perform team-relative RMA or AMO operations is to create a context from the desired team. In these cases, the context itself could share the underlying resources of SHMEM_CTX_DEFAULT, and the library could perform the team-relative PE indexing. This is not the case where num_contexts is zero; a context will need to be created. However, that context serves only to provide team-relative PE indexing for RMA and AMO operations; it is not used as a communication stream independent of SHMEM_CTX_DEFAULT (or some other context).
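The team-relative PE indexing mentioned above can be sketched in plain C. This is a minimal illustration, assuming the team is a strided subset of the world team (the kind of team shmem_team_split_strided produces) with a positive stride. The type team_layout_t and both functions are hypothetical names for this sketch, not part of the OpenSHMEM API or of any implementation.

```c
#include <assert.h>

/* Hypothetical description of a strided team; not an OpenSHMEM type. */
typedef struct {
    int start;   /* world PE number of team PE 0 */
    int stride;  /* distance in world PE numbers between consecutive team PEs */
    int size;    /* number of PEs in the team */
} team_layout_t;

/* Translate a team-relative PE number to a world PE number.
 * Returns -1 if team_pe is outside the team. */
int team_pe_to_world(const team_layout_t *t, int team_pe) {
    assert(t->stride > 0);  /* assumption of this sketch */
    if (team_pe < 0 || team_pe >= t->size) {
        return -1;
    }
    return t->start + team_pe * t->stride;
}

/* Translate a world PE number to a team-relative PE number.
 * Returns -1 if the world PE is not a member of the team. */
int world_pe_to_team(const team_layout_t *t, int world_pe) {
    assert(t->stride > 0);  /* assumption of this sketch */
    int offset = world_pe - t->start;
    if (offset < 0 || offset % t->stride != 0) {
        return -1;
    }
    int team_pe = offset / t->stride;
    return (team_pe < t->size) ? team_pe : -1;
}
```

For a team with start 2, stride 3, and size 4, team PE 3 maps to world PE 11, and world PE 9 is not a member. A context that exists only for this purpose would need little more than such a layout plus a reference to the resources it shares.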

naveen-rn commented 2 years ago

> This is not the case where num_contexts is zero; a context will need to be created.

Can you please clarify this statement? This is confusing.

naveen-rn commented 2 years ago

In general, implementations could follow one of the design options:

  1. Static allocation - all context-related resources are created up front, probably during shmem_init, irrespective of the teams these resources will later be associated with. This amounts to creating n contexts like SHMEM_CTX_DEFAULT. Whenever a new context object is created, one of the pre-created resources is mapped to it. Teams do not matter in this case: team information can be kept in software, which performs the PE translation when required during data transfer operations.

  2. Dynamic allocation - context resources are created during the shmem_ctx_create operation. Because the team associated with the new context object is known during the shmem_ctx_create or shmem_team_create_ctx operation, implementations can make effective resource allocation decisions at that point.

The information provided during team creation is just a hint, which can be used to reduce the cost of resource allocation during dynamic context creation. I don't see a huge benefit from this hint for the static allocation design. AFAIU, even if the user sets num_contexts to zero during team creation and later tries to create a new context on the team, the implementation needs to support it.
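A rough sketch of the static allocation option, in plain C, may help show why the hint matters little there. Everything in it is hypothetical: POOL_SIZE, ctx_pool_t, and the pool_* functions are illustrative names, and a real implementation would hold network resources in each slot rather than a flag. The team is recorded only in software, and the num_contexts hint is accepted but ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SIZE 4  /* hypothetical number of resources created at init */

/* One pre-created communication resource (stand-in for real hardware state). */
typedef struct {
    bool in_use;
    int  team_id;  /* team recorded in software, used only for PE translation */
} ctx_slot_t;

typedef struct {
    ctx_slot_t slots[POOL_SIZE];
} ctx_pool_t;

/* Create every resource up front, before any team exists. */
void pool_init(ctx_pool_t *pool) {
    for (int i = 0; i < POOL_SIZE; i++) {
        pool->slots[i].in_use = false;
        pool->slots[i].team_id = -1;
    }
}

/* Map a free pre-created resource to a new context on the given team.
 * The num_contexts hint from team creation is deliberately ignored, so a
 * team created with a hint of zero can still obtain a context.
 * Returns the slot index, or -1 if the pool is exhausted. */
int pool_ctx_create(ctx_pool_t *pool, int team_id, int num_contexts_hint) {
    (void)num_contexts_hint;
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!pool->slots[i].in_use) {
            pool->slots[i].in_use = true;
            pool->slots[i].team_id = team_id;
            return i;
        }
    }
    return -1;
}

/* Return a resource to the pool when its context is destroyed. */
void pool_ctx_destroy(ctx_pool_t *pool, int slot) {
    assert(slot >= 0 && slot < POOL_SIZE && pool->slots[slot].in_use);
    pool->slots[slot].in_use = false;
    pool->slots[slot].team_id = -1;
}
```

A dynamic design would instead allocate the resource inside the create call, which is where a non-zero hint could let the implementation reserve resources ahead of time.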

nspark commented 2 years ago

This case still needs to create a context to get the team-relative PE indexing for point-to-point operations. Thus, it's not appropriate to set config.num_contexts = 0 when creating the team.