openucx / sparkucx

A high-performance, scalable, and efficient ShuffleManager plugin for Apache Spark that uses the UCX communication layer
https://www.sparkucx.org/
BSD 3-Clause "New" or "Revised" License

[CORE] UcxNode implementation - coordination point for ucp operations. #9

Closed petro-rudenko closed 4 years ago

petro-rudenko commented 4 years ago

UcxNode implementation, the coordination point for UCX operations:

  1. Creates and closes a single UcpContext.
  2. Instantiates a memory pool for the context.
  3. On the driver, starts a listener to exchange RPC messages with executors.
  4. On each executor, creates a globalWorker.
  5. On each executor, creates a globalEndpoint for RPC (connected to the driver's listener address) and sends the globalWorker's workerAddress to the driver.
  6. On the driver, when an RPC message arrives, introduces the new executor to the executors that have already joined and sends the new executor the addresses of the other cluster members.
  7. On each executor, creates a ThreadLocal worker pool, which ensures a worker-per-thread model.
  8. Releases all resources on exit.
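
Steps 6 and 7 can be illustrated with a minimal plain-Java sketch. It makes no UCX calls: `DriverMembership`, `WorkerPool`, `onExecutorJoined`, and the `sent` log are hypothetical names invented for this illustration, and the worker is modelled as a plain integer id rather than a real UCX worker. It only shows the bookkeeping the description implies, not the code in this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the driver side of step 6 (no real UCX calls).
class DriverMembership {
    // executorId -> serialized worker address, kept in join order.
    private final Map<String, byte[]> members = new LinkedHashMap<>();
    // Log of messages the driver would send: {recipient, introduced executor}.
    final List<String[]> sent = new ArrayList<>();

    // Called when an executor's RPC message (carrying its worker address) arrives.
    synchronized void onExecutorJoined(String executorId, byte[] workerAddress) {
        for (String existing : members.keySet()) {
            // Introduce the newcomer to an already joined executor...
            sent.add(new String[] {existing, executorId});
            // ...and tell the newcomer about that existing member.
            sent.add(new String[] {executorId, existing});
        }
        members.put(executorId, workerAddress);
    }

    synchronized int size() {
        return members.size();
    }
}

// Hypothetical stand-in for step 7: one lazily created "worker" per thread.
class WorkerPool {
    static final AtomicInteger created = new AtomicInteger();
    // Assumption: a real implementation would hold a UCX worker here, not an int id.
    static final ThreadLocal<Integer> WORKER =
        ThreadLocal.withInitial(created::incrementAndGet);
}

class UcxNodeSketch {
    public static void main(String[] args) {
        DriverMembership driver = new DriverMembership();
        driver.onExecutorJoined("exec-1", new byte[] {1});
        driver.onExecutorJoined("exec-2", new byte[] {2});
        for (String[] message : driver.sent) {
            System.out.println("to " + message[0] + ": address of " + message[1]);
        }
        // Repeated calls on the same thread return the same worker id.
        System.out.println(WorkerPool.WORKER.get().equals(WorkerPool.WORKER.get()));
    }
}
```

With this bookkeeping, the n-th executor to join triggers 2(n-1) messages, so every pair of executors learns each other's address exactly once. The `ThreadLocal` gives each thread its own worker without locking, which is the worker-per-thread model named in step 7.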