openucx / sparkucx

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer
https://www.sparkucx.org/
BSD 3-Clause "New" or "Revised" License

[CORE] UcxShuffleManager implementation. #14

Closed: petro-rudenko closed this 4 years ago

petro-rudenko commented 4 years ago

UcxShuffleManager is the main entry point to the plugin. All logic will be implemented in three callbacks:

  1. registerShuffle - called on the driver. Indicates that this job will have a shuffle. The driver allocates a metadata buffer so that mappers can publish the addresses and keys of their data and index files.
  2. shuffleBlockResolver - called on a mapper. It mmaps the index and data files and publishes their addresses to the driver.
  3. getReader - called on a reducer. Contains all the logic for fetching shuffle blocks.
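The three callbacks split the work between the driver, the mappers and the reducers. The Java sketch below is a hypothetical, single-process model of that flow, not the plugin's code. It calls no real UCX (jucx) or Spark APIs: the class and method names (`UcxShuffleFlowSketch`, `registerShuffle`, `publishMapOutput`, `fetchBlock`, `buildIndex`) are illustrative, addresses are made-up numbers, remote keys are omitted, and "remote memory" is just a `HashMap`. The index layout (numPartitions + 1 cumulative big-endian `long` offsets starting at 0) is assumed to mirror Spark's sort-shuffle index files.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the driver / mapper / reducer flow. No UCX or Spark calls. */
public class UcxShuffleFlowSketch {
    // Assumed slot layout per mapper in the driver's metadata buffer:
    // [dataAddress:long][indexAddress:long]. Real remote keys are omitted here.
    static final int SLOT_SIZE = 2 * Long.BYTES;

    // Stand-in for memory a mapper has registered and made readable by reducers.
    static final Map<Long, ByteBuffer> remoteMemory = new HashMap<>();

    // 1. Driver side (registerShuffle): allocate one metadata slot per mapper.
    public static ByteBuffer registerShuffle(int numMappers) {
        return ByteBuffer.allocate(numMappers * SLOT_SIZE);
    }

    // 2. Mapper side: expose the (conceptually mmapped) data and index regions,
    // then publish their addresses into this mapper's slot.
    public static void publishMapOutput(ByteBuffer metadata, int mapId,
                                        long dataAddr, ByteBuffer data,
                                        long indexAddr, ByteBuffer index) {
        remoteMemory.put(dataAddr, data);
        remoteMemory.put(indexAddr, index);
        metadata.putLong(mapId * SLOT_SIZE, dataAddr);
        metadata.putLong(mapId * SLOT_SIZE + Long.BYTES, indexAddr);
    }

    // Simulated one-sided read of `length` bytes at `offset` within a region.
    static byte[] remoteRead(long address, int offset, int length) {
        ByteBuffer region = remoteMemory.get(address).duplicate();
        region.position(offset);
        byte[] out = new byte[length];
        region.get(out);
        return out;
    }

    // 3. Reducer side (getReader): look up the mapper's addresses, read the two
    // index entries that bound reduce partition `reduceId`, then read that block.
    public static byte[] fetchBlock(ByteBuffer metadata, int mapId, int reduceId) {
        long dataAddr = metadata.getLong(mapId * SLOT_SIZE);
        long indexAddr = metadata.getLong(mapId * SLOT_SIZE + Long.BYTES);
        ByteBuffer offsets = ByteBuffer.wrap(
                remoteRead(indexAddr, reduceId * Long.BYTES, 2 * Long.BYTES));
        long start = offsets.getLong();
        long end = offsets.getLong();
        return remoteRead(dataAddr, (int) start, (int) (end - start));
    }

    // Builds an index buffer of blockSizes.length + 1 cumulative offsets, starting at 0.
    public static ByteBuffer buildIndex(int... blockSizes) {
        ByteBuffer index = ByteBuffer.allocate((blockSizes.length + 1) * Long.BYTES);
        long offset = 0;
        index.putLong(offset);
        for (int size : blockSizes) {
            offset += size;
            index.putLong(offset);
        }
        index.flip();
        return index;
    }

    public static void main(String[] args) {
        ByteBuffer metadata = registerShuffle(1);
        // One mapper whose output holds three blocks: "aaa", "bbbb", "c".
        byte[] data = "aaabbbbc".getBytes(StandardCharsets.US_ASCII);
        publishMapOutput(metadata, 0, 0x1000L, ByteBuffer.wrap(data),
                0x2000L, buildIndex(3, 4, 1));
        System.out.println(new String(fetchBlock(metadata, 0, 1), StandardCharsets.US_ASCII));
    }
}
```

Running `main` prints `bbbb`. In this model a reducer needs only the driver's metadata plus two index entries per mapper to locate a block, which is why the mappers publish addresses (and, in the real plugin, keys) rather than serving blocks themselves.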