volcano-sh / volcano

A Cloud Native Batch System (Project under CNCF)
https://volcano.sh
Apache License 2.0

Add Apache Uniffle (incubating) as part of the Ecosystem #3046

Open jerqi opened 11 months ago

jerqi commented 11 months ago

What would you like to be added:

Add Apache Uniffle (incubating) as part of the Volcano ecosystem. Our repo is https://github.com/apache/incubator-uniffle. More details: https://uniffle.apache.org/blog/2023/07/21/Uniffle%20-%20New%20chapter%20for%20the%20shuffle%20in%20the%20cloud%20native%20era

Why is this needed:

Apache Uniffle (incubating) is a general-purpose remote shuffle service. It can benefit jobs running on Volcano in the following situations:

  1. In cloud-native environments, Spark, MapReduce (MR), and Tez all lack a standalone shuffle service, so shuffle data written to a pod's local disk can be lost if that pod is evicted or terminated. Uniffle can improve the stability of Spark and MR/Tez jobs in such environments.
  2. For large shuffle operations that suffer from heavy random I/O, Uniffle can improve performance and stability by aggregating the small shuffle blocks produced by upstream map tasks, effectively turning random I/O into sequential I/O.
  3. In architectures that separate compute and storage, Uniffle reduces the dependency of compute nodes on local disks.

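
To illustrate point 2, here is a minimal, hypothetical Python sketch of the aggregation idea: map tasks push small blocks to a server that buffers them per reduce partition and flushes each buffer as one sequential append, so a reducer reads a single contiguous segment instead of seeking across many map outputs. `ShuffleServer`, `push_block`, and `flush_threshold` are illustrative names, not Uniffle's API, and the sketch omits networking, memory limits, and fault tolerance entirely.

```python
from collections import defaultdict


class ShuffleServer:
    """Hypothetical in-memory stand-in for a remote shuffle server (not Uniffle's API)."""

    def __init__(self, flush_threshold: int = 1024):
        self.flush_threshold = flush_threshold
        # Pending small blocks per reduce partition, waiting to be merged.
        self.buffers = defaultdict(list)
        self.buffered_bytes = defaultdict(int)
        # One append-only "file" per reduce partition; every flush is a sequential append.
        self.partition_files = defaultdict(bytearray)
        self.write_ops = 0

    def push_block(self, partition: int, block: bytes) -> None:
        """Called by a map task to push a small block destined for `partition`."""
        self.buffers[partition].append(block)
        self.buffered_bytes[partition] += len(block)
        if self.buffered_bytes[partition] >= self.flush_threshold:
            self._flush(partition)

    def _flush(self, partition: int) -> None:
        # Many small blocks from different map tasks become one sequential write.
        if not self.buffers[partition]:
            return
        self.partition_files[partition].extend(b"".join(self.buffers[partition]))
        self.buffers[partition].clear()
        self.buffered_bytes[partition] = 0
        self.write_ops += 1

    def flush_all(self) -> None:
        """Flush whatever is still buffered, e.g. when all map tasks have finished."""
        for partition in list(self.buffers):
            self._flush(partition)

    def read_partition(self, partition: int) -> bytes:
        """A reducer reads its whole partition as one contiguous (sequential) read."""
        return bytes(self.partition_files[partition])


if __name__ == "__main__":
    server = ShuffleServer(flush_threshold=64)
    for map_id in range(8):      # 8 upstream map tasks
        for p in range(4):       # 4 reduce partitions
            server.push_block(p, f"m{map_id}p{p};".encode())
    server.flush_all()
    print(server.write_ops)      # 4: one sequential write per partition
    print(server.read_partition(0))
```

With 8 map tasks and 4 partitions, the 32 small blocks end up as 4 sequential writes, one per partition, and each reducer's data sits in a single contiguous segment.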
Monokaix commented 2 months ago

Welcome! Could you describe how to integrate Uniffle with Volcano?