oap-project / raydp

RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.
Apache License 2.0

[RFC] Fractional resource scheduling (CPU) #259

Closed pang-wu closed 2 years ago

pang-wu commented 2 years ago

Problem Statement

Ray supports fractional resource scheduling for actors: an actor can request less than 1 unit of a resource, which improves the utilization of physical resources. Currently RayDP maps Spark executor cores directly to actor CPUs, which prevents Spark users from leveraging Ray's advanced scheduling scheme. We also want users to be able to schedule on GPU machines.

Why Not Custom Resource Scheduling on Spark?

Spark has custom resource scheduling, which RayDP integrates with. However, Spark's custom resources do not support fractional scheduling: you cannot set something like spark.executor.resource.CPU.amount=0.1

Proposed Solution

Add a config spark.ray.actor.resource.*, where * could potentially be any resource name. In this PR, only cpu and gpu are supported, as special cases (because of case sensitivity).
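To make the intended mapping concrete, here is a minimal sketch of how such config keys could be turned into a per-actor resource request. It is not RayDP's actual implementation: the helper name `actor_resources` and the `BUILTIN_NAMES` table are hypothetical, and the assumption is that the lowercase `cpu` and `gpu` suffixes need to be mapped to Ray's uppercase `CPU` and `GPU` resource names.

```python
from typing import Dict

# Config prefix proposed in this RFC.
PREFIX = "spark.ray.actor.resource."

# Assumption: lowercase suffixes are mapped to Ray's built-in resource names.
BUILTIN_NAMES = {"cpu": "CPU", "gpu": "GPU"}


def actor_resources(configs: Dict[str, str]) -> Dict[str, float]:
    """Collect spark.ray.actor.resource.* entries into {resource_name: amount}.

    Hypothetical helper for illustration only.
    """
    resources: Dict[str, float] = {}
    for key, value in configs.items():
        if not key.startswith(PREFIX):
            continue  # unrelated Spark config, ignore it
        name = key[len(PREFIX):]
        if not name:
            continue  # bare prefix with no resource name
        amount = float(value)
        if amount < 0:
            raise ValueError(f"{key} must be non-negative, got {value}")
        # Normalize the special-cased names; pass any other name through as-is.
        resources[BUILTIN_NAMES.get(name.lower(), name)] = amount
    return resources
```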

Example usage:

import ray
import raydp

ray.init(num_cpus=2)

# Allocate 10 Spark executors, which in total occupy only 1 vCore from Ray's perspective.
spark = raydp.init_spark(app_name="test_cpu_fraction",
                         num_executors=10, executor_cores=1, executor_memory="500 M",
                         configs={"spark.ray.actor.resource.cpu": "0.1"})
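
The arithmetic behind the example above can be checked with a small sketch. It only models CPU accounting and ignores memory and any other resources; the function names `total_cpu` and `max_executors` are hypothetical and are not part of RayDP or Ray. Exact fractions are used so that values like 0.1 do not pick up floating-point error.

```python
from fractions import Fraction


def total_cpu(num_executors: int, cpu_per_actor: str) -> Fraction:
    """CPU that Ray would account for across all executor actors."""
    return num_executors * Fraction(cpu_per_actor)


def max_executors(cluster_cpus: int, cpu_per_actor: str) -> int:
    """Upper bound on executor actors that fit, counting CPU only."""
    per_actor = Fraction(cpu_per_actor)
    if per_actor <= 0:
        raise ValueError("cpu_per_actor must be positive")
    return Fraction(cluster_cpus) // per_actor


# Ten executors at 0.1 CPU each add up to exactly one CPU.
assert total_cpu(10, "0.1") == 1

# With ray.init(num_cpus=2), CPU alone would allow up to 20 such executors.
assert max_executors(2, "0.1") == 20
```

Without the fractional setting, each executor with executor_cores=1 would request a full CPU, so the same 2-CPU cluster would fit at most 2 executors under this CPU-only model.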