oap-project / raydp

RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.
Apache License 2.0

Issue with Running Multiple Spark Jobs Concurrently on Ray Cluster #386

Closed DimitarSirakov closed 10 months ago

DimitarSirakov commented 11 months ago

I am attempting to execute multiple Spark jobs concurrently on a Ray cluster, but I am encountering an error. Despite providing a unique application name for each job, I observe that the cluster creates a RayDPSparkMaster with the unique name as expected, but the org.apache.spark.deploy.raydp.RayAppMaster is always instantiated with the name RAY_APP_MASTER. Similarly, the org.apache.spark.executor.RayDPExecutor instances are always named sequentially as raydp-executor-0, raydp-executor-1, and so on.

Here's the error message that I encounter:

java.lang.IllegalArgumentException: Actor of name RAY_APP_MASTER exists
    at io.ray.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)

Upon examining the library code, I discovered that RAY_APP_MASTER is hardcoded for the org.apache.spark.deploy.raydp.RayAppMaster class (see code here). This discovery prompts me to question whether RayDP is structured to handle only a single task at a time per Ray cluster. If this isn’t the case, is there a known workaround to enable the concurrent execution of multiple Spark jobs without encountering this error? Could there be a configuration setting I may have overlooked, or is this behavior stemming from a current design limitation within RayDP?

Steps to Reproduce:

1. Initialize a Ray cluster.
2. Concurrently submit multiple Spark jobs with unique application names to the Ray cluster.

Expected Behavior: All Spark jobs should execute concurrently without any interference.

Actual Behavior: A java.lang.IllegalArgumentException is thrown, indicating a naming conflict for RAY_APP_MASTER.
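The conflict can be illustrated with a small Python sketch. This is a hypothetical model of Ray's named-actor table (not actual RayDP or Ray code): because the app master name is hardcoded, every job registers under the same key, and the second registration fails.

```python
# Hypothetical model of Ray's named-actor registry (illustration only,
# not RayDP code). Within one namespace, actor names must be unique.
class ActorRegistry:
    def __init__(self):
        self._actors = {}

    def register(self, name):
        # Ray rejects a second actor with the same name in one namespace.
        if name in self._actors:
            raise ValueError(f"Actor of name {name} exists")
        self._actors[name] = object()

registry = ActorRegistry()
registry.register("RAY_APP_MASTER")      # first Spark job succeeds
try:
    registry.register("RAY_APP_MASTER")  # second job hits the conflict
except ValueError as e:
    print(e)  # Actor of name RAY_APP_MASTER exists
```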

Environment Details:

Ray version: 2.7.0
RayDP version: 1.6.0

kira-lin commented 11 months ago

Hi @DimitarSirakov,
You can create multiple Spark sessions against the same Spark URL. Alternatively, you can use different Ray namespaces: actors with the same name can coexist as long as they live in different namespaces.
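A minimal sketch of the namespace workaround, assuming each job calls `ray.init(namespace=...)` before `raydp.init_spark(...)`. The registry below is a hypothetical stand-in for Ray's actor table (keyed by namespace and name), and the namespace names `job_a`/`job_b` are made up; it only illustrates why scoping by namespace avoids the collision, not how Ray implements it.

```python
# Hypothetical registry keyed by (namespace, actor name): the same actor
# name may exist once per namespace, which is how Ray scopes named actors.
class NamespacedRegistry:
    def __init__(self):
        self._actors = {}

    def register(self, namespace, name):
        key = (namespace, name)
        if key in self._actors:
            raise ValueError(f"Actor of name {name} exists")
        self._actors[key] = object()

registry = NamespacedRegistry()
# Two Spark jobs, each started in its own Ray namespace, e.g. with
# ray.init(namespace="job_a") and ray.init(namespace="job_b"):
registry.register("job_a", "RAY_APP_MASTER")
registry.register("job_b", "RAY_APP_MASTER")  # no conflict across namespaces
print("both app masters registered")
```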