I am attempting to execute multiple Spark jobs concurrently on a Ray cluster but I am encountering an error. Despite providing a unique application name for each job, I observe that the cluster creates a RayDPSparkMaster with the unique name as expected, but the org.apache.spark.deploy.raydp.RayAppMaster is always instantiated with the name RAY_APP_MASTER. Similarly, the org.apache.spark.executor.RayDPExecutor instances are always named sequentially as raydp-executor-0, raydp-executor-1, and so on.
Here's the error message that I encounter:
java.lang.IllegalArgumentException: Actor of name RAY_APP_MASTER exists
at io.ray.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
Upon examining the library code, I discovered that RAY_APP_MASTER is hardcoded for the org.apache.spark.deploy.raydp.RayAppMaster class (see code here). This discovery prompts me to question whether RayDP is structured to handle only a single task at a time per Ray cluster. If this isn’t the case, is there a known workaround to enable the concurrent execution of multiple Spark jobs without encountering this error? Could there be a configuration setting I may have overlooked, or is this behavior stemming from a current design limitation within RayDP?
Steps to Reproduce:
Initialize a Ray cluster.
Concurrently submit multiple Spark jobs with unique application names to the Ray cluster.
Expected Behavior:
All Spark jobs should execute concurrently without any interference.
Actual Behavior:
A java.lang.IllegalArgumentException is thrown, indicating a naming conflict for the RAY_APP_MASTER actor.
Hi @DimitarSirakov,
You can create multiple sessions by using the same Spark URL. Alternatively, you can use different Ray namespaces: actors with the same name can coexist as long as they live in different namespaces.
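To illustrate the namespace suggestion, here is a minimal sketch, assuming each Spark job runs as its own driver process that connects to an already running Ray cluster. The helper names `unique_namespace` and `run_spark_job`, the naming scheme, and the executor sizes are made up for illustration; only `ray.init(namespace=...)`, `raydp.init_spark(...)`, `raydp.stop_spark()` and `ray.shutdown()` are library calls. I have not verified this against RayDP 1.6.0 specifically.

```python
import uuid


def unique_namespace(app_name: str) -> str:
    """Build a per-job Ray namespace (hypothetical naming scheme)."""
    return f"{app_name}-{uuid.uuid4().hex[:8]}"


def run_spark_job(app_name: str) -> int:
    """Run one small Spark job inside its own Ray namespace."""
    # Imported lazily so the helper above works without Ray installed.
    import ray
    import raydp

    # Named actors are scoped to a namespace, so the fixed actor name
    # RAY_APP_MASTER should not clash with a job in another namespace.
    ray.init(address="auto", namespace=unique_namespace(app_name))
    try:
        spark = raydp.init_spark(
            app_name=app_name,
            num_executors=1,        # assumed sizes, adjust to the cluster
            executor_cores=1,
            executor_memory="1GB",
        )
        # Placeholder workload: count the rows of a generated range.
        return spark.range(100).count()
    finally:
        raydp.stop_spark()
        ray.shutdown()
```

Each concurrent job would call `run_spark_job("job-a")`, `run_spark_job("job-b")`, and so on from a separate process, since a driver process connects to a single namespace when it calls `ray.init`.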
Environment Details:
Ray version: 2.7.0
RayDP version: 1.6.0