oap-project / raydp

RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.
Apache License 2.0

Can't start raydp when ray head node is not the same as the raydp node #226

Closed tdeboer-ilmn closed 1 year ago

tdeboer-ilmn commented 2 years ago

I am trying to set up RayDP on my Ray cluster, and I am creating the Ray client like this:

import ray, raydp
ray.init(address='ray://10.112.80.176:10001')
spark = raydp.init_spark(app_name='RayDP Example',
                         num_executors=2,
                         executor_cores=2,
                         executor_memory='4GB')

But this results in the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 spark = raydp.init_spark(app_name='RayDP Example',
      2                          num_executors=2,
      3                          executor_cores=2,
      4                          executor_memory='4GB'
      5                      )

File ~/raymodin/lib/python3.8/site-packages/raydp/context.py:126, in init_spark(app_name, num_executors, executor_cores, executor_memory, configs)
    123 try:
    124     _global_spark_context = _SparkContext(
    125         app_name, num_executors, executor_cores, executor_memory, configs)
--> 126     return _global_spark_context.get_or_create_session()
    127 except:
    128     _global_spark_context = None

File ~/raymodin/lib/python3.8/site-packages/raydp/context.py:70, in _SparkContext.get_or_create_session(self)
     68     return self._spark_session
     69 self.handle = RayDPConversionHelper.options(name=RAYDP_OBJ_HOLDER_NAME).remote()
---> 70 spark_cluster = self._get_or_create_spark_cluster()
     71 self._spark_session = spark_cluster.get_spark_session(
     72     self._app_name,
     73     self._num_executors,
     74     self._executor_cores,
     75     self._executor_memory,
     76     self._configs)
     77 return self._spark_session

File ~/raymodin/lib/python3.8/site-packages/raydp/context.py:63, in _SparkContext._get_or_create_spark_cluster(self)
     61 if self._spark_cluster is not None:
     62     return self._spark_cluster
---> 63 self._spark_cluster = SparkCluster(self._configs)
     64 return self._spark_cluster

File ~/raymodin/lib/python3.8/site-packages/raydp/spark/ray_cluster.py:34, in SparkCluster.__init__(self, configs)
     32 self._app_master_bridge = None
     33 self._configs = configs
---> 34 self._set_up_master(None, None)
     35 self._spark_session: SparkSession = None

File ~/raymodin/lib/python3.8/site-packages/raydp/spark/ray_cluster.py:40, in SparkCluster._set_up_master(self, resources, kwargs)
     37 def _set_up_master(self, resources: Dict[str, float], kwargs: Dict[Any, Any]):
     38     # TODO: specify the app master resource
     39     self._app_master_bridge = RayClusterMaster(self._configs)
---> 40     self._app_master_bridge.start_up()

File ~/raymodin/lib/python3.8/site-packages/raydp/spark/ray_cluster_master.py:56, in RayClusterMaster.start_up(self, popen_kwargs)
     54 self._gateway = self._launch_gateway(extra_classpath, popen_kwargs)
     55 self._app_master_java_bridge = self._gateway.entry_point.getAppMasterBridge()
---> 56 self._set_properties()
     57 self._host = ray.util.get_node_ip_address()
     58 self._create_app_master(extra_classpath)

File ~/raymodin/lib/python3.8/site-packages/raydp/spark/ray_cluster_master.py:145, in RayClusterMaster._set_properties(self)
    142 node = ray.worker.global_worker.node
    144 options["ray.run-mode"] = "CLUSTER"
--> 145 options["ray.node-ip"] = node.node_ip_address
    146 options["ray.address"] = node.redis_address
    147 options["ray.redis.password"] = node.redis_password

AttributeError: 'NoneType' object has no attribute 'node_ip_address'

It seems to assume that the local machine is the Ray head node... Is there a way to configure RayDP for this setup?

kira-lin commented 2 years ago

Hi @tdeboer-ilmn, glad you tried RayDP. You are right: `raydp.init_spark` is expected to be called from within the Ray cluster. If you need to use the Ray client with the current stable release, you have to wrap your driver program in a Ray actor so that it executes on a node inside the cluster. If you are willing to try `raydp-nightly`, you can call `raydp.init_spark` on your local machine and it works fine with the Ray client; however, `to_spark` does not work yet because Ray has not merged my PR.

kira-lin commented 1 year ago

RayDP now works directly in Ray client mode. Closing this as stale.