oap-project / raydp

RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.
Apache License 2.0

raydp.init_spark fails #350

Open wctmanager opened 1 year ago

wctmanager commented 1 year ago

Created a docker image as described at https://github.com/oap-project/raydp/tree/master/docker; the only change is that it is based on rayproject/ray:latest-py38 (py38 instead of the default py37). The image was deployed with the Helm charts described at https://docs.ray.io/en/latest/cluster/kubernetes/getting-started.html#kuberay-quickstart. I use Azure Kubernetes Service (AKS) and access my k8s cluster there remotely.

Then

```python
import ray
import raydp

ray.init("ray://x.x.x.x:10001")
```

goes fine and connects to the Ray cluster, but

```python
spark = raydp.init_spark(app_name='RayDP Example',
                         num_executors=1,
                         executor_cores=1,
                         executor_memory='1G')
```

raises the following traceback:

```
Traceback (most recent call last):
  File "python/ray/_raylet.pyx", line 870, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 921, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 877, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 881, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 821, in ray._raylet.execute_task.function_executor
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/function_manager.py", line 670, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "/opt/conda/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py", line 460, in _resume_span
  File "/opt/conda/lib/python3.8/site-packages/raydp/spark/ray_cluster_master.py", line 56, in start_up
  File "/home/ray/anaconda3/lib/python3.8/site-packages/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.deploy.raydp.RayAppMaster.setProperties.
: java.lang.NullPointerException
	at java.util.Hashtable.put(Hashtable.java:460)
	at java.util.Properties.setProperty(Properties.java:166)
	at java.lang.System.setProperty(System.java:812)
	at org.apache.spark.deploy.raydp.RayAppMaster$.$anonfun$setProperties$1(RayAppMaster.scala:336)
	at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:400)
	at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:728)
	at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:728)
	at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:728)
	at org.apache.spark.deploy.raydp.RayAppMaster$.setProperties(RayAppMaster.scala:335)
	at org.apache.spark.deploy.raydp.RayAppMaster.setProperties(RayAppMaster.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)
```

Any ideas? Thank you very much.

pang-wu commented 1 year ago

What version of ray and raydp are you using?

wctmanager commented 1 year ago

I tried to run it with docker images built from ray:2.4.0 (currently the latest) and 2.2.0, both for py38. The raydp used in the image is the one from the current Dockerfile, i.e. the latest release, which is currently 1.5.0. The same versions were used on the client side. Thank you for your help.
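A quick way to compare what is actually installed on the client and inside the image is to query the package metadata (a minimal sketch; the package names are the standard pip distribution names):

```python
# Minimal sketch: print the installed ray and raydp versions so the client
# and the cluster image can be compared side by side. importlib.metadata
# works for any pip-installed package on Python 3.8+.
from importlib.metadata import version

print("ray:  ", version("ray"))
print("raydp:", version("raydp"))
```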

wctmanager commented 1 year ago

It looks like raydp v1.5.0 is built against ray 2.1.0 (see core/raydp-main/pom.xml), so I built an image with ray:2.1.0 instead. With that, raydp.init_spark works.
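For reference, a minimal sketch of the combination that works here, assuming both the image and the client are pinned to ray 2.1.0 with raydp 1.5.0; the cluster address and the smoke test at the end are illustrative, not from the original report:

```python
# Minimal sketch: the same connect + init_spark sequence as above, which
# succeeds once the Ray version matches what RayDP 1.5.0 was built against
# (2.1.0 per core/raydp-main/pom.xml).
import ray
import raydp

# Connect to the remote KubeRay head node (address is illustrative).
ray.init("ray://x.x.x.x:10001")

spark = raydp.init_spark(app_name='RayDP Example',
                         num_executors=1,
                         executor_cores=1,
                         executor_memory='1G')

# Illustrative smoke test: run a trivial Spark job to confirm the session works.
print(spark.range(10).count())

raydp.stop_spark()
ray.shutdown()
```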