Upgrading to Ray 2.3.0 causes cloudpickle errors.
```
self._set_up_master(resources=self._get_master_resources(configs), kwargs=None)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/raydp/spark/ray_cluster.py", line 58, in _set_up_master
    ray.get(self._spark_master_handle.start_up.remote())
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
    class_name: RayDPSparkMaster
    actor_id: 133a1106ded55a2df7cccc5305000000
    pid: 7969
    name: spark-test_SPARK_MASTER
    namespace: 85dc7695-b493-44e6-acaf-71a164375d2c
    ip: 20.128.3.205
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker exits unexpectedly. Worker exits with an exit code None. The worker may have exceeded K8s pod memory limits.
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.9/site-packages/raydp/spark/ray_cluster_master.py", line 56, in start_up
    self._gateway.jvm.org.apache.spark.deploy.raydp.RayAppMaster.setProperties(jvm_properties)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/home/ray/anaconda3/lib/python3.9/site-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.deploy.raydp.RayAppMaster.setProperties.
: java.lang.NullPointerException
    at java.util.Hashtable.put(Hashtable.java:460)
    at java.util.Properties.setProperty(Properties.java:166)
    at java.lang.System.setProperty(System.java:812)
    at org.apache.spark.deploy.raydp.RayAppMaster$.$anonfun$setProperties$1(RayAppMaster.scala:336)
    at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:400)
    at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:728)
    at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:728)
    at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:728)
    at org.apache.spark.deploy.raydp.RayAppMaster$.setProperties(RayAppMaster.scala:335)
    at org.apache.spark.deploy.raydp.RayAppMaster.setProperties(RayAppMaster.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)

During handling of the above exception, another exception occurred:

ray::PySparkApp.__init__() (pid=7774, ip=20.128.3.205, repr=<__main__.PySparkApp object at 0x7f09c003eaf0>)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/exceptions.py", line 32, in to_bytes
    serialized_exception=pickle.dumps(self),
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 88, in dumps
    cp.dump(obj)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 733, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle '_thread.RLock' object
An unexpected internal error occurred while the worker was executing a task.
```
This is produced when running the example from the RayDP README. I believe this is caused by Ray Core, but I'm interested to know whether others are hitting the same issue after the upgrade. I also tested with Python 3.9.
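As for the root `NullPointerException`: `java.util.Hashtable`, which backs `java.lang.System` properties, rejects null keys and values, so a `None` entry in the `jvm_properties` dict that RayDP passes across Py4J would produce exactly the `Hashtable.put` frame in the trace. A hypothetical workaround sketch (property names here are made up for illustration):

```python
# Hypothetical guard: filter out None keys/values before the dict crosses
# the Py4J bridge, since System.setProperty on the Java side throws
# NullPointerException for null keys or values.
jvm_properties = {
    "spark.master": "ray",     # example entry, not from the report
    "spark.extra.conf": None,  # a None value like this would trigger the NPE
}
safe_properties = {
    k: v for k, v in jvm_properties.items() if k is not None and v is not None
}
print(safe_properties)  # {'spark.master': 'ray'}
```

This doesn't explain why 2.3.0 started injecting a `None` value where 2.2.x did not, but it may help narrow down which property is null.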
Environment used:

- Image: rayproject/ray:2.3.0-py310
- Java:

```
openjdk version "1.8.0_362"
OpenJDK Runtime Environment (build 1.8.0_312-8u3322-ga-0ubuntu1~20.04.1-b09)
OpenJDK 64-Bit Server VM (build 25.362-b09, mixed mode)
```
@kira-lin curious if you had any thoughts on this. Thanks in advance.