Open swapkh91 opened 11 months ago
Hi @swapkh91 , Sorry for the late reply. Ray 2.7.0 was released recently and might not be compatible with RayDP 1.6.0. Can you try Ray 2.6?
@kira-lin thanks, I'll try and get back. Any limitation on java version?
Sorry for the inconvenience. We only tested Java 8; Java 11 should be fine, though.
@kira-lin I tested it with Ray 2.6.2 and I'm getting the same error. I'll explain how I'm trying to connect; maybe there's some issue in the process.
The Ray cluster is on GKE.
I have port-forwarded it to my laptop through
kubectl port-forward --address 0.0.0.0 service/raycluster-autoscaler-head-svc 10001:10001
I then connect using
ray.init(address="ray://localhost:10001")
spark = raydp.init_spark(app_name='RayDP Example2',
                         num_executors=2,
                         executor_cores=2,
                         executor_memory='4GB')
Now this init_spark
command gives the above error.
I checked the logs through the dashboard:
:job_id:03000000
:actor_name:RayDPSparkMaster
Error opening zip file or JAR manifest missing : /home/swapnesh/.local/lib/python3.10/site-packages/raydp/jars/raydp-agent-1.6.0.jar
Why is it showing the jar file path from my laptop? The jar is present there, though; I checked.
Oops, this seems to be a bug. We'll try to fix it. For now, you can wrap this init_spark and whatever you want to do with Spark in a remote actor; that should be fine. Thanks for identifying this bug.
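One plausible mechanism for this bug (an assumption on my part, not verified against the RayDP source): the client resolves the agent jar path relative to its own local raydp installation and embeds it in the JVM launch command, so a laptop path ends up on the remote node. The sketch below shows that resolution pattern using a stdlib package; the helper name is hypothetical.

```python
# Sketch: how a library might locate a file bundled inside an installed
# package. Any such path is resolved on the machine where this code runs
# (the Ray *client* here), which would explain the laptop path in the logs.
import importlib.util
import os

def bundled_file_path(package: str, relative: str) -> str:
    """Resolve a data file shipped inside an installed package."""
    spec = importlib.util.find_spec(package)
    pkg_dir = os.path.dirname(spec.origin)
    return os.path.join(pkg_dir, relative)

# Demonstrated with a stdlib package; a hypothetical raydp equivalent
# would be bundled_file_path("raydp", "jars/raydp-agent-1.6.0.jar").
print(bundled_file_path("json", "tool.py"))
```

Running init_spark inside an actor makes this resolution happen on a cluster node instead of the client, which is why the actor workaround helps.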
@kira-lin got it, I'll try that. Also, I noticed that raydp declares a dependency ray >= 2.1.0
as here. So pip install raydp
installs ray 2.7.1, and I then have to manually run pip install --force-reinstall ray==2.6.2
to downgrade.
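One way to avoid the force-reinstall (a sketch; the exact pin is whatever Ray version you need, and you should confirm pip's resolver accepts it alongside raydp's ray >= 2.1.0 constraint):

```shell
# Pin ray in the same resolver run as raydp, so pip picks a compatible
# set up front instead of installing the latest ray and downgrading later.
pip install "ray==2.6.2" raydp
```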
Hey @swapkh91, I am also getting the same error. Did you find a solution for this?
hi @raiprabh ,
For now, you can wrap this init_spark and things you want to do with spark in an remote actor, that should be fine. Thanks for identifying this bug.
You can try this solution. We don't have enough bandwidth to work on this project right now, so you are welcome to submit a PR to fix this if you have a solution, @swapkh91. We just need to use the jar path on the remote machines.
@kira-lin, Is there any update on this issue?
I also get this error when running the following code:
if __name__ == "__main__":
    import ray
    import raydp

    ray.init(
        address="ray://localhost:10001"
    )
    spark = ray.remote(
        raydp.init_spark("NYCTAXI data processing",
                         num_executors=2,
                         executor_cores=1,
                         executor_memory="500M",
                         configs={"spark.shuffle.service.enabled": "true"})
    )
    data = ray.remote(
        spark.read.format("csv") \
            .option("header", "true") \
            .option("inferSchema", "true") \
            .load(NYC_TRAIN_CSV)
    )
It seems that wrapping the calls in ray.remote
doesn't help?
The following worked for me:
import time
import ray
import raydp
import pandas as pd

@ray.remote
class PySparkDriver:
    def __init__(self):
        self.spark = raydp.init_spark("RayDP Example",
                                      num_executors=2,
                                      executor_cores=1,
                                      executor_memory="1GB")

    def foo(self):
        return self.spark.range(1000).repartition(10).count()

if __name__ == "__main__":
    ray.init(
        address="ray://localhost:10001"
    )
    driver = PySparkDriver.remote()
    print(ray.get(driver.foo.remote()))
I'm trying a test using RayDP. I have set up a Ray cluster on GKE using the below Dockerfile.
I have port-forwarded the GKE pod and I'm able to connect to it using
ray.init(address="ray://localhost:10001")
When I try to connect RayDP through
I get the following error:
Exception: Java gateway process exited before sending its port number
Full stacktrace:
Libraries on my laptop: