chenya-zhang opened this issue 1 year ago
I suggest you start with a Ray image, install RayDP and PySpark in it, and make sure Spark on Ray works on your K8s cluster first. Then you can try to run with other Spark plugins. If a plugin can be enabled by setting a Spark configuration to include its jar, you can set that in RayDP when you init Spark, e.g. `raydp.init_spark(..., configs={"key": "value"})`.
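For example, a hypothetical setup might look like the following; the jar path, version, and config values are placeholders to adjust for your image, not something we have verified:

```python
import ray
import raydp

ray.init()  # or ray.init(address="auto") to join an existing Ray cluster

# Placeholder: adjust the jar path/version to wherever the rapids jar
# lives inside your image.
rapids_jar = "/opt/sparkRapidsPlugin/rapids-4-spark_2.12-23.06.0.jar"

spark = raydp.init_spark(
    app_name="raydp_rapids_smoke_test",
    num_executors=2,
    executor_cores=4,
    executor_memory="8GB",
    configs={
        "spark.jars": rapids_jar,                       # ship the plugin jar
        "spark.plugins": "com.nvidia.spark.SQLPlugin",  # spark-rapids entry point
        "spark.rapids.sql.enabled": "true",
    },
)

# If the plugin loaded, the physical plan should show GPU operators.
spark.range(0, 1000).selectExpr("sum(id)").explain()
```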
However, we have no experience with spark-rapids. If it doesn't work, you may also want to check with the rapids team.
Related discussion: https://github.com/NVIDIA/spark-rapids/discussions/8062
Hey there!
We are experimenting with Spark on Ray plus RAPIDS, but we are not sure whether Spark on Ray can support this case.

Here is the example Dockerfile for the spark-rapids k8s setup: https://nvidia.github.io/spark-rapids/docs/get-started/Dockerfile.cuda
In the Dockerfile, we find commands that copy items from under a `spark/` distribution directory (its jars, `kubernetes/dockerfiles/spark/entrypoint.sh`, `kubernetes/tests`, and so on).

If we run `pip install raydp-nightly`, we can find a `pyspark/` directory. Under `pyspark/`, there is a similar layout, including a `jars/` directory.

In this case, will there be concerns if we instead 1) `COPY pyspark/jars /opt/pyspark/jars`, or set `SPARK_HOME` to the existing `.../pyspark` installed by RayDP, and 2) skip `/kubernetes/dockerfiles/spark/entrypoint.sh` and `/kubernetes/tests`, which do not exist under `pyspark/`? I think they may not be required if we are able to launch Spark with RayDP on k8s; a rough sketch of what we have in mind is included below.

Any suggestions or pointers would be very helpful, thanks!