Open peter-mcclonski opened 5 months ago
$SPARK_HOME/jars and the result of java -Divy.cache.dir=$SPARK_HOME -Divy.home=$SPARK_HOME -jar $SPARK_HOME/jars/ivy-2.5.1.jar -dependency [PACKAGE]. This populated volume shall be mounted in the SHS container as $SPARK_HOME/jars. $SPARK_HOME/conf/spark.conf shall be mounted as a volume populated by a raw text block in the helm chart.

Did some initial work on this just to feel it out-- got automatic resolution of packages working via initContainers. It's a bit gross, but it works as a start.
Major TODO items:
- spark-defaults.conf
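As a sketch of the initContainer approach described above (the image tag, package coordinates, and volume names here are illustrative assumptions, not part of the proposal):

```yaml
# Hypothetical chart fragment: an initContainer pre-populates a shared
# emptyDir with the stock jars plus Ivy-resolved extras; the SHS
# container then mounts that volume at $SPARK_HOME/jars.
initContainers:
  - name: resolve-jars
    image: apache/spark:3.5.1   # illustrative image
    command: ["/bin/sh", "-c"]
    args:
      - |
        cp /opt/spark/jars/* /shared-jars/ &&
        java -Divy.cache.dir=/tmp/ivy -Divy.home=/tmp/ivy \
          -jar /opt/spark/jars/ivy-2.5.1.jar \
          -dependency org.apache.hadoop hadoop-aws 3.3.4 \
          -retrieve "/shared-jars/[artifact]-[revision](-[classifier]).[ext]"
    volumeMounts:
      - name: shared-jars
        mountPath: /shared-jars
```

The hadoop-aws coordinates stand in for whatever storage-layer packages a user lists; Ivy's standalone -dependency/-retrieve mode resolves them and copies the jars into the shared volume.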
Alternatively-- @yuchaoran2011 Do you think it would be worth reviving https://artifacthub.io/packages/helm/cloudnativeapp/spark-history-server and the associated chart and (potentially) having it live here, adjacent to but disconnected from the actual operator chart? I think the real problem here isn't so much that operator should be managing the history server directly, and more that history server, a valuable part of the spark ecosystem, doesn't have any good helm charts out in the wild. We're working on one as part of boozallen/aissemble#66 (https://github.com/boozallen/aissemble/pull/80/files), covered by our BAPL (not as permissive as, say, Apache) solely because we couldn't find an existing OSS solution that was up to date, maintained, and flexible.
I'm not sure if it's a good idea to have the history server co-deployed with the operator. A single history server can aggregate jobs managed by multiple Spark operator deployments across multiple k8s clusters.
I think the real problem here isn't so much that operator should be managing the history server directly, and more that history server, a valuable part of the spark ecosystem, doesn't have any good helm charts out in the wild.
I agree. I haven't looked at the quality of https://artifacthub.io/packages/helm/cloudnativeapp/spark-history-server, but if it's something you have used, I'm for that idea
Sounds reasonable to me. Wrt the helm chart I linked, I wasn't sure if you had specific thoughts, given that you're listed as the maintainer on artifacthub
Ah, upon a closer look, now I remember that I initially created this chart many years ago. I haven't used it for a long time though and wouldn't count on it still being production-ready.
I think there's both interest and clearly an unfilled need in the community for a production ready, standalone spark history chart that's well maintained. Would kubeflow and the spark operator maintainers be open to one being created in this repo, or would it be better housed somewhere totally separate?
Kindly make the spark history server part of the operator. I think targeting this operator as a single point for the Spark-on-K8s ecosystem will add much better momentum to its development.
For example, integrating the spark operator to manage an external shuffle service on K8s.
Sorry for interrupting, but I am so excited about the new development on this operator.
I am also looking forward to a well-maintained helm chart for spark history server, and I think maybe the spark operator repo is the best place to host this chart. Would you mind, @yuchaoran2011, if I contributed a new helm chart based on this one https://artifacthub.io/packages/helm/cloudnativeapp/spark-history-server and put it under charts/spark-history-server?
I noticed that @vara-bonthu has maintained a helm chart for spark history server https://github.com/KubedAI/spark-history-server with support for S3. What do you think about creating a new one for the history server in this repo?
Spark History Server isn't directly tied to the Spark Operator project. It's usually deployed by users on Kubernetes, even if they don't use the Spark Operator. For example, users running spark-submit without the operator often set up the Spark History Server on their own. This is a separate deployment and, for large workloads, might need multiple replicas. So, it doesn't make sense to link it directly to the Spark Operator project.
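For example, a job submitted without the operator only needs event logging pointed at the shared store for an independently deployed SHS to pick it up (the bucket name below is illustrative):

```
spark.eventLog.enabled   true
spark.eventLog.dir       s3a://my-spark-logs/events
```

The history server then reads the same location via its own configuration, with no involvement from the operator.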
If the community is interested, we could propose making the Spark History Server its own project. This could be under Kubeflow or Apache, focusing on multi-cloud and self-managed setups.
This PR can be moved to a new repo.
Community Note
What is the outcome that you are trying to reach?
The Spark History server is a valuable debugging and process tracing tool. Currently, deployment of the history server would have to occur independently from the operator. It would be a convenience to manage the Spark History Server (SHS) via the Spark Operator helm chart.
Describe the solution you would like
A new section shall be added to the spark operator helm chart to define parameters for the SHS deployment. A confounding element of this feature is the storage layer: SHS depends on some accessible storage layer where spark logs can be found. The simplest implementation is a shared NFS volume, but blob storage such as S3 or an Azure storage account are common solutions that should be easy to use with our implementation. These third-party solutions require additional libraries to be loaded onto the classpath-- a task that SHS does little to simplify.
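For instance, an S3-backed deployment would need roughly the following in spark-defaults.conf, plus hadoop-aws and its AWS SDK dependency on the classpath (the bucket name is illustrative):

```
# Illustrative spark-defaults.conf for an S3-backed SHS
spark.history.fs.logDirectory   s3a://my-spark-logs/events
# Credentials are typically supplied via IAM roles or fs.s3a settings;
# details omitted here.
```

None of this is difficult on its own; the friction is that the extra jars must be staged into the SHS image or volume by the user, which is exactly what the chart could automate.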
Describe alternatives you have considered
The alternative involves individuals rolling their own deployments for SHS-- a non-trivial process.
Additional context
If we choose to pursue this, we may also wish to consider managing deployment of the Hive Thrift Server.