radanalyticsio / spark-operator

Operator for managing the Spark clusters on Kubernetes and OpenShift.
Apache License 2.0
157 stars, 61 forks

Upgrading Spark, using custom libraries #274

Closed: Sadagopan88 closed this issue 4 years ago

Sadagopan88 commented 4 years ago

This is more of a question.

Description:

I need to upgrade the Spark version to 2.4.4. How can I do it? I also need to use custom libraries on the Spark classpath. How can I do that?

jkremser commented 4 years ago

I need to upgrade the Spark version to 2.4.4. How can I do it?

Unfortunately, this isn't as trivial as it should be. The operator makes certain assumptions about the container image it can deploy as a Spark worker/master. These can be verified with this tool. In general, it is guaranteed to run smoothly with the images from radanalytics.io (https://github.com/radanalyticsio/openshift-spark), and luckily there was a recent 2.4.4 release there. Perhaps @tmckayus knows whether it works well with, or was tested with, the operator (clusters, apps, history server).
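To make the image swap concrete, here is a minimal sketch of a cluster definition that points the operator at a different image. It assumes the SparkCluster resource accepts a `customImage` field, as in the project's examples; the cluster name and the image reference are placeholders, not tested values, so substitute the actual 2.4.4 image from the openshift-spark releases.

```yaml
# Sketch only: field names follow the project's examples as I recall them;
# check them against the CRD shipped with your operator version.
apiVersion: radanalytics.io/v1
kind: SparkCluster
metadata:
  name: my-spark-cluster          # hypothetical name
spec:
  master:
    instances: "1"
  worker:
    instances: "2"
  # Placeholder image reference: replace with a real openshift-spark 2.4.4 image.
  customImage: registry.example.com/radanalyticsio/openshift-spark:2.4.4
```

Whether a given image satisfies the operator's assumptions still has to be checked separately, as described above.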

I need to use custom libraries on the Spark classpath. How can I do that?

a) You can "bake" your jars/zips/eggs/files directly into a container image that is compliant with the operator (for instance, by using the radanalytics.io image as the base image).

b) Or, for testing/developing/prototyping purposes, you can use this feature, which uses init containers to pull the dependencies from a Maven repo: example

You can also download arbitrary files from the internet using this feature; however, it is a bit limited because it can't download to an arbitrary location in the image (/tmp works just fine, though).
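As a rough illustration of option b) and the file download, the sketch below assumes the SparkCluster resource exposes `mavenDependencies` (a list of `group:artifact:version` coordinates) and `downloadData` (a list of `url`/`to` pairs), as in the project's examples. The cluster name, the Maven coordinate, and the URL are illustrative placeholders only.

```yaml
# Sketch only: verify the field names against the operator's own examples.
apiVersion: radanalytics.io/v1
kind: SparkCluster
metadata:
  name: my-spark-cluster          # hypothetical name
spec:
  master:
    instances: "1"
  worker:
    instances: "2"
  # Pulled by init containers before the Spark containers start.
  mavenDependencies:
  - org.apache.hadoop:hadoop-aws:2.7.3      # illustrative coordinate
  # Arbitrary files; the target location is limited, /tmp is known to work.
  downloadData:
  - url: https://example.com/data/sample.csv   # placeholder URL
    to: /tmp/
```

For anything beyond prototyping, option a) is the more robust route, since the dependencies are then fixed at image build time rather than fetched every time a pod starts.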