kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

Allow folder patterns for JARs into Java classpath #1201

Open Jimmy-Newtron opened 3 years ago

Jimmy-Newtron commented 3 years ago

This is not the first time I have run into this problem.

Suppose you have the following layout:

/opt
   |- spark/libs/*.jar
   |- customLibs/*.jar

You can add your customLibs when running spark-submit by using the --jars parameter:

spark-submit --master local[*] --jars /opt/customLibs/*.jar --class MyApp application.jar

The glob pattern syntax works on the CLI, but the same behavior is not available when using the spark-operator:

spec:
  deps:
    jars:
      - local:///opt/customLibs/*.jar 

Details

The spark-operator builds the Spark properties file with the following entry:

"spark.jars": "local\:\/\/\/opt\/customLibs\/*.jar"

Spark is able to expand glob patterns only when the value is a plain file-system path, such as:

"spark.jars": "\/opt\/customLibs\/*.jar"
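To make the requested behavior concrete, here is a minimal Python sketch of how a glob inside a local:// URI could be expanded into explicit entries. It is an illustration only: expand_local_glob is a hypothetical helper, not something Spark or the spark-operator provides, and it assumes the code runs somewhere the directory (for example /opt/customLibs) is actually visible.

```python
import glob

LOCAL_SCHEME = "local://"


def expand_local_glob(uri):
    """Expand a glob in a local:// URI into one local:// URI per match.

    Hypothetical helper for illustration; neither Spark nor the
    spark-operator exposes a function with this name.
    """
    if not uri.startswith(LOCAL_SCHEME):
        # Other schemes are returned untouched.
        return [uri]

    # "local:///opt/customLibs/*.jar" -> "/opt/customLibs/*.jar"
    path = uri[len(LOCAL_SCHEME):]

    # Sort so the resulting classpath order is deterministic.
    matches = sorted(glob.glob(path))
    if not matches:
        # Nothing matched: keep the original value instead of dropping it.
        return [uri]

    # Re-attach the scheme to every concrete file path.
    return [LOCAL_SCHEME + match for match in matches]
```

Calling expand_local_glob("local:///opt/customLibs/*.jar") would return one local:// entry per JAR in that directory, which could then be listed explicitly under spec.deps.jars.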

Attempted workaround

Instead of passing the JARs through the expected YAML property, I tried to provide the following:

spec:
  sparkConf:
    "spark.jars": "/opt/customLibs/*.jar"

As you might expect, the spark.jars key was overridden by the spark-operator, which removed my definition (no merge or concatenation).
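For comparison, the merge behavior that would have made this workaround succeed can be sketched in a few lines of Python. This is an assumption about the desired semantics, not the operator's actual code: merge_spark_jars is a hypothetical name, and the only Spark fact relied on is that spark.jars is a comma-separated list.

```python
def merge_spark_jars(deps_jars, spark_conf):
    """Return a copy of spark_conf whose "spark.jars" combines both sources.

    Hypothetical sketch: deps_jars stands for spec.deps.jars and
    spark_conf for spec.sparkConf in the SparkApplication manifest.
    """
    merged = dict(spark_conf)

    # spark.jars is a comma-separated list; split the user-supplied value.
    user_value = spark_conf.get("spark.jars", "")
    user_jars = [jar.strip() for jar in user_value.split(",") if jar.strip()]

    # Concatenate both sources, dropping duplicates but keeping order.
    combined = []
    for jar in list(deps_jars) + user_jars:
        if jar not in combined:
            combined.append(jar)

    if combined:
        merged["spark.jars"] = ",".join(combined)
    return merged
```

With this logic, a manifest that sets only sparkConf["spark.jars"] to "/opt/customLibs/*.jar" and leaves deps.jars empty would keep the user's value unchanged instead of losing it.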

cartermckinnon commented 3 years ago

The last version of the operator that this worked on (for me) is v1beta2-1.1.2-2.4.5. So, I figured I'd just use the old version for now. My application images are custom and using Spark 3.1.2, and the Spark version used to spark-submit the job is pretty flexible in my experience. The accompanying chart version (0.7.2) is in the incubator repo and I can't seem to pull it:

$ helm pull incubator/sparkoperator --version 0.7.2
Error: failed to fetch https://kubernetes-charts-incubator.storage.googleapis.com/sparkoperator-0.7.2.tgz : 404 Not Found

In fact, I get this 404 for all versions of the incubator chart. I'm sure this is user error, but I didn't want to go down a Helm rabbit-hole.

So, I decided to try using this older version of the operator with the current chart, via the image.tag value:

$ helm install myrelease spark-operator/spark-operator --set image.tag=v1beta2-1.1.2-2.4.5

There are two command-line arguments to the operator in the current chart that are not valid for this older version:

I chose to just vendor the chart and comment these arguments out in the Deployment manifest, but you can also patch the Deployment yourself after installation.
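As a rough illustration of the "comment these arguments out" step, the following Python sketch removes a set of flags from the container args of a Deployment manifest that has already been loaded into a dict. The two arguments are not named in this thread, so the flag names used with it are placeholders, and drop_container_args is a hypothetical helper rather than part of the chart or the operator.

```python
import copy


def drop_container_args(deployment, unsupported_flags):
    """Return a copy of a Deployment manifest dict without the given flags.

    Hypothetical helper; unsupported_flags must be filled in with the
    real argument names, which this thread does not list.
    """
    patched = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    containers = patched["spec"]["template"]["spec"]["containers"]
    for container in containers:
        if "args" not in container:
            continue
        # Compare only the flag name, so "-flag=value" matches "-flag".
        container["args"] = [
            arg for arg in container["args"]
            if arg.split("=", 1)[0] not in unsupported_flags
        ]
    return patched
```

The patched dict can then be written back out as YAML and applied, which has the same effect as editing the vendored chart's Deployment template by hand.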

It's not pretty, definitely not supported, and may not work with any other chart + operator version combo. But, it got me going and it's a pretty simple hack.

To the maintainers, is there anything I can do to help get a real fix merged?

quentingodeau commented 1 month ago

Greetings! Any news regarding this issue? Regards, Quentin