kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

[QUESTION] Spark-Operator deprecation on envVars #2137

Closed JeanMichelApeupres closed 3 months ago

JeanMichelApeupres commented 3 months ago

Hi,

I'm part of a team that deployed Spark Operator (v1beta2-1.6.1-3.5.0) on a highly secured, internally managed Kubernetes cluster where webhooks are completely forbidden (among other restrictions).

We rely heavily on envVars for our Spark Application deployments, and I was wondering:

If envVars is removed, is there any workaround or alternative method available (given that webhooks aren't an option)?

Thanks!

ChenYi015 commented 3 months ago

@JeanMichelApeupres envVars has not been marked as deprecated; the feature still works even without the webhook enabled.

ChenYi015 commented 3 months ago

The webhook is enabled by default and cannot be disabled in the 2.0.0-rc.0 version. It serves to default/validate SparkApplication/ScheduledSparkApplication and mutate Spark pods. If you disable the webhook server, many k8s features will not work. However, if this is a requirement for your case, we can consider reintroducing the option webhook.enable.

JeanMichelApeupres commented 3 months ago

@ChenYi015 I'm a bit confused. I saw the note here saying "Note: legacy field envVars that can also be used for specifying environment variables is deprecated and will be removed in a future API version." Is the documentation up to date, and am I looking at the right documentation?

Regarding webhook.enable: yes, if possible I would like it to be reintroduced, even with a limited set of features.

ChenYi015 commented 3 months ago

> I saw the note here saying "Note: legacy field envVars that can also be used for specifying environment variables is deprecated and will be removed in a future API version." Is the documentation up to date, and am I looking at the right documentation?

@JeanMichelApeupres Sorry, I did not notice that. The field envVars is marked as deprecated because environment variables can be specified in a more universal way with spec.[driver|executor].env and spec.[driver|executor].envFrom when the webhook is enabled. It is likely to be removed in the next API version (maybe v1beta3). But don't worry, we are introducing a pod template feature for Spark 3.x applications, which will allow you to define environment variables without relying on the webhook.
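
For readers comparing the fields mentioned in this comment, the following is a minimal sketch of how they might appear in a SparkApplication manifest. It is a fragment only: required fields such as the image and main application file are omitted, and the application name, variable names, and ConfigMap name are hypothetical. The three forms are shown side by side purely for comparison; a real manifest would normally use only one of them.

```yaml
# Illustrative fragment only; not a complete SparkApplication.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: example-app              # hypothetical name
spec:
  driver:
    # Legacy form (the deprecated field): a plain map of name to value.
    envVars:
      LOG_LEVEL: INFO            # hypothetical variable
    # Newer form: a list of Kubernetes EnvVar objects.
    # Per this thread, it takes effect only when the webhook is enabled.
    env:
      - name: LOG_LEVEL
        value: INFO
    # Newer form: import variables from a ConfigMap or Secret.
    # Per this thread, it also depends on the webhook.
    envFrom:
      - configMapRef:
          name: example-config   # hypothetical ConfigMap
```

The same fields exist under spec.executor. On a cluster where the webhook is forbidden, only the envVars form is described in this thread as working without it.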

JeanMichelApeupres commented 3 months ago

@ChenYi015 Looking forward to the pod template feature! This should help us customize ephemeral storage as well. Will there be an issue/PR for the webhook.enable reintroduction (if validated)?

ChenYi015 commented 3 months ago

> Will there be an issue/PR for the webhook.enable reintroduction (if validated)?

@JeanMichelApeupres Yes, I have already raised PR #2142 to reintroduce the option webhook.enable.
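
If the option is reintroduced under the name discussed here, disabling the webhook through a Helm values override might look like the sketch below. This assumes the chart exposes the key exactly as webhook.enable; the final name and behavior depend on how PR #2142 is merged.

```yaml
# values.yaml override (sketch).
# Assumes the chart exposes webhook.enable as discussed in this thread.
webhook:
  enable: false
```

As noted earlier in the thread, with the webhook disabled the operator no longer defaults or validates SparkApplication resources or mutates Spark pods, so features that depend on pod mutation would not take effect.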