kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0
2.7k stars 1.35k forks

[QUESTION] Error related with webhook #2032

Open alstjs37 opened 1 month ago

alstjs37 commented 1 month ago

Hello,

I've just encountered an error.

When I first installed the spark-operator and ran pyspark-pi without the --set webhook.enable=true option, I confirmed that it worked well.

After that, in order to mount a volume, I removed the spark-operator with helm uninstall and reinstalled it with the --set webhook.enable=true option, but now pyspark-pi no longer works.
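For reference, the reinstall sequence was roughly the following (the release name and chart reference are assumptions inferred from the pod names below):

```shell
# Remove the existing release (release name inferred from the pod names)
helm uninstall sparkoperator -n spark-operator

# Reinstall with the mutating webhook enabled
helm install sparkoperator spark-operator/spark-operator \
  --namespace spark-operator \
  --set webhook.enable=true
```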

My pods in spark-operator namespace

$ sudo kubectl get all -n spark-operator

NAME                                                  READY   STATUS      RESTARTS   AGE
pod/sparkoperator-spark-operator-6994c8bcfd-vns8k     1/1     Running     0          137m
pod/sparkoperator-spark-operator-webhook-init-ww2lw   0/1     Completed   0          137m

NAME                                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/sparkoperator-spark-operator-webhook   ClusterIP   10.107.69.123   <none>        443/TCP   137m

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/sparkoperator-spark-operator   1/1     1            1           137m

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/sparkoperator-spark-operator-6994c8bcfd   1         1         1       137m

NAME                                                  STATUS     COMPLETIONS   DURATION   AGE
job.batch/sparkoperator-spark-operator-webhook-init   Complete   1/1           3s         137m

When I apply the YAML file below to Kubernetes, the SparkApplication fails with SUBMISSION_FAILED ...

here is my pyspark-pi.yaml

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: pyspark-pi
  namespace: spark-operator
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: "msleedockerhub/spark-py:py3.0"
  imagePullPolicy: Always
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.5.1"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.5.1
    serviceAccount: sparkoperator-spark
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.5.1
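Since the webhook was enabled specifically to mount a volume, the spec would normally also declare volumes and volumeMounts (injecting these into the pods is exactly what the webhook does). A minimal sketch of the additions; the volume name and hostPath here are illustrative, not from the original report:

```yaml
spec:
  volumes:
    - name: data-volume          # illustrative name
      hostPath:
        path: /tmp/spark-data    # illustrative path
        type: Directory
  driver:
    volumeMounts:
      - name: data-volume
        mountPath: /tmp/spark-data
  executor:
    volumeMounts:
      - name: data-volume
        mountPath: /tmp/spark-data
```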

I'm sure there's no problem with the image I built. If the webhook is enabled, is there anything else I need to set in the YAML file?

How can I solve this problem? Please help.

imtzer commented 1 month ago

@alstjs37 Can you run kubectl describe sparkapplication <your-sparkapp> and provide the output? And if the driver pod was created, check the pod logs too.

alstjs37 commented 1 month ago

@imtzer Thanks for your answer.

I've already checked; this is the part of the status that contains the start of the error.

Status:
  Application State:
    Error Message:  failed to run spark-submit for SparkApplication spark-operator/pyspark-pi: 24/05/21 04:51:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/05/21 04:51:06 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
24/05/21 04:51:07 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
24/05/21 04:51:07 WARN DriverCommandFeatureStep: spark.kubernetes.pyspark.pythonVersion was deprecated in Spark 3.1. Please set 'spark.pyspark.python' and 'spark.pyspark.driver.python' configurations or PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables instead.
24/05/21 04:51:48 ERROR Client: Please check "kubectl auth can-i create pod" first. It should be yes.
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
  at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129)
  at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122)
  at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44)

I ran the suggested check, kubectl auth can-i create pod, and it returned yes.

And since the application goes straight to SUBMISSION_FAILED, the driver pod is never created 😭

Do you expect anything else?

imtzer commented 1 month ago

kubectl auth can-i create pod

The error message Please check "kubectl auth can-i create pod" first is thrown in the Spark repo's KubernetesClientApplication.scala when the KubernetesClient API fails to create the driver pod.
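Since the fabric8 client call fails at driver-pod creation right after the webhook was enabled, two checks are worth running (a sketch; the service account and resource names are assumptions based on the chart's defaults and the pod names above):

```shell
# 1. Verify the operator's service account can create pods.
#    spark-submit runs inside the operator pod, not as your local kubectl user,
#    so "kubectl auth can-i create pod" as yourself is not the relevant check.
#    (service account name assumed; verify with: kubectl get sa -n spark-operator)
kubectl auth can-i create pods -n spark-operator \
  --as=system:serviceaccount:spark-operator:sparkoperator-spark-operator

# 2. Inspect the mutating webhook the chart registered and look for
#    webhook/TLS errors in the operator logs around the submission time.
kubectl get mutatingwebhookconfigurations
kubectl logs deployment/sparkoperator-spark-operator -n spark-operator | grep -i webhook
```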

jcunhafonte commented 1 week ago

I'm facing the same issue with version v1beta2-1.4.3-3.5.0. @alstjs37 Were you able to fix this issue?