kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

[BUG] Pod creation fails on submission with invalid resource quantities #2199

Closed Cian911 closed 2 weeks ago

Cian911 commented 1 month ago

Description

I've been scratching my head on this one for the past few days - without any resolution.

I am in the process of testing a migration of the spark operator from spark-operator-chart-1.4.6 to v2.0.1 and have come across the following issue. It seems that submission fails at the point it tries to create the driver pod, with the following error about resource quantities:

Failure executing: POST at: https://127.0.01:443/api/v1/namespaces/spark-operator/pods.
      Message: Pod in version \"v1\" cannot be handled as a Pod: quantities must match
      the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'.

Below is the full error log.

status:
  applicationState:
    errorMessage: "failed to run spark-submit: failed to run spark-submit: 24/09/27
      14:55:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your
      platform... using builtin-java classes where applicable\n24/09/27 14:55:49 INFO
      SparkKubernetesClientFactory: Auto-configuring K8S client using current context
      from users K8S config file\n24/09/27 14:55:50 INFO KerberosConfDriverFeatureStep:
      You have not specified a krb5.conf file locally or via a ConfigMap. Make sure
      that you have the krb5.conf locally on the driver image.\n24/09/27 14:55:50
      ERROR Client: Please check \"kubectl auth can-i create pod\" first. It should
      be yes.\nException in thread \"main\" io.fabric8.kubernetes.client.KubernetesClientException:
      Failure executing: POST at: https://127.0.01:443/api/v1/namespaces/spark-operator/pods.
      Message: Pod in version \"v1\" cannot be handled as a Pod: quantities must match
      the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'. Received
      status: Status(apiVersion=v1, code=400, details=null, kind=Status, message=Pod
      in version \"v1\" cannot be handled as a Pod: quantities must match the regular
      expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', metadata=ListMeta(_continue=null,
      remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}),
      reason=BadRequest, status=Failure, additionalProperties={}).\n\tat io.fabric8.kubernetes.client.KubernetesClientException.copyAsCause(KubernetesClientException.java:238)\n\tat
      io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:518)\n\tat
      io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:535)\n\tat
      io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:340)\n\tat
      io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:703)\n\tat
      io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:92)\n\tat
      io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:42)\n\tat
      io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1108)\n\tat
      io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:92)\n\tat
      org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153)\n\tat
      org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$6(KubernetesClientApplication.scala:256)\n\tat
      org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$6$adapted(KubernetesClientApplication.scala:250)\n\tat
      org.apache.spark.util.SparkErrorUtils.tryWithResource(SparkErrorUtils.scala:48)\n\tat
      org.apache.spark.util.SparkErrorUtils.tryWithResource$(SparkErrorUtils.scala:46)\n\tat
      org.apache.spark.util.Utils$.tryWithResource(Utils.scala:94)\n\tat org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:250)\n\tat
      org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:223)\n\tat
      org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1029)\n\tat
      org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)\n\tat
      org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)\n\tat org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)\n\tat
      org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)\n\tat
      org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)\n\tat org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)\nCaused
      by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing:
      POST at: https://127.0.0.1:443/api/v1/namespaces/spark-operator/pods.
      Message: Pod in version \"v1\" cannot be handled as a Pod: quantities must match
      the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'. Received
      status: Status(apiVersion=v1, code=400, details=null, kind=Status, message=Pod
      in version \"v1\" cannot be handled as a Pod: quantities must match the regular
      expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', metadata=ListMeta(_continue=null,
      remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}),
      reason=BadRequest, status=Failure, additionalProperties={}).\n\tat io.fabric8.kubernetes.client.dsl.internal.OperationSupport.requestFailure(OperationSupport.java:671)\n\tat
      io.fabric8.kubernetes.client.dsl.internal.OperationSupport.requestFailure(OperationSupport.java:651)\n\tat
      io.fabric8.kubernetes.client.dsl.internal.OperationSupport.assertResponseCode(OperationSupport.java:600)\n\tat
      io.fabric8.kubernetes.client.dsl.internal.OperationSupport.lambda$handleResponse$0(OperationSupport.java:560)\n\tat
      java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source)\n\tat
      java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)\n\tat
      java.base/java.util.concurrent.CompletableFuture.complete(Unknown Source)\n\tat
      io.fabric8.kubernetes.client.http.StandardHttpClient.lambda$completeOrCancel$10(StandardHttpClient.java:140)\n\tat
      java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)\n\tat
      java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
      Source)\n\tat java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown
      Source)\n\tat java.base/java.util.concurrent.CompletableFuture.complete(Unknown
      Source)\n\tat io.fabric8.kubernetes.client.http.ByteArrayBodyHandler.onBodyDone(ByteArrayBodyHandler.java:52)\n\tat
      java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)\n\tat
      java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
      Source)\n\tat java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown
      Source)\n\tat java.base/java.util.concurrent.CompletableFuture.complete(Unknown
      Source)\n\tat io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl$OkHttpAsyncBody.doConsume(OkHttpClientImpl.java:137)\n\tat
      java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)\n\tat
      java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)\n\tat
      java.base/java.lang.Thread.run(Unknown Source)\n24/09/27 14:55:50 INFO ShutdownHookManager:
      Shutdown hook called\n24/09/27 14:55:50 INFO ShutdownHookManager: Deleting directory
      /tmp/spark-2fe1d114-2f30-44b5-9a62-89db1478492f\n"
    state: FAILED

First, a note on this log line: ERROR Client: Please check "kubectl auth can-i create pod" first. It should be yes. The CR is using a serviceAccount that does have the appropriate permissions to perform full CRUD operations on the pods resource, just to rule that out before anyone asks.
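
For anyone wanting to run the same check, a command along these lines (using the driver serviceAccount and namespace from the spec below; adjust to your setup) should confirm the permission:

kubectl auth can-i create pods \
  --as=system:serviceaccount:spark-operator:spark-operator \
  -n spark-operator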

I made no changes to the resource values between spark-operator-chart-1.4.6 and v2.0.1. My driver & executor resource asks essentially look like this:

driver:
    cores: 2
    coreLimit: 8124m
    memory: 6123m
executor:
    cores: 2
    coreLimit: 8124m
    memory: 4123m
    instances: 2

After enabling debug logs on the operator-controller, I can see that these values are correctly passed in and submitted as --conf arguments, but it fails directly after that.
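
For reference, those resource asks end up as the standard Spark on Kubernetes conf keys, roughly like the following (a sketch for the values above, not the exact operator output):

--conf spark.driver.cores=2 \
--conf spark.kubernetes.driver.limit.cores=8124m \
--conf spark.driver.memory=6123m \
--conf spark.executor.cores=2 \
--conf spark.kubernetes.executor.limit.cores=8124m \
--conf spark.executor.memory=4123m \
--conf spark.executor.instances=2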

This smells to me like an issue with spark:3.5.1, but I am not entirely sure. I will post the full SparkApplication below for reference.

Reproduction Code [Required]

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: cian-test
  namespace: spark-operator
spec:
  driver:
    annotations:
      ad.datadoghq.com/spark-kubernetes-driver.check_names: '["prometheus"]'
      ad.datadoghq.com/spark-kubernetes-driver.init_configs: '[{}]'
      ad.datadoghq.com/spark-kubernetes-driver.instances: "\n[\n  {\n    \"prometheus_url\":
        \"http://%%host%%:8090/metrics\",\n    \"namespace\": \"spark-operator\",\n
        \   \"metrics\": [\"*\"],\n    \"tags\": []\n  }\n]\n        "
    cores: 2
    coreLimit: 8124m
    memory: 6123m
    javaOptions: -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -Dlog4j.configuration=file:/opt/log4j.properties
    nodeSelector:
      node-type: node-ssd
    podSecurityContext:
      fsGroup: 185
    serviceAccount: spark-operator
    tolerations:
    - effect: NoSchedule
      key: compute/nodegroup
      operator: Equal
      value: node-ssd
    volumeMounts:
    - mountPath: /data/spark/temp
      name: spark-data
    - mountPath: /var/lib/containerd/spark
      name: spark-local-dir-nvme
  executor:
    annotations:
      ad.datadoghq.com/spark-kubernetes-executor.check_names: '["prometheus"]'
      ad.datadoghq.com/spark-kubernetes-executor.init_configs: '[{}]'
      ad.datadoghq.com/spark-kubernetes-executor.instances: "\n[\n  {\n    \"prometheus_url\":
        \"http://%%host%%:8090/metrics\",\n    \"namespace\": \"spark-operator\",\n
        \   \"metrics\": [\"*\"],\n    \"tags\": []\n  }\n]\n        "
    cores: 2
    coreLimit: 8124m
    memory: 4123m
    instances: 2
    javaOptions: -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -Dlog4j.configuration=file:/opt/log4j.properties
    nodeSelector:
      node-type: node-ssd
    podSecurityContext:
      fsGroup: 185
    serviceAccount: spark-operator
    tolerations:
    - effect: NoSchedule
      key: compute/nodegroup
      operator: Equal
      value: node-ssd
    volumeMounts:
    - mountPath: /data/spark/temp
      name: spark-data
    - mountPath: /var/lib/containerd/spark
      name: spark-local-dir-nvme
  image: my-custom-image:v1
  mainApplicationFile: s3a://my-bucket/my-jar.jar
  mainClass: com.myClass.Cian.Application
  mode: cluster
  monitoring:
    exposeDriverMetrics: true
    exposeExecutorMetrics: true
    prometheus:
      jmxExporterJar: /opt/spark/jars/jmx_prometheus_javaagent-0.11.0.jar
      port: 8090
  restartPolicy:
    type: Never
  sparkConf:
    spark.decommission.enabled: "true"
    spark.dynamicAllocation.shuffleTracking.enabled: "true"
    spark.eventLog.dir: s3a://my-s3-bucket/logs
    spark.eventLog.enabled: "true"
    # spark.kubernetes.memoryOverheadFactor: "0.1"
    spark.storage.decommission.enabled: "true"
    spark.storage.decommission.rddBlocks.enabled: "true"
    spark.storage.decommission.shuffleBlocks.enabled: "true"
  sparkUIOptions:
    servicePort: 4040
    servicePortName: spark-driver-ui-port
    serviceType: ""
  sparkVersion: 3.4.1
  timeToLiveSeconds: 3600
  type: Scala
  volumes:
  - name: api-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 31536000
          path: token
  - name: spark-data
    persistentVolumeClaim:
      claimName: spark-operator-efs-pvc
  - emptyDir: {}
    name: spark-local-dir-nvme

Expected behavior

Driver & Executor pods should spin up and job should start.

Actual behavior

Job submission fails.

Additional context

cc: @ChenYi015 @jacobsalway

jacobsalway commented 1 month ago

Hey @Cian911, I'm not able to reproduce this locally so far with those values. From experience, you get this error if coreRequest or coreLimit doesn't conform to the Kubernetes resource quantity syntax. Do you have any mutating webhooks on the cluster that might mutate the request or limit fields on pod creation?
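
The regular expression in the error is the Kubernetes resource quantity format, so one quick way to narrow this down is to run the suspect strings through the same parser the API server uses. A minimal sketch with k8s.io/apimachinery (values taken from the spec above; any string rejected here would produce exactly this 400):

package main

import (
    "fmt"

    "k8s.io/apimachinery/pkg/api/resource"
)

func main() {
    // Candidate values taken from the driver/executor spec above.
    for _, v := range []string{"2", "8124m", "6123m", "4123m"} {
        if _, err := resource.ParseQuantity(v); err != nil {
            // A failure here corresponds to the "quantities must match the
            // regular expression" error returned by the API server.
            fmt.Printf("%q rejected: %v\n", v, err)
        } else {
            fmt.Printf("%q is a valid quantity\n", v)
        }
    }
}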

ChenYi015 commented 2 weeks ago

@Cian911 I can see that when the emptyDir sizeLimit is nil, the argument --conf spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-nvme.options.sizeLimit=<nil> will be added to spark-submit, which may fail the submission. Would you like to retry by giving the sizeLimit a specific value?
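
For example, something along these lines should avoid the <nil> conf (the 100Gi figure is only a placeholder; pick a limit that suits your workload and node storage):

- name: spark-local-dir-nvme
  emptyDir:
    sizeLimit: 100Gi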

Cian911 commented 2 weeks ago

@ChenYi015 Bingo - I think this is it.

The team and I actually managed to fix the issue at exactly this spot, but just by changing the name of the emptyDir:

- emptyDir: {}
  name: spark-local-dir-nvme

to:

- emptyDir: {}
  name: local-nvme

I thought it was a problem caused by the SparkLocalDirPrefix value.

This looks like a better solution. Much appreciated for following up, @ChenYi015!

ChenYi015 commented 2 weeks ago

That's good. For volumes that are not prefixed with spark-local-dir-, the volumeMounts will be patched in by the webhook server. For those that are prefixed with spark-local-dir-, they will be mounted by Spark during spark-submit, and a redundant sizeLimit conf with a <nil> value is added if sizeLimit is not specified. I have raised a PR to fix this so that users can still use volume names with the local dir prefix.
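
To illustrate, for a spark-local-dir- prefixed emptyDir the confs handed to spark-submit look roughly like this (mount path taken from the spec above; the second line shows the redundant value when sizeLimit is unset):

--conf spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-nvme.mount.path=/var/lib/containerd/spark \
--conf spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-nvme.options.sizeLimit=<nil>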

jacobsalway commented 2 weeks ago

Nice catch @ChenYi015. Apologies @Cian911, I only tested with the request values and not the full application spec including the volumes.

Cian911 commented 2 weeks ago

No worries @jacobsalway, it was a tricky one to find nonetheless. The corresponding error was not helpful and led me down entirely the wrong path for quite a while!