kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

[BUG] Failed to pull image "gcr.io/spark-operator/spark:v3.1.1" #2012

Open sidi-elwely opened 6 months ago

sidi-elwely commented 6 months ago

Description

Unable to run a SparkApplication: the driver pod fails to pull its image.

Steps to reproduce the behavior:

  1. Set up a new Kubernetes cluster. I set one up with gcloud.
  2. Get the Kubernetes cluster config.
  3. Install spark-operator via Helm.
  4. Create spark-pi.yaml from https://github.com/kubeflow/spark-operator/blob/master/examples/spark-pi.yaml
  5. Apply the file.
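
For reference, roughly the commands used (the Helm release name and namespace here are just my choices, not requirements):

```sh
# add the chart repo and install the operator (release name/namespace are arbitrary)
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update
helm install spark-operator spark-operator/spark-operator \
  --namespace spark-operator --create-namespace

# apply the example SparkApplication
kubectl apply -f spark-pi.yaml
```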

Actual behavior

The driver pod ends up in status Failed because of ImagePullBackOff.

Terminal Output Screenshot(s)

Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    12s                default-scheduler  Successfully assigned default/spark-pi-driver to kind-control-plane
  Warning  FailedMount  10s (x3 over 11s)  kubelet            MountVolume.SetUp failed for volume "spark-conf-volume-driver" : configmap "spark-drv-cfc48c8f3d591c55-conf-map" not found
  Normal   Pulling      4s                 kubelet            Pulling image "gcr.io/spark-operator/spark:v3.1.1"
  Warning  Failed       1s                 kubelet            Failed to pull image "gcr.io/spark-operator/spark:v3.1.1": rpc error: code = NotFound desc = failed to pull and unpack image "gcr.io/spark-operator/spark:v3.1.1": failed to resolve reference "gcr.io/spark-operator/spark:v3.1.1": gcr.io/spark-operator/spark:v3.1.1: not found
  Warning  Failed       1s                 kubelet            Error: ErrImagePull
  Normal   BackOff      1s                 kubelet            Back-off pulling image "gcr.io/spark-operator/spark:v3.1.1"
  Warning  Failed       1s                 kubelet            Error: ImagePullBackOff

Environment & Versions

networkingana commented 6 months ago

I have the same issue, what image should we use?

peter-mcclonski commented 6 months ago

Good morning,

As of #2010, the examples have been updated to reference the official Spark image available on Docker Hub: spark:3.5.0. Unfortunately, the legacy gcr.io images are no longer available. Fortunately, the official Spark images are fully compatible with this operator.
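
For anyone updating an existing manifest, the relevant fields now look roughly like this; see the updated spark-pi.yaml for the full manifest, and note that the versions and jar path below follow the example and may differ for your own application:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  type: Scala
  sparkVersion: "3.5.0"
  image: spark:3.5.0   # official Apache Spark image from Docker Hub
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
```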

sidi-elwely commented 6 months ago

Thank you for the information. However, spark-pi-prometheus has the same problem: the image gcr.io/spark-operator/spark:v3.1.0-gcs-prometheus does not work either. Could you please help us?

peter-mcclonski commented 6 months ago

@sidi-elwely #2010 didn't update the prometheus-enabled image, which currently isn't published by any of the CI jobs. I'll defer to a maintainer as to whether this is worth re-enabling, but I think it's likely to need some rework regardless. Right now the image is tied specifically to GCP, which I'm comfortable saying isn't optimal. The meat of the image with respect to Prometheus is a single jar and a couple of conf files, so it's perhaps not worth maintaining as a separate image, but I can imagine a few ways to ease usage.

peter-mcclonski commented 6 months ago

See https://github.com/kubeflow/spark-operator/tree/master/spark-docker if you're interested in creating your own prometheus-enabled image in the meantime.
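
Once you have an image with the JMX Prometheus exporter jar baked in (the linked spark-docker directory shows how the old image did it), wiring it up in the SparkApplication spec looks roughly like the sketch below; the image name, jar path, and port are illustrative and depend on how you build your image:

```yaml
spec:
  image: my-registry/spark-custom:3.5.0   # hypothetical custom image with the exporter jar added
  monitoring:
    exposeDriverMetrics: true
    exposeExecutorMetrics: true
    prometheus:
      # path to the jmx_prometheus_javaagent jar inside the image (illustrative)
      jmxExporterJar: "/prometheus/jmx_prometheus_javaagent-0.11.0.jar"
      port: 8090
```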

networkingana commented 4 months ago

After updating the base image to Spark 3.5.0 and the spark-operator to the latest version, I get the error below and my apps won't start. The application version was also updated from 3.2.0 to 3.5.0.

Files local:///opt/spark-jars/spark-3-rules.yaml from /opt/spark-jars/spark-3-rules.yaml to /opt/spark-jars/spark-3-rules.yaml
2024-07-18 12:44:07.005 WARN  [main            ] org.apache.spark.network.util.JavaUtils:112  - Attempt to delete using native Unix OS command failed for path = /opt/spark-jars/spark-3-rules.yaml. Falling back to Java IO way
java.io.IOException: Failed to delete: /opt/spark-jars/spark-3-rules.yaml
        at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingUnixNative(JavaUtils.java:173) ~[DataAnalyticsReporting.jar:?]
        at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:109) ~[DataAnalyticsReporting.jar:?]
        at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:90) ~[DataAnalyticsReporting.jar:?]
        at org.apache.spark.util.SparkFileUtils.deleteRecursively(SparkFileUtils.scala:121) ~[DataAnalyticsReporting.jar:?]
        at org.apache.spark.util.SparkFileUtils.deleteRecursively$(SparkFileUtils.scala:120) ~[DataAnalyticsReporting.jar:?]
        at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1126) ~[DataAnalyticsReporting.jar:?]
        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$14(SparkSubmit.scala:437) ~[DataAnalyticsReporting.jar:?]
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) ~[DataAnalyticsReporting.jar:?]
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) ~[DataAnalyticsReporting.jar:?]
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) ~[DataAnalyticsReporting.jar:?]
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) ~[DataAnalyticsReporting.jar:?]
        at scala.collection.TraversableLike.map(TraversableLike.scala:286) ~[DataAnalyticsReporting.jar:?]
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279) ~[DataAnalyticsReporting.jar:?]
        at scala.collection.AbstractTraversable.map(Traversable.scala:108) ~[DataAnalyticsReporting.jar:?]
        at org.apache.spark.deploy.SparkSubmit.downloadResourcesToCurrentDirectory$1(SparkSubmit.scala:429) ~[DataAnalyticsReporting.jar:?]
        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$16(SparkSubmit.scala:450) ~[DataAnalyticsReporting.jar:?]
        at scala.Option.map(Option.scala:230) ~[DataAnalyticsReporting.jar:?]
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:450) ~[DataAnalyticsReporting.jar:?]
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:964) ~[DataAnalyticsReporting.jar:?]
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194) ~[DataAnalyticsReporting.jar:?]
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217) ~[DataAnalyticsReporting.jar:?]
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) ~[DataAnalyticsReporting.jar:?]
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120) ~[DataAnalyticsReporting.jar:?]
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129) ~[DataAnalyticsReporting.jar:?]
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) ~[DataAnalyticsReporting.jar:?]
Exception in thread "main" java.io.IOException: Failed to delete: /opt/spark-jars/spark-3-rules.yaml
        at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:146)
        at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:117)
        at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:90)
        at org.apache.spark.util.SparkFileUtils.deleteRecursively(SparkFileUtils.scala:121)
        at org.apache.spark.util.SparkFileUtils.deleteRecursively$(SparkFileUtils.scala:120)
        at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1126)
        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$14(SparkSubmit.scala:437)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at org.apache.spark.deploy.SparkSubmit.downloadResourcesToCurrentDirectory$1(SparkSubmit.scala:429)
        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$16(SparkSubmit.scala:450)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:450)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:964)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)