kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

[BUG] Failed to pull image "ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1" #1991

Open vzhao12 opened 4 months ago

vzhao12 commented 4 months ago

Description

Unable to start a Spark job in Kubernetes.

Reproduction Code [Required]

Steps to reproduce the behavior:

  1. Set up a new Kubernetes cluster. I set one up on Google Cloud.
  2. Get the Kubernetes cluster config.
  3. Add the Helm repository: helm repo add spark-operator https://kubeflow.github.io/spark-operator
  4. Install the chart:
       helm install spark-operator spark-operator/spark-operator \
         --namespace default \
         --set 'image.tag=v1beta2-1.3.3-3.1.1' \
         --set sparkJobNamespace=default

Expected behavior

The Spark operator pod starts up.

Actual behavior

The pod fails with ImagePullBackOff.

I saw the following error:

Failed to pull image "ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1": rpc error: code = NotFound desc = failed to pull and unpack image "ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1": failed to resolve reference "ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1": ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1: not found

The errors started at 1:00 AM on 04/13/2024.

Terminal Output Screenshot(s)

[Screenshots: 2024-04-17 at 3:14:30 PM and 2024-04-17 at 3:14:38 PM]

Environment & Versions

Additional context

vzhao12 commented 4 months ago

I checked https://github.com/kubeflow/spark-operator/pkgs/container/spark-operator, and it looks like version v1beta2-1.3.3-3.1.1 was never published there.

@yuchaoran2011, can you push this version to fix the issue? Thanks!
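The same check can be made from a terminal before pointing Helm at a tag. This is a minimal sketch, assuming the Docker CLI (20.10 or later, where `docker manifest` is available without experimental mode) is installed and the registry allows anonymous reads of this public image. `docker manifest inspect` fetches only the manifest and exits non-zero when the reference cannot be resolved; network or authentication failures also exit non-zero, so a "not resolvable" result is a strong hint rather than proof that the tag is missing.

```shell
#!/bin/sh
# Sketch: report whether an image reference resolves in its registry.
# Assumes the Docker CLI is installed; only the manifest is fetched, nothing is pulled.
IMAGE="ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1"

if docker manifest inspect "$IMAGE" > /dev/null 2>&1; then
  echo "resolves: $IMAGE"
else
  # Non-zero exit: the tag is missing, or a network/auth error occurred.
  echo "not resolvable: $IMAGE"
fi
```

Changing `IMAGE` checks any other repository and tag the same way.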

vzhao12 commented 4 months ago

The root cause is https://github.com/kubeflow/spark-operator/pull/1937

bharathk005 commented 4 months ago

/kind bug

zevisert commented 4 months ago

@vzhao12 Until this is addressed, you can use images from the old registry by passing an extra option to helm:

--set 'image.repository=ghcr.io/googlecloudplatform/spark-operator'

JunseoChoJJ commented 4 months ago

@vzhao12 I am still getting an ImagePullBackOff error. Does anyone have an idea? This is the command I am using:

helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace --set 'image.repository=ghcr.io/googlecloudplatform/spark-operator'

iva3682 commented 4 months ago

Use both 'image.repository=ghcr.io/kubeflow/spark-operator' and 'image.tag=v1beta2-1.4.3-3.5.0'.
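Put together with the reproduction steps at the top of the issue, the suggestion above amounts to overriding both image values in a single install. This is a sketch, assuming the `spark-operator` Helm repository was added as in step 3 and that the `v1beta2-1.4.3-3.5.0` tag is present under `ghcr.io/kubeflow/spark-operator`, as this comment reports. `sparkJobNamespace` is left out here because the name of that value may differ between chart versions.

```shell
#!/bin/sh
# Sketch: install the chart with repository and tag overridden together.
# Assumes the Helm repo alias "spark-operator" was added earlier and the tag exists in ghcr.io/kubeflow.
helm repo update
helm install spark-operator spark-operator/spark-operator \
  --namespace default \
  --set image.repository=ghcr.io/kubeflow/spark-operator \
  --set image.tag=v1beta2-1.4.3-3.5.0

# Check that the operator pod leaves ImagePullBackOff.
kubectl get pods --namespace default
```

Setting only one of the two values is what produced the errors above: a valid tag in the wrong repository, or the right repository with an unpublished tag, both resolve to an image that does not exist. If the pod is still stuck, `kubectl describe pod <pod-name>` shows the exact image reference being pulled.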

vara-bonthu commented 4 months ago

We just released a new image update with important registry fixes. Check it out:

Image tag: https://github.com/kubeflow/spark-operator/tree/v1beta2-1.4.5-3.5.0
Helm chart: https://github.com/kubeflow/spark-operator/releases/tag/spark-operator-chart-1.2.14

Please give it a try and let us know if you encounter any issues. We're working on a new Kubeflow Spark Operator release, and your testing will help make it stable! Feel free to share feedback on the Kubeflow Spark Operator channel.

zevisert commented 4 months ago

@vara-bonthu Users will still need to pass --set=image.repository=... if they are using any tag other than v1beta2-1.4.5-3.5.0, since previous Docker images have not yet been replicated to the chart's default repository (docker.io/kubeflow/spark-operator).

There is still only one tag in the default container registry: https://hub.docker.com/r/kubeflow/spark-operator/tags

Edit: Changed tag to match @RyanZotti's comment

RyanZotti commented 4 months ago

I think you meant any tag other than v1beta2-1.4.5-3.5.0. The 1.4.3 version isn't available, but 1.4.5 is.

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days. Thank you for your contributions.