jenkins-x / jx

Jenkins X provides automated CI+CD for Kubernetes with Preview Environments on Pull Requests using Cloud Native pipelines from Tekton
https://jenkins-x.io/
Apache License 2.0
4.58k stars 787 forks

jx-git-operator not being built #7937

Open thevinter opened 3 years ago

thevinter commented 3 years ago

I used Terraform to build a Kubernetes cluster on Azure, following the official guide. After some troubleshooting I got as far as the last step, where the .tf is supposed to create the jx-git-operator. The Terraform run hangs there for 5 minutes and then fails with the following error:

╷
│ Warning: Helm release "jx-git-operator" was created but has a failed status. Use the `helm` command to investigate the error, correct it, then run Terraform again.
│
│   with module.cluster.module.jx-boot.helm_release.jx-git-operator,
│   on .terraform/modules/cluster/terraform-jx-boot/boot.tf line 1, in resource "helm_release" "jx-git-operator":
│    1: resource "helm_release" "jx-git-operator" {
│
╵
╷
│ Error: timed out waiting for the condition
│
│   with module.cluster.module.jx-boot.helm_release.jx-git-operator,
│   on .terraform/modules/cluster/terraform-jx-boot/boot.tf line 1, in resource "helm_release" "jx-git-operator":
│    1: resource "helm_release" "jx-git-operator" {
│
╵

I also tried running jx admin operator manually, but it gives me the following error:

Release "jxgo" does not exist. Installing it now.
Error: rendered manifests contain a resource that already exists. Unable to continue with install: ServiceAccount "jx-git-operator" in namespace "jx-git-operator" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "jxgo": current value is "jx-git-operator"

Checking the pods in that namespace, I can see that the jx-git-operator pod has an ImagePullBackOff error:

❯ kubectl get pods --namespace jx-git-operator
NAME                              READY   STATUS             RESTARTS   AGE
jx-git-operator-d94b4c87d-dqmpk   0/1     ImagePullBackOff   0          48m
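
For anyone else checking for the same symptom, here is a rough sketch of a filter that picks out pods stuck on an image pull from `kubectl get pods` output. The function name `image_pull_failures` is made up for illustration (it is not part of jx or kubectl), and it assumes the default column layout where STATUS is the third column, so it will not work as-is with `--all-namespaces`.

```shell
# Hypothetical helper, not part of jx or kubectl.
# Reads `kubectl get pods` output on stdin and prints the names of pods
# whose STATUS column reports an image pull failure.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
image_pull_failures() {
  # NR > 1 skips the header row; $3 is the STATUS column.
  awk 'NR > 1 && ($3 == "ImagePullBackOff" || $3 == "ErrImagePull") { print $1 }'
}
```

It could be used as `kubectl get pods --namespace jx-git-operator | image_pull_failures`. Running `kubectl describe pod <name> --namespace jx-git-operator` on each reported pod should then show the pull events and the underlying registry response.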

shanedownes-dmi commented 3 years ago

This may be related to registry permissions. One of our builds started around 5:20 PM EDT and hung on the first step. The K8s events show that it got a 403 when attempting to pull the builder image. I was able to reproduce this by attempting to pull any jx builder image locally.

K8s event:

jx                  21m         Warning   Failed                   pod/dmi-fr-dmr-web-pr-1023-pr-build-qpjn2-1-preview-build-c5r-l8k6d   Failed to pull image "gcr.io/jenkinsxio/builder-nodejs14x": rpc error: code = Unknown desc = failed to pull and unpack image "gcr.io/jenkinsxio/builder-nodejs14x:latest": failed to resolve reference "gcr.io/jenkinsxio/builder-nodejs14x:latest": unexpected status code [manifests latest]: 403 Forbidden

Local docker pull output:

docker pull gcr.io/jenkinsxio/builder-nodejs14x
Using default tag: latest
Error response from daemon: unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication

I suspect that the GCR registry is no longer public. Our last successful build was around 12:30 PM EDT, so this must have happened since then.

thevinter commented 3 years ago

I solved this temporarily by pinning an older image in the deployment:

kubectl set image deployment/jx-git-operator jx-git-operator=gcr.io/jenkinsxio/jx-git-operator:0.0.173 --record -n jx-git-operator