Open thevinter opened 3 years ago
This may be related to registry permissions. One of our builds started around 5:20 PM EDT and hung on the first step. The K8s events show that it got a 403 when attempting to pull the builder image. I was able to reproduce this by pulling any jx builder image locally.
K8s event:
jx 21m Warning Failed pod/dmi-fr-dmr-web-pr-1023-pr-build-qpjn2-1-preview-build-c5r-l8k6d Failed to pull image "gcr.io/jenkinsxio/builder-nodejs14x": rpc error: code = Unknown desc = failed to pull and unpack image "gcr.io/jenkinsxio/builder-nodejs14x:latest": failed to resolve reference "gcr.io/jenkinsxio/builder-nodejs14x:latest": unexpected status code [manifests latest]: 403 Forbidden
Local docker pull output:
docker pull gcr.io/jenkinsxio/builder-nodejs14x
Using default tag: latest
Error response from daemon: unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication
I suspect that the GCR registry is no longer public. Our last successful build was around 12:30 PM EDT, so the change must have happened after that.
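One rough way to check the "no longer public" suspicion without Docker is to hit the registry's HTTP API anonymously and look at the status code. The sketch below assumes gcr.io serves the standard Docker Registry v2 `tags/list` endpoint to unauthenticated clients for public repositories. The function names are made up for illustration, and a 401 may only mean the registry wants a token exchange rather than that access is denied outright.

```shell
# Map an HTTP status code from the registry to a short verdict.
classify_registry_status() {
  case "$1" in
    200)     echo "public" ;;         # readable without credentials
    401|403) echo "auth-required" ;;  # anonymous access refused or challenged
    404)     echo "not-found" ;;      # repository or tag does not exist
    *)       echo "unknown" ;;
  esac
}

# Probe a repository anonymously and print the verdict.
# $1 is the repository path, e.g. jenkinsxio/builder-nodejs14x
probe_gcr_repo() {
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    "https://gcr.io/v2/$1/tags/list")
  classify_registry_status "$code"
}

# Example (needs network access):
#   probe_gcr_repo jenkinsxio/builder-nodejs14x
```

If this prints `auth-required` for an image that used to pull anonymously, that is consistent with the 403 in the K8s event, though it does not prove why the registry changed.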
I worked around this temporarily by pinning an older image in the deployment:
kubectl set image deployment/jx-git-operator jx-git-operator=gcr.io/jenkinsxio/jx-git-operator:0.0.173 --record -n jx-git-operator
I used Terraform to build a Kubernetes cluster on Azure following the official guide. After some troubleshooting I got to the last step, where the .tf is supposed to create a jx-git-operator. The Terraform run hangs for 5 minutes before giving the following error:
╷
│ Warning: Helm release "jx-git-operator" was created but has a failed status. Use the `helm` command to investigate the error, correct it, then run Terraform again.
│
│   with module.cluster.module.jx-boot.helm_release.jx-git-operator,
│   on .terraform/modules/cluster/terraform-jx-boot/boot.tf line 1, in resource "helm_release" "jx-git-operator":
│    1: resource "helm_release" "jx-git-operator" {
│
╵
╷
│ Error: timed out waiting for the condition
│
│   with module.cluster.module.jx-boot.helm_release.jx-git-operator,
│   on .terraform/modules/cluster/terraform-jx-boot/boot.tf line 1, in resource "helm_release" "jx-git-operator":
│    1: resource "helm_release" "jx-git-operator" {
│
╵
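The warning suggests investigating with helm. A minimal sketch of that follows, assuming the release and its namespace are both named jx-git-operator (the namespace is inferred from the rest of this report, not from the Terraform output). The helper name `helm_release_state` is hypothetical; it only pulls the STATUS line out of the text that `helm status` prints.

```shell
# Read `helm status` text output on stdin and print the value of its STATUS field.
helm_release_state() {
  awk -F': *' '$1 == "STATUS" { print $2; exit }'
}

# Example against a live cluster (release/namespace names are assumptions):
#   helm status jx-git-operator -n jx-git-operator | helm_release_state
#   kubectl get events -n jx-git-operator --sort-by=.lastTimestamp
```

A `failed` state together with image pull events in the namespace would point at the registry rather than at the Terraform module itself.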
I also tried running
jx admin operator
manually, but it gives me the following error:
Release "jxgo" does not exist. Installing it now.
Error: rendered manifests contain a resource that already exists. Unable to continue with install: ServiceAccount "jx-git-operator" in namespace "jx-git-operator" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "jxgo": current value is "jx-git-operator"
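The error above says the ServiceAccount is already owned by a Helm release named jx-git-operator (the one Terraform created), while `jx admin operator` tries to install a release named jxgo. As a sketch, the ownership annotation can be read directly and compared with the release being installed. The helper name below is hypothetical; the annotation key is the one quoted in the error message.

```shell
# Compare the release name recorded on a resource with the one being installed.
# $1 = current value of meta.helm.sh/release-name, $2 = release name to install
release_owner_check() {
  if [ "$1" = "$2" ]; then
    echo "ok"
  else
    echo "conflict: owned by '$1', installing '$2'"
  fi
}

# Example against a live cluster:
#   owner=$(kubectl get serviceaccount jx-git-operator -n jx-git-operator \
#     -o jsonpath='{.metadata.annotations.meta\.helm\.sh/release-name}')
#   release_owner_check "$owner" jxgo
#   helm list -n jx-git-operator --all   # also shows releases in a failed state
```

Whether to remove the failed jx-git-operator release before retrying, or to keep Terraform as the only installer, depends on the setup; this only confirms which release currently owns the resource.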
Checking the pods in that namespace, I can see that the jx-git-operator pod has an ImagePullBackOff error:
❯ kubectl get pods --namespace jx-git-operator
NAME                              READY   STATUS             RESTARTS   AGE
jx-git-operator-d94b4c87d-dqmpk   0/1     ImagePullBackOff   0          48m
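To see why the pull is failing, the pod's events usually carry the registry's actual response. The sketch below filters `kubectl get pods` table output for pull failures so each affected pod can then be described. The function name is hypothetical, and it assumes the default column layout (NAME, READY, STATUS, ...) shown above.

```shell
# Read `kubectl get pods` table output on stdin and print the names of pods
# whose STATUS column reports an image pull failure. The header row is skipped.
pods_with_pull_errors() {
  awk 'NR > 1 && ($3 == "ImagePullBackOff" || $3 == "ErrImagePull") { print $1 }'
}

# Example against a live cluster:
#   kubectl get pods -n jx-git-operator | pods_with_pull_errors |
#     while read -r pod; do
#       kubectl describe pod "$pod" -n jx-git-operator
#     done
```

The Events section at the end of `kubectl describe pod` should show whether the failure is a 403 from the registry, like the one in the K8s event quoted earlier.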