rancher / rio

Application Deployment Engine for Kubernetes
https://rio.io
Apache License 2.0

Tekton issues when upgrading from 0.7 to 0.8 #1065

Open yuregir opened 4 years ago

Describe the bug

I am writing this up as a guide for other people who run into a similar issue.

This happened when I upgraded from 0.7.0-rc2 to 0.8.

After the update (downloading the new binary and running rio install), the first problem I hit was with Tekton: it could not bind a configmap (named config-logging), so the Tekton pods were failing to start.

The error in the logs is:

Internal error occurred: failed calling webhook "config.webhook.pipeline.tekton.dev": Post https://tekton-pipelines-webhook.tekton-pipelines.svc:443/config-validation?timeout=30s: no endpoints available for service "tekton-pipelines-webhook"

After digging through the logs and GitHub issues, I found the cause:

This is due to a circular dependency. The tekton-pipelines-webhook pod can't start if it doesn't have the configmap, but the configmap can't be installed because it can't reach the tekton-pipelines-webhook pod for validation.

The issue and its fix are in the Tekton repo (links: here, Fix).

To summarize, the fix is to delete the old webhook resources:

kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io config.webhook.pipeline.tekton.dev
kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io validation.webhook.pipeline.tekton.dev
kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io webhook.pipeline.tekton.dev
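
To check the state of the cluster after the deletion, something like the following may help. It is only a sketch: the namespace tekton-pipelines and the service name are read off the webhook URL in the error above, and the run helper and the APPLY variable are hypothetical additions that print each command unless APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless APPLY=1, then actually run it.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Namespace taken from the service URL in the webhook error message.
NAMESPACE="tekton-pipelines"

# List the webhook configurations to see which Tekton entries currently exist.
run kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations

# The webhook service needs endpoints before admission requests can succeed.
run kubectl get endpoints tekton-pipelines-webhook -n "$NAMESPACE"

# Check whether the Tekton pods are starting.
run kubectl get pods -n "$NAMESPACE"
```

Run it once as-is to review the commands, then again with APPLY=1 to query the cluster.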

After this fix the Tekton pods started spawning, but I got a new error and the pods still would not start.

The new error in the logs is:

OCI runtime create failed: container_linux.go:345: starting container process caused "chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied": unknown

After some searching I found that the problem is caused by the security settings in the Pod (links: Issue Link, Fix).

Temporary solution: I had to edit the Deployments for both the webhook and the controller and change runAsUser from 1001 to 65532.
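
The runAsUser change could also be applied with kubectl patch instead of editing by hand. This is a rough sketch, not a confirmed procedure. It assumes the Deployments are named tekton-pipelines-webhook and tekton-pipelines-controller in the tekton-pipelines namespace (only the webhook name and namespace appear in the error above), and that runAsUser is set on the first container's securityContext. If it is set at the pod level instead, the path would be /spec/template/spec/securityContext/runAsUser. The build_patch function and the APPLY variable are hypothetical names used for illustration.

```shell
#!/bin/sh
# Namespace taken from the webhook service URL in the earlier error.
NAMESPACE="tekton-pipelines"
# Assumption: runAsUser lives on the first container of each Deployment.
PATCH_PATH="/spec/template/spec/containers/0/securityContext/runAsUser"
NEW_UID=65532

# Build a JSON Patch document that replaces the existing UID.
# A "replace" op fails if the path does not exist, so a wrong path is harmless.
build_patch() {
  printf '[{"op":"replace","path":"%s","value":%s}]' "$PATCH_PATH" "$NEW_UID"
}

# Print the commands by default; set APPLY=1 to run them against the cluster.
for deploy in tekton-pipelines-webhook tekton-pipelines-controller; do
  if [ "${APPLY:-0}" = "1" ]; then
    kubectl patch deployment "$deploy" -n "$NAMESPACE" --type=json -p "$(build_patch)"
  else
    echo "kubectl patch deployment $deploy -n $NAMESPACE --type=json -p '$(build_patch)'"
  fi
done
```

Checking the actual location of runAsUser first with kubectl get deployment -o yaml is a good idea.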

After this fix Tekton started working properly. I call it temporary because if I change anything in rio-config, rio-controller restarts, and after that restart runAsUser goes back to 1001 and Tekton stops working.

In the current state, the code on my local machine builds on Rio and works, but the code in my GitHub repo does not build.

rio ps output of github repo build:

not ready; BuildDeployed: failed to update dev/iot-dashboard-9d5c1-b47e9 tekton.dev/v1alpha1, Kind=TaskRun for service-build dev/iot-dashboard: admission webhook "webhook.pipeline.tekton.dev" denied the request: mutation failed: cannot decode incoming new object: json: unknown field "digest"(Error); iot-dashboard waiting on build

I would like to show you the build-history logs, but rio build-history throws a fatal error:

$ rio build-history

FATA[0000] template: :1:44: executing "" at <findRevision>: error calling findRevision: runtime error: invalid memory address or nil pointer dereference

I am not able to fix this; please help me find a solution.

(I tried uninstalling and reinstalling Rio several times with no luck, then gave up and rolled back to 0.7.1. Everything works properly in 0.7.)

Expected behavior

Clean upgrade from 0.7 to 0.8

Kubernetes version & type (GKE, on-prem): kubectl version

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-16T00:04:31Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:13:49Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

Rio version: rio info

Rio Version: v0.8.0 (af7ad687)
Rio CLI Version: v0.8.0 (af7ad687)
Cluster Domain: service.metacore.io
Cluster Domain IPs:
System Namespace: rio-system
Wildcard certificates: service.metacore.io(true)