tektoncd / pipeline

A cloud-native Pipeline resource.
https://tekton.dev
Apache License 2.0
8.46k stars 1.77k forks source link

Pipelines with finally clause in them makes tekton-pipelines-webhook spam conversion errors between v1alpha1 and v1beta1 #3206

Closed benke33 closed 3 years ago

benke33 commented 4 years ago

Expected Behavior

Pipelines with finally: in them should not make the tekton-piplines-webhook spam convert errors.

Actual Behavior

When applying any Pipeline that includes finally: in it, the tekton-pipelines-webhook starts spamming these conversion errors:

{
  "level": "info",
  "logger": "webhook",
  "caller": "webhook/conversion.go:42",
  "msg": "Webhook ServeHTTP request=&http.Request{Method:\"POST\", URL:(*url.URL)(0xc000714280), Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Accept\":[]string{\"application/json, */*\"}, \"Accept-Encoding\":[]string{\"gzip\"}, \"Content-Length\":[]string{\"2019\"}, \"Content-Type\":[]string{\"application/json\"}, \"User-Agent\":[]string{\"kube-apiserver-admission\"}}, Body:(*http.body)(0xc00084d7c0), GetBody:(func() (io.ReadCloser, error))(nil), ContentLength:2019, TransferEncoding:[]string(nil), Close:false, Host:\"tekton-pipelines-webhook.tekton-pipelines.svc:443\", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:\"100.109.0.3:60412\", RequestURI:\"/resource-conversion?timeout=30s\", TLS:(*tls.ConnectionState)(0xc000788370), Cancel:(<-chan struct {})(nil), Response:(*http.Response)(nil), ctx:(*context.cancelCtx)(0xc00084d800)}",
  "commit": "a162a1d"
}
{
  "level": "info",
  "logger": "webhook",
  "caller": "conversion/conversion.go:133",
  "msg": "Converting [kind=Pipeline group=tekton.dev version=v1beta1] to version tekton.dev/v1alpha1",
  "commit": "a162a1d",
  "uid": "b2b3d9a8-e22b-4724-8105-61ccabaa9bf5",
  "desiredAPIVersion": "tekton.dev/v1alpha1",
  "inputType": "[kind=Pipeline group=tekton.dev version=v1beta1]",
  "outputType": "[kind=Pipeline group=tekton.dev version=v1alpha1]",
  "hubType": "[kind=Pipeline group=tekton.dev version=v1alpha1]",
  "knative.dev/key": "tekton-pipelines/clone-cleanup-workspace"
}
{
  "level": "error",
  "logger": "webhook",
  "caller": "conversion/conversion.go:59",
  "msg": "Conversion failed: conversion failed to version v1alpha1 for type [kind=Pipeline group=tekton.dev version=v1beta1] -  the specified field/section is not available in v1alpha1",
  "commit": "a162a1d",
  "uid": "b2b3d9a8-e22b-4724-8105-61ccabaa9bf5",
  "desiredAPIVersion": "tekton.dev/v1alpha1",
  "stacktrace": "github.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook/resourcesemantics/conversion.(*reconciler).Convert\n\tgithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook/resourcesemantics/conversion/conversion.go:59\ngithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook.conversionHandler.func1\n\tgithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook/conversion.go:61\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2012\nnet/http.(*ServeMux).ServeHTTP\n\tnet/http/server.go:2387\ngithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook.(*Webhook).ServeHTTP\n\tgithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook/webhook.go:259\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2807\nnet/http.(*conn).serve\n\tnet/http/server.go:1895"
}

Steps to Reproduce the Problem

  1. Create a new minikube cluster
  2. Apply the latest tekton-pipelines release: 0.16.0
  3. Apply any Pipeline with a finally:clause in it. I used the example you have available here: tekton-pipelinerun-with-final-task
  4. Check the logs of tekton-pipelines-webhook pod and observe the errors above, spamming forever.
  5. Remove the YAML applied in step 3 and the log spamming stops.

Additional Info

Have also tested this on our AKS cluster in Azure, same results as in minikube. Versions tested that showed same behavior: 0.15.0, 0.15.1, 0.15.2 and 0.16.0

$ k version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:30:10Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:23:04Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
$ tkn version
Client version: 0.11.0
Pipeline version: v0.16.0
Triggers version: unknown
vdemeester commented 4 years ago

/cc @pritidesai This is really weird, I don't get why having finally would make something hit the webhook (conversion or not).

coryrc commented 4 years ago

As stated in #3213 this also occurs in 0.15.2

I have only ever used kubectl create -f pr.yaml to create my PRs.

coryrc commented 4 years ago

Also, I am getting this message from PipelineRuns that do not have finally in them, but cannot be sure it didn't happen before I created my first PR with a finally:.

pritidesai commented 4 years ago

Yup, I see these conversion errors for a pipeline with finally in webhook logs 😞

@vdemeester any beta pipeline (with or without finally) hitting conversion, is this expected?:

{"level":"info","logger":"webhook","caller":"webhook/conversion.go:42","msg":"Webhook ServeHTTP request=&http.Request{Method:\"POST\", URL:(*url.URL)(0xc000d30500), Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Accept\":[]string{\"application/json, */*\"}, \"Accept-Encoding\":[]string{\"gzip\"}, \"Content-Length\":[]string{\"1228\"}, \"Content-Type\":[]string{\"application/json\"}, \"User-Agent\":[]string{\"kube-apiserver-admission\"}}, Body:(*http.body)(0xc000140e00), GetBody:(func() (io.ReadCloser, error))(nil), ContentLength:1228, TransferEncoding:[]string(nil), Close:false, Host:\"tekton-pipelines-webhook.tekton-pipelines.svc:443\", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:\"192.168.65.3:55210\", RequestURI:\"/resource-conversion?timeout=30s\", TLS:(*tls.ConnectionState)(0xc0005fac60), Cancel:(<-chan struct {})(nil), Response:(*http.Response)(nil), ctx:(*context.cancelCtx)(0xc000140e40)}","commit":"e356ca5"}

{"level":"info","logger":"webhook","caller":"conversion/conversion.go:133","msg":"Converting [kind=PipelineRun group=tekton.dev version=v1beta1] to version tekton.dev/v1alpha1","commit":"e356ca5","uid":"e9a2c7aa-c16d-4c17-b7aa-2baf13a8f922","desiredAPIVersion":"tekton.dev/v1alpha1","inputType":"[kind=PipelineRun group=tekton.dev version=v1beta1]","outputType":"[kind=PipelineRun group=tekton.dev version=v1alpha1]","hubType":"[kind=PipelineRun group=tekton.dev version=v1alpha1]","knative.dev/key":"default/pipelinerun-issue-3206"}
pritidesai commented 4 years ago

and we would see these conversion errors with finally since its not available in alpha and conversion would fail.

coryrc commented 4 years ago

(I confirmed that I did not see any of these errors before the first time I ran a Pipeline with finally: but that it now throws that error on Pipelines that did not have finally:)

vdemeester commented 4 years ago

Yup, I see these conversion errors for a pipeline with finally in webhook logs disappointed

@vdemeester any beta pipeline (with or without finally) hitting conversion, is this expected?:

{"level":"info","logger":"webhook","caller":"webhook/conversion.go:42","msg":"Webhook ServeHTTP request=&http.Request{Method:\"POST\", URL:(*url.URL)(0xc000d30500), Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Accept\":[]string{\"application/json, */*\"}, \"Accept-Encoding\":[]string{\"gzip\"}, \"Content-Length\":[]string{\"1228\"}, \"Content-Type\":[]string{\"application/json\"}, \"User-Agent\":[]string{\"kube-apiserver-admission\"}}, Body:(*http.body)(0xc000140e00), GetBody:(func() (io.ReadCloser, error))(nil), ContentLength:1228, TransferEncoding:[]string(nil), Close:false, Host:\"tekton-pipelines-webhook.tekton-pipelines.svc:443\", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:\"192.168.65.3:55210\", RequestURI:\"/resource-conversion?timeout=30s\", TLS:(*tls.ConnectionState)(0xc0005fac60), Cancel:(<-chan struct {})(nil), Response:(*http.Response)(nil), ctx:(*context.cancelCtx)(0xc000140e40)}","commit":"e356ca5"}

{"level":"info","logger":"webhook","caller":"conversion/conversion.go:133","msg":"Converting [kind=PipelineRun group=tekton.dev version=v1beta1] to version tekton.dev/v1alpha1","commit":"e356ca5","uid":"e9a2c7aa-c16d-4c17-b7aa-2baf13a8f922","desiredAPIVersion":"tekton.dev/v1alpha1","inputType":"[kind=PipelineRun group=tekton.dev version=v1beta1]","outputType":"[kind=PipelineRun group=tekton.dev version=v1alpha1]","hubType":"[kind=PipelineRun group=tekton.dev version=v1alpha1]","knative.dev/key":"default/pipelinerun-issue-3206"}

It is not excepted nope

pritidesai commented 4 years ago

We have the conversionController added as part of webhook configuration:

https://github.com/tektoncd/pipeline/blob/c90461fe79ce7b7e4693eb1020029157959cdbe8/cmd/webhook/main.go#L216-L222

any changes needed here? I am trying to figure out how is the conversion initiated 🤔

pritidesai commented 4 years ago

and the desiredAPIVersion is set to alpha:

"desiredAPIVersion":"tekton.dev/v1alpha1"
pritidesai commented 4 years ago

Last time we changed knative/pkg was on July 22: https://github.com/tektoncd/pipeline/commit/84ddab4129190f13bfd4b2d41780fe421de61eac#diff-212ccdd03608b1dea47236a14810bcb2

vdemeester commented 4 years ago

@pritidesai right, the conversion happens if a client asks for a v1alpha1 object (and the storage is v1beta1) through the API. So it's either an external consumer of the API (that watches the object or polls them) that uses v1alpha1 or it's something internal.

I re-read the issue, and now that I think of it, there might be an external consumer of the API. @benke33 @coryrc can you give us a little bit more details of what is installed on your cluster ? Anything that would query the tekton API (like prow, maybe some UI) ? What version of kubernetes is it ? Is it OpenShift ?

coryrc commented 4 years ago

Prow is not installed on my cluster, but the cluster is used as a "build cluster" in which Prow will apply pods. @benke33 says this repos in brand-new minikube though.

I'm otherwise just running tkn logs occasionally.

vdemeester commented 4 years ago

@coryrc so prow "might" query the tekton api of this cluster ? Yeah, if it's on a brand new minikube without any api customer (no UI, …) then it might be an internal problem (something from the controller using v1alpha1 calls still..)

coryrc commented 4 years ago

@coryrc so prow "might" query the tekton api of this cluster ?

Yes, unfortunately I can't rule that out because I am not sufficiently knowledgeable in this domain.

Yeah, if it's on a brand new minikube without any api customer (no UI, …) then it might be an internal problem (something from the controller using v1alpha1 calls still..)

Sometimes I think back to 5-pages of Java stacktrace fondly...

benke33 commented 4 years ago

@vdemeester there's nothing apart from tekton pipelines 0.16.0 installed in the cluster and it's a brand new minikube 1.13.0 running kubernetes 1.19.0. Then I just applied this: pipeline-with-finally

ljupchokotev commented 4 years ago

I am getting these errors for pipelines with finally sections on a clean GKE 1.16 cluster. The moment I apply the pipeline, it will start spamming these errors. On its own, it's not a problem apart from the log spam. But, I am using Fluxcd for syncing Tekton manifests and somehow Flux figures out these errors and stops the syncing process. So it is impossible to use Fluxcd currently to sync Tekton manifests. Note that the errors start before Flux is installed.

Flux logs:

ts=2020-10-06T15:41:45.979439989Z caller=loop.go:108 component=sync-loop err="collating resources in cluster for sync: conversion webhook for tekton.dev/v1beta1, Kind=Pipeline failed: conversion failed to version v1alpha1 for type [kind=Pipeline group=tekton.dev version=v1beta1] -  the specified field/section is not available in v1alpha1"
vdemeester commented 4 years ago

@coryrc @benke33 @ljupchokotev can you paste the output of the following command : kubectl get crd pipelines.tekton.dev -o yaml I wonder if this happen when there is still some pipeline/tekton object stored as v1alpha1 in the cluster.

# On new cluster
  storedVersions:
  - v1beta1
# On old cluster (that updated tekton from version to version)
  storedVersions:
  - v1alpha1
  - v1beta1
ljupchokotev commented 4 years ago

@vdemeester

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apiextensions.k8s.io/v1","kind":"CustomResourceDefinition","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"default","app.kubernetes.io/part-of":"tekton-pipelines","pipeline.tekton.dev/release":"v0.17.0","version":"v0.17.0"},"name":"pipelines.tekton.dev"},"spec":{"conversion":{"strategy":"Webhook","webhook":{"clientConfig":{"service":{"name":"tekton-pipelines-webhook","namespace":"tekton-pipelines"}},"conversionReviewVersions":["v1beta1"]}},"group":"tekton.dev","names":{"categories":["tekton","tekton-pipelines"],"kind":"Pipeline","plural":"pipelines"},"preserveUnknownFields":false,"scope":"Namespaced","versions":[{"name":"v1alpha1","schema":{"openAPIV3Schema":{"type":"object","x-kubernetes-preserve-unknown-fields":true}},"served":true,"storage":false,"subresources":{"status":{}}},{"name":"v1beta1","schema":{"openAPIV3Schema":{"type":"object","x-kubernetes-preserve-unknown-fields":true}},"served":true,"storage":true,"subresources":{"status":{}}}]}}
  creationTimestamp: "2020-09-18T14:12:38Z"
  generation: 2
  labels:
    app.kubernetes.io/instance: default
    app.kubernetes.io/part-of: tekton-pipelines
    pipeline.tekton.dev/release: v0.17.0
    version: v0.17.0
  name: pipelines.tekton.dev
  resourceVersion: "20442026"
  selfLink: /apis/apiextensions.k8s.io/v1/customresourcedefinitions/pipelines.tekton.dev
  uid: 8d39e524-5d5f-40c9-91af-c371b6276916
spec:
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        caBundle: <caBundle>
        service:
          name: tekton-pipelines-webhook
          namespace: tekton-pipelines
          path: /resource-conversion
          port: 443
      conversionReviewVersions:
      - v1beta1
  group: tekton.dev
  names:
    categories:
    - tekton
    - tekton-pipelines
    kind: Pipeline
    listKind: PipelineList
    plural: pipelines
    singular: pipeline
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        type: object
        x-kubernetes-preserve-unknown-fields: true
    served: true
    storage: false
    subresources:
      status: {}
  - name: v1beta1
    schema:
      openAPIV3Schema:
        type: object
        x-kubernetes-preserve-unknown-fields: true
    served: true
    storage: true
    subresources:
      status: {}
status:
  acceptedNames:
    categories:
    - tekton
    - tekton-pipelines
    kind: Pipeline
    listKind: PipelineList
    plural: pipelines
    singular: pipeline
  conditions:
  - lastTransitionTime: "2020-09-18T14:12:38Z"
    message: no conflicts found
    reason: NoConflicts
    status: "True"
    type: NamesAccepted
  - lastTransitionTime: "2020-09-18T14:12:38Z"
    message: the initial names have been accepted
    reason: InitialNamesAccepted
    status: "True"
    type: Established
  storedVersions:
  - v1beta1
vdemeester commented 4 years ago

@ljupchokotev thanks, this means my initial though about this was wrong.. back to square one :sweat_smile:

ljupchokotev commented 4 years ago

Looking a bit closer at the webhook logs, it's doing this conversion only for 2 out of 4 pipelines. One of those conversions is succeeding since that pipeline doesn't have any finally section, while it fails for the other one. The other 2 remaining pipelines don't show up in the log spam. 1 of them even uses a finally section.

vdemeester commented 4 years ago

Looking a bit closer at the webhook logs, it's doing this conversion only for 2 out of 4 pipelines. One of those conversions is succeeding since that pipeline doesn't have any finally section, while it fails for the other one. The other 2 remaining pipelines don't show up in the log spam. 1 of them even uses a finally section.

Hum that's even more concerning… :cold_sweat: There should be no reason for them to be converted and thus hitting this issue… in theory.. :sweat:

pritidesai commented 4 years ago

@vdemeester does it play any role here, how the pipelines are applied or created? kubectl, ko, tkn?

Fabian-K commented 4 years ago

Hi, I stumbled upon what I think looks like a similar issue. I´m

This results in a conversion of the task to v1alpha1. Logs:

{"level":"info","logger":"webhook","caller":"webhook/admission.go:72","msg":"Webhook ServeHTTP request=&http.Request{Method:\"POST\", URL:(*url.URL)(0xc0007df680), Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Accept\":[]string{\"application/json, */*\"}, \"Accept-Encoding\":[]string{\"gzip\"}, \"Content-Length\":[]string{\"1471\"}, \"Content-Type\":[]string{\"application/json\"}, \"User-Agent\":[]string{\"kube-apiserver-admission\"}}, Body:(*http.body)(0xc0007c5bc0), GetBody:(func() (io.ReadCloser, error))(nil), ContentLength:1471, TransferEncoding:[]string(nil), Close:false, Host:\"tekton-pipelines-webhook.tekton-pipelines.svc:443\", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:\"10.244.0.1:42184\", RequestURI:\"/defaulting?timeout=30s\", TLS:(*tls.ConnectionState)(0xc000ee5ce0), Cancel:(<-chan struct {})(nil), Response:(*http.Response)(nil), ctx:(*context.cancelCtx)(0xc0007c5c00)}","commit":"9d4d495"}
{"level":"info","logger":"webhook","caller":"defaulting/defaulting.go:125","msg":"Kind: \"tekton.dev/v1beta1, Kind=Task\" PatchBytes: [{\"op\":\"add\",\"path\":\"/spec/steps/0/resources\",\"value\":{}}]","commit":"9d4d495","knative.dev/kind":"tekton.dev/v1beta1, Kind=Task","knative.dev/namespace":"default","knative.dev/name":"dummy","knative.dev/operation":"CREATE","knative.dev/resource":"tekton.dev/v1beta1, Resource=tasks","knative.dev/subresource":"","knative.dev/userinfo":"{kubernetes-admin  [system:masters system:authenticated] map[]}"}
{"level":"info","logger":"webhook","caller":"webhook/admission.go:104","msg":"AdmissionReview for v1.GroupVersionKind{Group:\"tekton.dev\", Version:\"v1beta1\", Kind:\"Task\"}: default/dummy response={ UID: \"\", Allowed: true, Status: (*v1.Status)(nil), Patch: [{\"op\":\"add\",\"path\":\"/spec/steps/0/resources\",\"value\":{}}], PatchType: JSONPatch, AuditAnnotations: map[string]string(nil)}","commit":"9d4d495","knative.dev/kind":"tekton.dev/v1beta1, Kind=Task","knative.dev/namespace":"default","knative.dev/name":"dummy","knative.dev/operation":"CREATE","knative.dev/resource":"tekton.dev/v1beta1, Resource=tasks","knative.dev/subresource":"","knative.dev/userinfo":"{kubernetes-admin  [system:masters system:authenticated] map[]}"}
{"level":"info","logger":"webhook","caller":"webhook/admission.go:72","msg":"Webhook ServeHTTP request=&http.Request{Method:\"POST\", URL:(*url.URL)(0xc0008d1d80), Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Accept\":[]string{\"application/json, */*\"}, \"Accept-Encoding\":[]string{\"gzip\"}, \"Content-Length\":[]string{\"1564\"}, \"Content-Type\":[]string{\"application/json\"}, \"User-Agent\":[]string{\"kube-apiserver-admission\"}}, Body:(*http.body)(0xc000deb340), GetBody:(func() (io.ReadCloser, error))(nil), ContentLength:1564, TransferEncoding:[]string(nil), Close:false, Host:\"tekton-pipelines-webhook.tekton-pipelines.svc:443\", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:\"10.244.0.1:42184\", RequestURI:\"/resource-validation?timeout=30s\", TLS:(*tls.ConnectionState)(0xc000ee5ce0), Cancel:(<-chan struct {})(nil), Response:(*http.Response)(nil), ctx:(*context.cancelCtx)(0xc000deb380)}","commit":"9d4d495"}
{"level":"info","logger":"webhook","caller":"webhook/admission.go:104","msg":"AdmissionReview for v1.GroupVersionKind{Group:\"tekton.dev\", Version:\"v1beta1\", Kind:\"Task\"}: default/dummy response={ UID: \"\", Allowed: true, Status: (*v1.Status)(nil), Patch: , PatchType: , AuditAnnotations: map[string]string(nil)}","commit":"9d4d495","knative.dev/kind":"tekton.dev/v1beta1, Kind=Task","knative.dev/namespace":"default","knative.dev/name":"dummy","knative.dev/operation":"CREATE","knative.dev/resource":"tekton.dev/v1beta1, Resource=tasks","knative.dev/subresource":"","knative.dev/userinfo":"{kubernetes-admin  [system:masters system:authenticated] map[]}"}
{"level":"info","logger":"webhook","caller":"webhook/conversion.go:42","msg":"Webhook ServeHTTP request=&http.Request{Method:\"POST\", URL:(*url.URL)(0xc0007df900), Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Accept\":[]string{\"application/json, */*\"}, \"Accept-Encoding\":[]string{\"gzip\"}, \"Content-Length\":[]string{\"1105\"}, \"Content-Type\":[]string{\"application/json\"}, \"User-Agent\":[]string{\"kube-apiserver-admission\"}}, Body:(*http.body)(0xc0009ba7c0), GetBody:(func() (io.ReadCloser, error))(nil), ContentLength:1105, TransferEncoding:[]string(nil), Close:false, Host:\"tekton-pipelines-webhook.tekton-pipelines.svc:443\", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:\"10.244.0.1:42184\", RequestURI:\"/resource-conversion?timeout=30s\", TLS:(*tls.ConnectionState)(0xc000ee5ce0), Cancel:(<-chan struct {})(nil), Response:(*http.Response)(nil), ctx:(*context.cancelCtx)(0xc0009ba800)}","commit":"9d4d495"}
{"level":"info","logger":"webhook","caller":"conversion/conversion.go:133","msg":"Converting [kind=Task group=tekton.dev version=v1beta1] to version tekton.dev/v1alpha1","commit":"9d4d495","uid":"4c76a903-412c-4c9e-97c6-7854bf803b6b","desiredAPIVersion":"tekton.dev/v1alpha1","inputType":"[kind=Task group=tekton.dev version=v1beta1]","outputType":"[kind=Task group=tekton.dev version=v1alpha1]","hubType":"[kind=Task group=tekton.dev version=v1alpha1]","knative.dev/key":"default/dummy"}

🤔

freefood89 commented 3 years ago

I'm getting the same issue on GKE. I manage tekton pipelines in GKE using fluxcd to automate, and this issue is causing fluxcd to fail.

GKE Version: 1.17.9-gke.1504 Pipelines version: v0.17.3 FluxCD: 1.21.0 -- Kubectl: 1.18.9 (fluxctl), Kustomize 3.8.4

ts=2020-11-04T21:09:18.104517023Z caller=loop.go:108 component=sync-loop err="collating resources in cluster for sync: conversion webhook for tekton.dev/v1beta1, Kind=Pipeline failed: conversion failed to version v1alpha1 for type [kind=Pipeline group=tekton.dev version=v1beta1] -  the specified field/section is not available in v1alpha1"

However, I am able to manually create these pipelines from my laptop using kustomize and kubectl. These Pipelines have no problem running.

Local -- Kubectl: 1.19.2, Kustomize 3.8.4

Logs from trying to convert a pipeline with a finally task:

{
    "level": "info",
    "logger": "webhook",
    "caller": "conversion/conversion.go:133",
    "msg": "Converting [kind=Pipeline group=tekton.dev version=v1beta1] to version tekton.dev/v1alpha1",
    "commit": "20672ac",
    "uid": "da00aa75-402f-41eb-b80c-1646960bb54d",
    "desiredAPIVersion": "tekton.dev/v1alpha1",
    "inputType": "[kind=Pipeline group=tekton.dev version=v1beta1]",
    "outputType": "[kind=Pipeline group=tekton.dev version=v1alpha1]",
    "hubType": "[kind=Pipeline group=tekton.dev version=v1alpha1]",
    "knative.dev/key": "cai/crasample-test"
}
{
    "level": "error",
    "logger": "webhook",
    "caller": "conversion/conversion.go:59",
    "msg": "Conversion failed: conversion failed to version v1alpha1 for type [kind=Pipeline group=tekton.dev version=v1beta1] -  the specified field/section is not available in v1alpha1",
    "commit": "9d4d495",
    "uid": "324da320-182b-41fa-9c2e-ec516804a5b1",
    "desiredAPIVersion": "tekton.dev/v1alpha1",
    "stacktrace": "github.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook/resourcesemantics/conversion.(*reconciler).Convert\n\tgithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook/resourcesemantics/conversion/conversion.go:59\ngithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook.conversionHandler.func1\n\tgithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook/conversion.go:61\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2012\nnet/http.(*ServeMux).ServeHTTP\n\tnet/http/server.go:2387\ngithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook.(*Webhook).ServeHTTP\n\tgithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook/webhook.go:247\ngithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/network/handlers.(*Drainer).ServeHTTP\n\tgithub.com/tektoncd/pipeline/vendor/knative.dev/pkg/network/handlers/drain.go:88\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2807\nnet/http.(*conn).serve\n\tnet/http/server.go:1895"
}
freefood89 commented 3 years ago

Deleting pipelines with finally made the errors go away and adding them back brought the errors back.

What could be explicitly requesting v1alpha1 versions of Pipeline and other resources? kubectl get pipelines.v1alpha1.tekton.dev seems to trigger successful conversions (level=info logs) for pipelines without finally.

Deleting the pipelines with finally also reduced the spam (level=info logs)

Screen Shot 2020-11-05 at 9 18 26 AM Screen Shot 2020-11-05 at 9 16 29 AM

freefood89 commented 3 years ago

Oops. Looks like I didn't add much that's new to the thread. Here are some things that I haven't seen in the thread yet..

pritidesai commented 3 years ago

thanks @freefood89 I am trying to troubleshoot this without much conversion/version knowledge 😞

Yes, the webhook logs has desiredAPIVersion set to v1alpha1 for all resources which I guess is causing issue, I am not 100% sure.

Pipeline CRDs have listed two versions, v1alpha1 and v1beta1, but v1alpha1 is served but storage is set to false and v1beta1 does not have any setting for served but storage is set to true 🤔

https://github.com/tektoncd/pipeline/blob/bfc654e500a39f9e3c2e119ebff9e2dc96010d0e/config/300-pipeline.yaml#L27-L49

@vdemeester any thing wrong with the versions here? v1beta1 needs serving set to true too?

freefood89 commented 3 years ago

@pritidesai I learned that actually v1beta1 in the CRD is whatever v1alpha1 is with:

name: v1beta1
storage: true 

as overrides. Apparently that's what &version with <<: *version does. Didn't know yaml had these capabilities

resulting in:

versions: 
- name: v1alpha1 
  served: true 
  storage: false 
  # Opt into the status subresource so metadata.generation 
  # starts to increment 
  subresources: 
    status: {} 
  schema: 
    openAPIV3Schema: 
      type: object 
      # One can use x-kubernetes-preserve-unknown-fields: true 
      # at the root of the schema (and inside any properties, additionalProperties) 
      # to get the traditional CRD behaviour that nothing is pruned, despite 
      # setting spec.preserveUnknownProperties: false. 
      # 
      # See https://kubernetes.io/blog/2019/06/20/crd-structural-schema/ 
      # See issue: https://github.com/knative/serving/issues/912 
      x-kubernetes-preserve-unknown-fields: true 
- name: v1beta1 
  storage: true 
  served: true 
  # Opt into the status subresource so metadata.generation 
  # starts to increment 
  subresources: 
    status: {} 
  schema: 
    openAPIV3Schema: 
      type: object 
      # One can use x-kubernetes-preserve-unknown-fields: true 
      # at the root of the schema (and inside any properties, additionalProperties) 
      # to get the traditional CRD behaviour that nothing is pruned, despite 
      # setting spec.preserveUnknownProperties: false. 
      # 
      # See https://kubernetes.io/blog/2019/06/20/crd-structural-schema/ 
      # See issue: https://github.com/knative/serving/issues/912 
      x-kubernetes-preserve-unknown-fields: true 
pritidesai commented 3 years ago

aah, crazy, but then is desiredAPIVersion playing any role here? Why is it set to v1alpha1 (according to webhook logs)? 🤔

freefood89 commented 3 years ago

So whenever you get resources from k8s the api is called with the resource:

GET /apis/GROUP/VERSION/namespaces/NAMESPACE/RESOURCETYPE/NAME

I'm guessing that's where the desiredVersion comes from (???) I was able to replicate it with kubectl get pipelines.v1alpha1.tekton.dev where k8s will call the conversion webhook to convert my v1beta1 pipelines to v1alpha1

(I'm learning this as I go too haha)

freefood89 commented 3 years ago

I'm starting to suspect that maybe the conversion webhook service is calling itself seeing as errors are increasing calls to itself. I'll be running delve to step through the code after taking a break.. How are others debugging this?

If this doesn't turn up answers I have no idea what's causing this.

pritidesai commented 3 years ago

I think @afrittoli is using delve. I am literally updating CRDs/code and installing pipeline with every change to make sure nothing else mocking with it 😱

pritidesai commented 3 years ago

Even when I am creating a pipeline/task with v1beta1, the same pipeline is created for both versions:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: 02-pipeline
-> kubectl get pipelines.v1beta1.tekton.dev 02-pipeline
NAME          AGE
02-pipeline   12s

-> kubectl get pipelines.v1alpha1.tekton.dev 02-pipeline
NAME          AGE
02-pipeline   20s

v1alpha1 was created first, its age is 20 seconds whereas v1beta1 was created later, its age is 12 seconds. Same happens with tasks.

v1beta1 Pipeline converted to v1alpha1:

        "resourceVersion": "13367322",
        "selfLink": "/apis/tekton.dev/v1alpha1/namespaces/default/pipelines/02-pipeline",
        "uid": "96756430-456f-4a6a-9e0d-0590e7dd7e04"

v1beta1 version of that pipeline:

        "resourceVersion": "13367322",
        "selfLink": "/apis/tekton.dev/v1beta1/namespaces/default/pipelines/02-pipeline",
        "uid": "96756430-456f-4a6a-9e0d-0590e7dd7e04"
pritidesai commented 3 years ago

v1alpha1 pipeline is also getting converted to v1beta1 😱 and they both have same resourceVersion and uid just like v1beta1 getting converted to v1alpha1.

apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: 03-pipeline
...
-> kubectl get pipelines.v1beta1.tekton.dev 03-pipeline
NAME          AGE
03-pipeline   99s

 -> kubectl get pipelines.v1alpha1.tekton.dev 03-pipeline
NAME          AGE
03-pipeline   106s
freefood89 commented 3 years ago

It should only be stored as v1beta1. I think it's converting when you request it as v1alpha1 or v1beta1... right?

pritidesai commented 3 years ago

nope, I didnt explicitly request that particular pipeline, just listing v1alpha1 pipelines with kubectl get pipelines.v1alpha1.tekton.dev shows that beta pipeline converted into alpha as well. Unless, the listing command is implicitly converting pipelines which sounds weird.

Yup ideally should be stored as just v1beta1

vdemeester commented 3 years ago

nope, I didnt explicitly request that particular pipeline, just listing v1alpha1 pipelines with kubectl get pipelines.v1alpha1.tekton.dev shows that beta pipeline converted into alpha as well. Unless, the listing command is implicitly converting pipelines which sounds weird.

Yeah, that's what should happens, when you list pipelines.v1alpha1.tekton.dev, it will get whatever is stored and convert it "on-the-fly" in a v1alpha1 version if it can.

The "age" difference in the listing feels weird though…

freefood89 commented 3 years ago

I think the age difference is probably the time difference between the first and second kubectl get.

BTW, for now I've decided to just patch away v1alpha altogether from the CRD (I use kustomize). This way k8s will just prevent requests that trigger the conversion webhook mentioned above since v1alpha1 no longer exists. It's kind of brute force, but I don't intend to support v1alpha1 nor do we have legacy resources.

Luckily and also mysteriously the controller doesn't seem to request v1alpha1 as far as I saw :laughing:

It looks like I wont have much time in the near future to look into this more :disappointed:

vdemeester commented 3 years ago

@freefood89 :hugs: Yeah I need to dig deeper into this. I have yet to understand why the webhook or controller would go on his own to ask a v1alpha1 or a conversion… I might miss something obvious in code…

pritidesai commented 3 years ago

The age difference is really weird, it could be related to kubectl get but I am having hard time understanding since I am creating a pipeline in v1beta1 and listing v1beta1 before v1alpha1. Based on the age of 20 seconds, the pipeline was first created in v1alpha1 followed by v1beta1 (12 seconds old), unless the age is itself broken 😆

-> kubectl get pipelines.v1beta1.tekton.dev 02-pipeline
NAME          AGE
02-pipeline   12s

-> kubectl get pipelines.v1alpha1.tekton.dev 02-pipeline
NAME          AGE
02-pipeline   20s
vdemeester commented 3 years ago

@pritidesai because it's two commands, it make sense that those "age" are different. The would also be different if you ask twice the same resource version :wink:

freefood89 commented 3 years ago

Do you think it's reconciliation? I wrote my operator using kopf and there's an option to periodically make sure the resources are in the right state.

skaegi commented 3 years ago

We were working on a possibly related issue and noticed that the "run" CRD is alpha and wondered if it somehow might be involved in the finally case. Just a thought... without any evidence.... ;)

Yannig commented 3 years ago

Well, I don't know if you know it but the issue is still there with 0.18.1. I get the same kind of spam over and over until I remove the finally statement on my pipeline.

Yannig commented 3 years ago

In order to correct the problem, I made the modification suggested by @freefood89. If it helps anyone, it's a matter of making the following change in the tekton manifest:

diff --git a/tekton/original-tekton-v0.18.1.yaml b/tekton/original-tekton-v0.18.1.yaml
index 23b7e02..f1c6fd1 100644
--- a/tekton/original-tekton-v0.18.1.yaml
+++ b/tekton/original-tekton-v0.18.1.yaml
@@ -632,10 +632,9 @@ spec:
   group: tekton.dev
   preserveUnknownFields: false
   versions:
-    - &version
-      name: v1alpha1
+    - name: v1beta1
       served: true
-      storage: false
+      storage: true
       # Opt into the status subresource so metadata.generation
       # starts to increment
       subresources:
@@ -651,9 +650,6 @@ spec:
           # See https://kubernetes.io/blog/2019/06/20/crd-structural-schema/
           # See issue: https://github.com/knative/serving/issues/912
           x-kubernetes-preserve-unknown-fields: true
-    - !!merge <<: *version
-      name: v1beta1
-      storage: true
   names:
     kind: Pipeline
     plural: pipelines
pritidesai commented 3 years ago

thanks @Yannig appreciate the workaround, but wouldn't work for folks using v1alpha1 resources along with v1beta1.

@vdemeester any updates on this one? will this be magically fixed in 0.19? 🤞

vdemeester commented 3 years ago

@Yannig this would remove support for v1alpha1 though (as it wouldn't be served anymore…). I don't see any reason why it would be magically fixed in 0.19 :sweat:

Yannig commented 3 years ago

I did some experiments today and this modification is not compatible when you deploy the trigger (last version v0.10.1).

Now I think this is this component that try to convert every objects in v1alpha1 triggering this log activity.

vdemeester commented 3 years ago

I did some experiments today and this modification is not compatible when you deploy the trigger (last version v0.10.1).

Now I think this is this component that try to convert every objects in v1alpha1 triggering this log activity.

cc @dibyom @khrm is triggers still using v1alpha1 API ? :thinking:

dibyom commented 3 years ago

cc @dibyom @khrm is triggers still using v1alpha1 API ? 🤔

The pipelines v1alpha1 API? Users can create resources with that APIVersion but they don't have to. As far as I know, there shouldn't be a direct dependency on the v1alpha1 API (we use a dynamic client).