tektoncd / pipeline

A cloud-native Pipeline resource.
https://tekton.dev
Apache License 2.0
8.52k stars 1.78k forks source link

[v0.44.x] conversion can fail if the `PipelineRun` is too big with `embedded-status` set to `both` (or `full`) #6561

Closed vdemeester closed 5 months ago

vdemeester commented 1 year ago

Expected Behavior

Using pipeline 0.44 and embedded-status set to both or full, it should be possible to list PipelineRun with v1 and not runing into issues.

Actual Behavior

Depending on the size of the PipelineRun (and mainly taskruns.status), it can/will fail.

tkn pr ls                                                                                              
Failed to list objects from pipelines-ci namespace 
Error: failed to list PipelineRuns from namespace pipelines-ci: conversion webhook for tekton.dev/v1beta1, Kind=PipelineRun returned invalid metadata: metadata.annotation: Too long: must have at most 262144 bytes

This shows it fail using tkn (which uses v1 API in that case) but it would fail using kubectl get as well, as the problem is happening in the conversion webhook.

Steps to Reproduce the Problem

  1. Install v0.44.x and set embedded-status to both (or full)
  2. Create a big PipelineRun and wait for it to finish
  3. kubectl get pipelinerun my-pipelinerun to see it fail
  4. kubectl get pipelinerun.v1beta1.tekton.dev my-pipelinerun would succeed

Additional Info

Any that runs Pipeline 0.44.x

v0.44.x

Reasons and possible solutions

This happens because we are using annotations to convert from v1beta1 to v1 for fields that do not exists. We can run into a case where the size of the field "marshalled" in that annotation will be too big to be "stored" in an annotation.

Possible fix: compress and encode in base64 the marshalled field in the annotations to make them take less places. It should be "fine" to not support the current way as, the source-of-truth (aka the store version) is v1beta1 and still has the field.

/cc @JeromeJu @lbernick

pritidesai commented 1 year ago

/assign @vdemeester /assign @JeromeJu

concaf commented 1 year ago

/assign @concaf spoke with @vdemeester offline to collaborate on this 😇

concaf commented 1 year ago

👋🏼 something to keep in mind is that https://github.com/tektoncd/pipeline/pull/6968 mitigates this issue to an extent, however due to annotations size limit (all annotations keys and values added together) of 256 kB, we cannot guarantee that compression will fix this entirely.

how do we want to move forward in this case? should we explore storing this somewhere else other than annotations?

vdemeester commented 1 year ago

cc @tektoncd/core-maintainers

tekton-robot commented 1 year ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

vdemeester commented 5 months ago

I am going to go ahead and close this. 0.44.x is an old version, and the issue discussed shouldn't arise anymore.