Open guillaumerose opened 2 years ago
One of the comments in config-defaults for this is:
# default-managed-by-label-value contains the default value given to the
# "app.kubernetes.io/managed-by" label applied to all Pods created for
# TaskRuns. If a user's requested TaskRun specifies another value for this
# label, the user's request supercedes.
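For reference, a minimal sketch of what that setting looks like in the config-defaults ConfigMap (the key name comes from the quoted comment; the namespace and value here are illustrative defaults, not a prescription):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: tekton-pipelines
data:
  # Pods created for TaskRuns get this value in their
  # "app.kubernetes.io/managed-by" label unless the TaskRun sets its own.
  default-managed-by-label-value: "tekton-pipelines"
```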
This is most likely why we are looking at all app.kubernetes.io/managed-by
pods: users could overwrite it in their Task/TaskRun to integrate with their tooling.
Allowing this or not can be discussed (and is a bit linked to #4366), but as it's currently allowed, I don't see a straightforward way to fix this. At least we filter out all pods that do not carry the app.kubernetes.io/managed-by
label at all.
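The two behaviors described above can be sketched in Go. This is illustrative, not Tekton's actual code: `mergeLabels` and `hasManagedBy` are hypothetical names showing how the default is applied (user value supersedes) and how little the controller can filter today (label presence only, any value).

```go
package main

import "fmt"

const kubeLabelManagedBy = "app.kubernetes.io/managed-by" // well-known Kubernetes label

// mergeLabels applies the configured default, but lets a user-supplied
// value on the TaskRun supersede it, mirroring the config-defaults comment.
func mergeLabels(defaultValue string, userLabels map[string]string) map[string]string {
	out := map[string]string{kubeLabelManagedBy: defaultValue}
	for k, v := range userLabels {
		out[k] = v // user's request supersedes the default
	}
	return out
}

// hasManagedBy is the only filtering possible today: keep any pod that
// carries the label, whatever value a user or tool put in it.
func hasManagedBy(podLabels map[string]string) bool {
	_, ok := podLabels[kubeLabelManagedBy]
	return ok
}

func main() {
	merged := mergeLabels("tekton-pipelines", map[string]string{kubeLabelManagedBy: "my-tool"})
	fmt.Println(merged[kubeLabelManagedBy]) // the user's value wins
	fmt.Println(hasManagedBy(map[string]string{"app": "web"}))
}
```

Because the value is user-controlled, presence-only filtering is the tightest predicate available, which is exactly why the informer ends up caching unrelated pods.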
/cc @imjasonh @bobcatfish @afrittoli @sbwsg
Can we limit the lister to the default label and fork the logic so that a TaskRun with its own managed-by label causes us to skip the lister and query the cluster directly?
That might be problematic if Pipelines is used "transparently" by some higher-level tool. Assuming I have a tool called foo that always sets up this managed-by
label, this would make the controller query the cluster a lot, which could become a problem. That said, I do think we may want to introduce a label for us that cannot be overridden for this particular purpose. Either we disallow overriding app.kubernetes.io/managed-by
or we come up with our "own" internal one?
Can we have two labels, one for Tekton's internal usage, and one for whatever service is built on top of Tekton that might want to hide the Tekton internals (configured by configmap as today)? Then we could have the informer filter by the Tekton-internal label.
Maybe we could use app.kubernetes.io/created-by: tekton-pipeline-controller
? (see common labels)
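The two-label idea can be sketched as follows; `informerSelector` and `tektonPodLabels` are hypothetical helpers, and the `tekton-pipeline-controller` value is the one suggested above, not an existing constant:

```go
package main

import "fmt"

const (
	labelManagedBy = "app.kubernetes.io/managed-by" // user/tool facing, configurable
	labelCreatedBy = "app.kubernetes.io/created-by" // proposed internal, fixed value
	controllerName = "tekton-pipeline-controller"   // illustrative value from this thread
)

// informerSelector returns the label selector the pod informer would be
// configured with if it filtered on the internal label only.
func informerSelector() string {
	return fmt.Sprintf("%s=%s", labelCreatedBy, controllerName)
}

// tektonPodLabels stamps both labels on a pod: the configurable external
// one (which a user or higher-level tool may override) and the fixed
// internal one the informer filters on.
func tektonPodLabels(managedByValue string) map[string]string {
	return map[string]string{
		labelManagedBy: managedByValue,
		labelCreatedBy: controllerName,
	}
}

func main() {
	fmt.Println(informerSelector())
	fmt.Println(tektonPodLabels("my-higher-level-tool")[labelCreatedBy])
}
```

With this split, higher-level tools keep the freedom to rebrand managed-by, while the informer's cache stays bounded to Tekton-created pods.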
Yes, that was what I was thinking too in my last comment. And I somehow missed the app.kubernetes.io/created-by
label, but I think it's definitely a very good candidate for this. The only thing we need to make sure of is to disallow (or prevent) this label from being overridden by the user (by adding it to the TaskRun
or PipelineRun
object), either silently or loudly.
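The "silently or loudly" choice can be sketched as one enforcement function; `enforceReserved` is a hypothetical name, not part of Tekton's webhook code:

```go
package main

import "fmt"

const reservedLabel = "app.kubernetes.io/created-by" // illustrative reserved key

// enforceReserved implements the two options from the comment above:
// strict=false silently overwrites a user-set reserved label (defaulting
// webhook style), while strict=true rejects the object loudly
// (validating webhook style).
func enforceReserved(labels map[string]string, want string, strict bool) (map[string]string, error) {
	if v, ok := labels[reservedLabel]; ok && v != want {
		if strict {
			return nil, fmt.Errorf("label %q is reserved and cannot be set by users", reservedLabel)
		}
	}
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	out[reservedLabel] = want // silently stamp the controller's value
	return out, nil
}

func main() {
	got, _ := enforceReserved(map[string]string{reservedLabel: "sneaky"}, "tekton-pipeline-controller", false)
	fmt.Println(got[reservedLabel])
}
```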
That sounds like a good idea. I am wondering how it works across an upgrade.
If a pipeline finished before the upgrade, it's fine. But if a pipeline is running during the upgrade, its pods don't have the new label, so after the upgrade the controller won't see them because of the new filter.
Am I correct? Do we need to roll this new label out across multiple releases?
Good question. We will need to be careful about how we handle this, but yes, we'll have to think about backward compatibility.
/priority important-longterm
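One possible transitional match, sketched with a hypothetical `ownedByTekton` helper: during the rollout, pods created before the upgrade only carry the legacy managed-by label, so accept either label until pre-upgrade runs have drained (note this still misses pre-upgrade pods whose managed-by value was overridden by the user, which is part of why care is needed here):

```go
package main

import "fmt"

const (
	legacyLabel   = "app.kubernetes.io/managed-by"
	internalLabel = "app.kubernetes.io/created-by"
)

// ownedByTekton accepts post-upgrade pods (internal label present) and
// pre-upgrade pods (legacy label with the configured default value).
// The fallback could be dropped after one or more release cycles.
func ownedByTekton(labels map[string]string, defaultManagedBy string) bool {
	if _, ok := labels[internalLabel]; ok {
		return true // post-upgrade pod
	}
	return labels[legacyLabel] == defaultManagedBy // pre-upgrade pod
}

func main() {
	fmt.Println(ownedByTekton(map[string]string{legacyLabel: "tekton-pipelines"}, "tekton-pipelines"))
	fmt.Println(ownedByTekton(map[string]string{internalLabel: "tekton-pipeline-controller"}, "tekton-pipelines"))
}
```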
This sounds like a problem for our operator to solve.
At present, the operator upgrades TektonPipeline, TektonTriggers, etc. immediately after an operator controller upgrade. The only check the operator has is a version comparison (via the pipelines-info configMap) of Tektoncd Pipeline (on-cluster vs. new release).
It could be straightforward to add a mechanism to delay (requeue) the upgrade if there are active TaskRuns. This could be a decent starting point. We could improve it by adding a timeout for this requeue or by implementing a notion of a 'maintenance window'.
/help cancel
Removing the help wanted label because it sounds like this needs design input before it can move forward.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now, please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/lifecycle stale
Send feedback to tektoncd/plumbing.
/lifecycle frozen
Expected Behavior
The informer should contain only pods with the label
app.kubernetes.io/managed-by=tekton-pipelines
(or whatever value default-managed-by-label-value is configured to):
https://github.com/tektoncd/pipeline/blob/main/config/config-defaults.yaml#L52
Actual Behavior
The informer holds too many pods in its memory.
Steps to Reproduce the Problem
Create a Pod carrying the label app.kubernetes.io/managed-by: not-tekton-at-all; the controller's informer still caches it.
Additional Info
The filtered informer factory is configured here: https://github.com/tektoncd/pipeline/blob/64aff5edcbb7e6fdf22e1270f60ee37085a8d934/cmd/controller/main.go#L101
The pod informer is configured here: https://github.com/tektoncd/pipeline/blob/64aff5edcbb7e6fdf22e1270f60ee37085a8d934/pkg/reconciler/taskrun/controller.go#L53