tektoncd / chains

Supply Chain Security in Tekton Pipelines
Apache License 2.0

Chains memory consumption rises under load #1058

Open jhutar opened 5 months ago

jhutar commented 5 months ago

Expected Behavior

I would expect that after some time under load, Chains' memory consumption would level off, i.e. it would start freeing memory.

Is this expected, or is this some sort of memory leak?

Actual Behavior

This is a memory graph of Chains signing 10k very simple TaskRuns that just print "hello world" (Pipeline, PipelineRun).

Chains was started around 15:30 and began signing PRs and TRs:

(memory consumption graph)

Chains was configured with this:

kubectl patch TektonConfig/config \
            --type merge \
            -p '{"spec":{"chain":{"artifacts.pipelinerun.format": "slsa/v1"}}}'
kubectl patch TektonConfig/config \
            --type merge \
            -p '{"spec":{"chain":{"artifacts.pipelinerun.storage": "tekton"}}}'
kubectl patch TektonConfig/config \
            --type='merge' \
            -p='{"spec":{"chain":{"artifacts.taskrun.format": "slsa/v1"}}}'
kubectl patch TektonConfig/config \
            --type='merge' \
            -p='{"spec":{"chain":{"artifacts.taskrun.storage": "tekton"}}}'

Steps to Reproduce the Problem

  1. Run 10k PipelineRuns and wait for all of them to finish (a sketch of the workload is shown after this list)
  2. Then start the Chains and let it sign PRs and TRs
  3. This was automated in this repo with the signing-tr-tekton-bigbang scenario
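
The actual automation lives in the linked signing-tr-tekton-bigbang scenario; purely as an illustration of the workload shape, a loop like the following (the image and object names are assumptions, not taken from the scenario) would create many trivial "hello world" PipelineRuns:

for i in $(seq 1 10000); do
  # Each iteration submits one PipelineRun with an inline hello-world task
  cat <<EOF | kubectl create -f -
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: hello-
spec:
  pipelineSpec:
    tasks:
      - name: hello
        taskSpec:
          steps:
            - name: echo
              image: registry.access.redhat.com/ubi9/ubi-minimal
              script: |
                echo "hello world"
EOF
done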

Additional Info

The cluster is gone already, but it was ROSA OpenShift 4.14.11 with 5 compute nodes (AWS EC2 m6a.2xlarge).

Reported this together with https://github.com/tektoncd/pipeline/issues/7691

concaf commented 5 months ago

@jhutar what's the baseline memory usage of chains in this case when there are no workloads on the cluster?
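
One way to capture that baseline (a sketch; assumes metrics-server is available on the cluster) is to snapshot the controller's usage before any load is applied:

kubectl top pod -n tekton-chains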

wlynch commented 5 months ago

Opened https://github.com/tektoncd/plumbing/pull/1840 to enable profiling on our dogfooding instance to help debug this, but if you want to enable profiling on your cluster and share the pprof output, that might be faster 🙏

Steps:

  1. Add profiling.enable: "true" to the tekton-chains/tekton-chains-config-observability ConfigMap
  2. kubectl port-forward -n tekton-chains tekton-chains-controller-794dcd9b65-k9f8d 8008 (replace pod name with your own)
  3. wget localhost:8008/debug/pprof/heap
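
Once the heap profile is downloaded, it can be inspected locally with Go's pprof tooling, e.g. (a sketch):

go tool pprof -top -inuse_space heap     # summarize the largest in-use allocations
go tool pprof -http=:8080 heap           # or browse the profile in a web UI
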
jhutar commented 3 months ago

Oh, thank you Billy! Will try to get a pprof output!

Maybe this is similar to https://github.com/tektoncd/pipeline/issues/7691: memory used by the informer to cache everything.
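
If the informer cache is the cause, memory usage should roughly track the number of TaskRun/PipelineRun objects still present on the cluster; a quick way to check that correlation (a sketch):

kubectl get taskruns --all-namespaces --no-headers | wc -l
kubectl get pipelineruns --all-namespaces --no-headers | wc -l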