tektoncd / results

Long term storage of execution results.
Apache License 2.0

GRPC Stream Leak on Error #624

Closed stuartwdouglas closed 1 year ago

stuartwdouglas commented 1 year ago

Within RHTAP there have been multiple occasions where the Tekton Results watcher has hung and stopped processing results. In each case, the last entry in the log was similar to:

{"level":"error","ts":"2023-09-27T11:48:52.977Z","logger":"watcher","caller":"dynamic/dynamic.go:326","msg":"Error streaming log","knative.dev/traceid":"53b3d4b0-823c-4aec-a6e7-5d51f4ff8d77","knative.dev/key":"build-templates-e2e/verify-enterprise-contract-run-bgn9f","results.tekton.dev/kind":"PipelineRun","namespace":"build-templates-e2e","kind":"PipelineRun","name":"verify-enterprise-contract-run-bgn9f","error":"error reading from tkn reader: pipelineruns.tekton.dev \"verify-enterprise-contract-run-bgn9f\" not found","stacktrace":"github.com/tektoncd/results/pkg/watcher/reconciler/dynamic.(*Reconciler).sendLog.func1\n\t/opt/app-root/src/pkg/watcher/reconciler/dynamic/dynamic.go:326"}

And then nothing.

I had a look at the relevant code, and it appears that in this case a gRPC stream is opened to stream the logs but is never closed on the error return path. My hypothesis is that after this happens enough times, we exhaust the available gRPC connections/streams and the watcher hangs.