Within RHTAP there have been multiple occasions where the Tekton Results watcher has hung and stopped processing results. In each case the last message in the log was similar to:
{"level":"error","ts":"2023-09-27T11:48:52.977Z","logger":"watcher","caller":"dynamic/dynamic.go:326","msg":"Error streaming log","knative.dev/traceid":"53b3d4b0-823c-4aec-a6e7-5d51f4ff8d77","knative.dev/key":"build-templates-e2e/verify-enterprise-contract-run-bgn9f","results.tekton.dev/kind":"PipelineRun","namespace":"build-templates-e2e","kind":"PipelineRun","name":"verify-enterprise-contract-run-bgn9f","error":"error reading from tkn reader: pipelineruns.tekton.dev \"verify-enterprise-contract-run-bgn9f\" not found","stacktrace":"github.com/tektoncd/results/pkg/watcher/reconciler/dynamic.(*Reconciler).sendLog.func1\n\t/opt/app-root/src/pkg/watcher/reconciler/dynamic/dynamic.go:326"}
And then nothing.
I had a look at the relevant code, and it appears that in this case a gRPC stream is opened to stream the logs, but on the error return path it is never closed. My hypothesis is that after this happens enough times we exhaust the available gRPC connections/streams and the watcher hangs.
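To illustrate the kind of fix I have in mind, here is a minimal Go sketch of the pattern: close the stream via defer so the error return path also releases it. The type and function names below (logStream, sendLog's signature) are placeholders I made up for this sketch, not the actual watcher code.

```go
package watcher

import (
	"errors"
	"fmt"
	"io"
)

// logStream stands in for the client-side gRPC stream used to send logs;
// the real type in the watcher differs, this is only illustrative.
type logStream interface {
	Send(chunk []byte) error
	CloseAndRecv() (interface{}, error)
}

// sendLog copies log data from r to the stream. The important part is the
// deferred close: it also runs on the error return path, so a failed read
// (e.g. the PipelineRun no longer existing) does not leak the stream.
func sendLog(r io.Reader, stream logStream) (retErr error) {
	defer func() {
		if _, err := stream.CloseAndRecv(); err != nil && retErr == nil {
			retErr = fmt.Errorf("error closing log stream: %w", err)
		}
	}()

	buf := make([]byte, 32*1024)
	for {
		n, err := r.Read(buf)
		if n > 0 {
			if sendErr := stream.Send(buf[:n]); sendErr != nil {
				return fmt.Errorf("error sending log chunk: %w", sendErr)
			}
		}
		if errors.Is(err, io.EOF) {
			return nil
		}
		if err != nil {
			// Previously an early return like this would leave the stream open.
			return fmt.Errorf("error reading from tkn reader: %w", err)
		}
	}
}
```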