I created ~320 Triggers on a Broker that is backed by a PubSub Channel. The Channel has created PullSubscriptions for ~5 of the subscribers and is now permanently livelocked.
Theory:
The Channel reconciler was reconciling a Channel and, in particular, creating a PullSubscription. It then tried to write out the new status, but the Subscription controller had raced with it and already injected a new subscriber into the Channel's spec, so the Channel reconciler's UpdateStatus call was rejected. On every subsequent reconcile loop, the PullSubscription's UID was not in the status, so the PullSubscription was marked for creation again. Because it already existed, creation failed every time.
If the possible fix below works, we will need to check every reconciler that creates objects and ensure it has the same 'adopt if owned by me' behavior.
Possible fix: If creating a PullSubscription fails because it already exists, check whether its owner is the current Channel. If so, add it to the `subUpdates` slice. If not, return the error. This would modify the following code: https://github.com/google/knative-gcp/blob/68086a12567deadd93d2e58f854a2c8b09bc467d/pkg/reconciler/channel/channel.go#L261-L265
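A minimal sketch of the "adopt if owned by me" idea, using only the Go standard library. Everything in it is a hypothetical stand-in rather than the actual knative-gcp code: `fakeClient`, `pullSubscription`, `errAlreadyExists`, and `createOrAdopt` are invented for illustration, and ownership is reduced to a single `OwnerUID` string. In the real reconciler the equivalent checks would presumably use `apierrors.IsAlreadyExists` and `metav1.IsControlledBy` from `k8s.io/apimachinery`.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists is a hypothetical stand-in for the API server's AlreadyExists error.
var errAlreadyExists = errors.New("already exists")

// pullSubscription is a hypothetical, simplified stand-in for the real resource.
type pullSubscription struct {
	Name     string
	UID      string
	OwnerUID string // UID of the Channel that owns this object
}

// fakeClient is a hypothetical in-memory stand-in for the API server.
type fakeClient struct {
	store   map[string]pullSubscription
	nextUID int
}

// Create stores the object under its name, or fails if the name is taken.
func (c *fakeClient) Create(ps pullSubscription) (pullSubscription, error) {
	if _, ok := c.store[ps.Name]; ok {
		return pullSubscription{}, errAlreadyExists
	}
	c.nextUID++
	ps.UID = fmt.Sprintf("uid-%d", c.nextUID)
	c.store[ps.Name] = ps
	return ps, nil
}

// Get returns the stored object with the given name.
func (c *fakeClient) Get(name string) (pullSubscription, error) {
	ps, ok := c.store[name]
	if !ok {
		return pullSubscription{}, errors.New("not found")
	}
	return ps, nil
}

// createOrAdopt creates the PullSubscription. If it already exists and is
// owned by channelUID, the existing object is returned so the caller can
// record its UID (for example in subUpdates). Any other failure is returned.
func createOrAdopt(c *fakeClient, channelUID string, desired pullSubscription) (pullSubscription, error) {
	desired.OwnerUID = channelUID
	created, err := c.Create(desired)
	if err == nil {
		return created, nil
	}
	if !errors.Is(err, errAlreadyExists) {
		return pullSubscription{}, err
	}
	existing, getErr := c.Get(desired.Name)
	if getErr != nil {
		return pullSubscription{}, getErr
	}
	if existing.OwnerUID != channelUID {
		// Owned by someone else: surface the original error instead of adopting.
		return pullSubscription{}, fmt.Errorf("pullsubscription %q not owned by channel %q: %w",
			desired.Name, channelUID, err)
	}
	return existing, nil
}

func main() {
	c := &fakeClient{store: map[string]pullSubscription{}}

	// First reconcile creates the object, but assume its UID never reaches the status.
	first, _ := createOrAdopt(c, "channel-a", pullSubscription{Name: "ps-1"})

	// The next reconcile tries to create it again and adopts the existing object.
	second, err := createOrAdopt(c, "channel-a", pullSubscription{Name: "ps-1"})
	fmt.Println("adopted same object:", first.UID == second.UID, "err:", err)

	// A different Channel must not adopt it.
	_, err = createOrAdopt(c, "channel-b", pullSubscription{Name: "ps-1"})
	fmt.Println("foreign owner rejected:", err != nil)
}
```

With this shape, a lost status update no longer wedges the reconciler: the retry finds the object it created earlier and records it, while an object owned by a different Channel still produces an error.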