Open · fabriziopandini opened this issue 1 year ago
I think the problem is bigger than just us adding indexes on client creation. The same can happen for every Get/List call in our reconcilers. All of them can trigger the creation of an informer.
I think we have to take a closer look at how all of this works in CR. I think there could be a way to use a different http client for the synchronous calls during informer creation vs. ListWatch. Code is roughly here: https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/cache/internal/informers.go#L377-L503
This could e.g. work by passing a separate, long-lived config into the cache:
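A rough sketch of what I mean (hypothetical wiring, not the actual POC; `newRemoteCache` is an illustrative name):

```go
package remote

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/cache"
)

// newRemoteCache derives a long-lived rest config for the cache's
// ListWatch calls from the short-timeout config used for direct calls.
func newRemoteCache(config *rest.Config, scheme *runtime.Scheme) (cache.Cache, error) {
	cacheConfig := rest.CopyConfig(config)
	cacheConfig.Timeout = 0 // don't let watches inherit the 10s client timeout

	return cache.New(cacheConfig, cache.Options{
		Scheme: scheme,
	})
}
```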
POC: https://github.com/sbueringer/cluster-api/pull/151/files#
But I'm not sure if that covers everything relevant. It would definitely be cleaner to implement this in CR; I think my current POC is just exploiting the current behavior, which is not guaranteed to stay the same.
/triage accepted
/priority important-soon
/assign @sbueringer please ping me when you do the investigation
This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged.
Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.
You can:
- Confirm that this issue is still relevant with /triage accepted (org members only)
- Deprioritize it with /priority important-longterm or /priority backlog
- Close this issue with /close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
/triage accepted
What steps did you take and what happened?
When we create watches on a remote cluster using the cluster cache tracker, the watches are created with the same rest config that the cache uses, and this rest config currently has a timeout of 10s.
As a result, watches on remote clusters expire every 10s. Even though this doesn't seem to be a problem in practice thanks to watch bookmarks, it isn't ideal, and we should figure out a way to give the cache a longer timeout than the one used for the remote clients.
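For context, the short timeout is set on the rest config itself, roughly like this (a minimal sketch; the 10s value is from this issue, the helper name is illustrative):

```go
import (
	"time"

	"k8s.io/client-go/rest"
)

// rest.Config.Timeout applies to every request made with this config,
// including the long-running WATCH calls issued by the cache.
func remoteRESTConfig(base *rest.Config) *rest.Config {
	config := rest.CopyConfig(base)
	config.Timeout = 10 * time.Second // fine for Get/List, cuts off watches
	return config
}
```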
What did you expect to happen?
Cluster API to create watches on remote clusters with a longer timeout
Cluster API version
main, old versions
Kubernetes version
No response
Anything else you would like to add?
The tricky part is that we want to keep the short 10s timeout for the remote client, and we also want to keep it for the calls made while creating a new remote client, because client creation blocks a reconcile loop synchronously.
Unfortunately, creating a new remote client also creates indexes on the cache, so a longer timeout on the cache would block the reconciler unless we find a way to initialize the indexes asynchronously.
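One possible direction (a rough sketch only, with a purely illustrative index; not a worked-out design): move index creation and cache start into a goroutine so the reconcile loop is not blocked by the longer cache timeout.

```go
import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// startRemoteCache registers indexes and starts the cache off the
// synchronous reconcile path.
func startRemoteCache(ctx context.Context, remoteCache cache.Cache) {
	go func() {
		// IndexField triggers informer creation; done synchronously it
		// would block the reconciler for the duration of the cache timeout.
		if err := remoteCache.IndexField(ctx, &corev1.Node{}, "spec.providerID",
			func(o client.Object) []string {
				return []string{o.(*corev1.Node).Spec.ProviderID}
			}); err != nil {
			return // log the error; the index is not available
		}
		// Start blocks until ctx is cancelled.
		_ = remoteCache.Start(ctx)
	}()
}
```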
Label(s) to be applied
/kind bug