telepresenceio / telepresence

Local development against a remote Kubernetes or OpenShift cluster
https://www.telepresence.io

`telepresence list` takes a very long time to complete (~4 mins) #3714

Closed jacksehr closed 3 weeks ago

jacksehr commented 3 weeks ago

Describe the bug

When running `telepresence list`, it often takes almost 4 minutes to complete.

I have tried connecting first both with and without `--mapped-namespaces`, and it made no difference.

Debugging steps taken

When debugging a local build of the user daemon and CLI, I found that the `list` command always reaches this branch: https://github.com/telepresenceio/telepresence/blob/ebb90267bddd121e19eaec02dca75669ede077f6/pkg/agentmap/discorvery.go#L202

On further inspection, the `namespace` parameter of that function is always an empty string, and that value is passed in on these lines: https://github.com/telepresenceio/telepresence/blob/ebb90267bddd121e19eaec02dca75669ede077f6/pkg/agentmap/discorvery.go#L156-L157

So, what I believe is happening is that:

To test this, I set the `Namespace` field manually in my local build, which brought the command down from ~4 minutes to ~40 s. That is a marked improvement, but still quite slow, and we still end up in the same branch of code that we shouldn't be reaching. I'm not sure at what point the informer is supposed to have a `K8sFactory` set on it per namespace, but my guess is that this isn't happening right now.
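To make the suspected cost concrete, here is a minimal sketch in Go. It relies only on the Kubernetes convention that an empty namespace string means "all namespaces" (client-go exposes this as `metav1.NamespaceAll`, which is `""`). The helpers `workloadsVisited` and `clusterShape` are hypothetical and are not telepresence code; they only count how many workloads a lookup would have to inspect for a given namespace argument.

```go
package main

import "fmt"

// namespaceAll mirrors the Kubernetes convention (metav1.NamespaceAll in
// client-go) that an empty namespace string means "all namespaces".
const namespaceAll = ""

// workloadsVisited is a hypothetical model, not telepresence code: it returns
// how many workloads a lookup must inspect for the given namespace argument.
func workloadsVisited(namespace string, perNamespace map[string]int) int {
	if namespace != namespaceAll {
		// Scoped lookup: only the requested namespace is inspected.
		return perNamespace[namespace]
	}
	// Unscoped lookup: every namespace in the cluster is inspected.
	total := 0
	for _, n := range perNamespace {
		total += n
	}
	return total
}

// clusterShape builds namespaceCount namespaces with hypothetical names
// ns-0, ns-1, ..., each holding workloadsPer workloads.
func clusterShape(namespaceCount, workloadsPer int) map[string]int {
	shape := make(map[string]int, namespaceCount)
	for i := 0; i < namespaceCount; i++ {
		shape[fmt.Sprintf("ns-%d", i)] = workloadsPer
	}
	return shape
}

func main() {
	// Roughly the cluster described in this issue: 70 namespaces x 100 workloads.
	shape := clusterShape(70, 100)
	fmt.Println("empty namespace:", workloadsVisited("", shape))    // 70 x 100 = 7000
	fmt.Println("namespace ns-0:", workloadsVisited("ns-0", shape)) // 100
}
```

The 7000 vs 100 ratio is a count of objects inspected under this simplified model, not a predicted speed-up; the measured change reported above (~4 minutes to ~40 s) also includes costs the model ignores.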

To Reproduce

  1. Set up a cluster with many namespaces, each containing many pods (e.g. we have 70+ namespaces, each with 100+ services)
  2. Run telepresence list
  3. Observe, via debugging or logs, that every namespace is searched

The logs I currently have contain a lot of workplace-specific information; I can try to share them if absolutely necessary.

Expected behavior

`telepresence list` should ideally complete quickly, in ~10 s at most.

Versions (please complete the following information):

```
$ telepresence version
OSS Client     : v2.20.1
OSS Root Daemon: v2.20.1
OSS User Daemon: v2.20.1
Traffic Manager: not connected
```