sky-uk / osprey

Kubernetes OIDC CLI login
BSD 3-Clause "New" or "Revised" License
49 stars 17 forks source link

Feature: user login concurrently #106

Closed zemanlx closed 1 week ago

zemanlx commented 1 week ago

Speedy Concurrent Cluster Login with Osprey - A Game Changer!

🎉 Introducing the Revolutionary Concurrent Cluster Login Feature with Osprey! 🎉

Do you find yourself waiting forever as Osprey logs you into each cluster one by one? Are you tired of wasting precious time watching cluster names appear on the screen, one after the other? Well, we have fantastic news for you! Our latest feature brings the power and speed of concurrent logins to Osprey, transforming your experience from slow and tedious to lightning-fast and efficient!

🔧 What's New?

💡 Why You Need This Feature:

📈 Results That Speak for Themselves: Imagine logging in to all your clusters in a fraction of the time it used to take. Our tests show an incredible improvement in speed and performance, making your cluster management tasks a breeze.

# before
osprey user login --group non-prod  0.95s user 0.17s system 14% cpu 7.628 total
# after
./cosprey user login --group non-prod  0.90s user 0.14s system 30% cpu 2.377 total

🔜 Coming Soon to a Repository Near You! Don't miss out on this incredible upgrade. Update your repository now and experience the future of cluster logins with Osprey!

🎊 Upgrade Today and Watch Your Productivity Soar! 🎊

Order your update now by merging this PR and say goodbye to slow, sequential logins forever!


PR Includes:


Experience the magic of concurrent logins with Osprey. Merge this PR and turbocharge your workflow today!


Note: Results may vary based on the number of clusters and network conditions. For best results, ensure your environment is properly configured for concurrent operations.

aecay commented 1 week ago

awesome work. there may be further speed gains to be had as well.

currently we have an api which is RetrieveClusterDetailsAndAuthTokens. that does two things:

we should be doing the second step in the background while we are waiting for the browser. if we do that, then the time taken for osprey user login should be dominated by the time the oidc flow takes.


on the code in this PR, though, you're calling the azure implementation of RetrieveClusterDetailsAndAuthTokens in parallel in goroutines. i'm not convinced that that's goroutine-safe. won't that try to call r.oidc.AuthWithOIDCCallback multiple times in parallel?

(we're also not doing context propagation, so an error in one cluster fecth won't abort the other clusters. that may or may not be what we want; certainly in today's sequential world we try all clusters rather than stopping on the first error. what i'm slightly more worried about is signal handling, and whether ctrl-c will successfully terminate the program [all goroutines]. but i'm rusty on how signal handling works in go....)

zemanlx commented 1 week ago

Thank you @aecay!

I spent some time with the method RetrieveClusterDetailsAndAuthTokens and it seems to be complicated. I would rather have two methods for being responsible for one job only in Azure, but it is needed for the Osprey server where it is all done in one step. If we have separate methods we can do OIDC flow concurrently while we fetch all information from clusters. So unless we want to change interface/API there is no easy way to do it I can think of.

Right now we are waiting for the first cluster to finish using mutex and then reuse that token for all other clusters (similar to what was happening when we were calling it sequentially). But because of that mutex, it is safe as no two goroutines can get token/start OIDC flow concurrently. However, they can concurrently ask for server info.

I am doing context propagation as far as it is supported. However, I see your point that the way it is done does not make much sense. If we want to make it meaningful we would need to change the interface of retriever to support passing context. If you are ok with that, I can do it, but it will be much more lines to review 🙈

aecay commented 1 week ago

it is needed for the Osprey server where it is all done in one step

i'm pretty sure nothing uses osprey server. as in, nothing at all. so we could get rid of it (in a major version upgrade of osprey) if that would make our life easier.

If you are ok with [a different approach to context propagation], I can do it, but it will be much more lines to review

nah, don't bother. i wanted to talk about the complexities and limitations of the code (e.g. if you were going to be doing further work on the codebase, to make sure you took them into account), but i don't want to cause extra work or bloat the scope of this PR.

Right now we are waiting for the first cluster to finish using mutex

thanks -- i missed the subtlety of how that mutex fitted into the flow. now i see how the code is goroutine-safe.