peburrows / goth

Elixir package for Oauth authentication via Google Cloud APIs
http://hexdocs.pm/goth
MIT License
289 stars 111 forks source link

Apply continue callback to reduce startup time of Goth.Server process #119

Closed nallwhy closed 2 years ago

nallwhy commented 2 years ago

Goth.Server took over 500ms for the startup. It increases startup time of applications using goth. To reduce the startup time, I applied continue callback to Goth.Server.

wojtekmach commented 2 years ago

Thank you for the PR!

The long-ish startup time is not a bug, it's a feature! The idea is we attempt to prefetch the token on application boot, which would explain the slowness you are experiencing, so that once we start accepting web traffic, the cache is already warm.

I think this is a pretty reasonable default behaviour however if that's undesirable, my suggestion would be to allow setting prefetch: false option on the server. If that sounds good, could you send a PR? I'm gonna close this one in the meantime, but feel free to continue the discussion of course.

wojtekmach commented 2 years ago

Oh, prefetch: false isn't exactly equivalent to what you have in the PR since you're still prefetching, just asynchronously. We could support it via :async value. So I think we'd have these possible values:

nallwhy commented 2 years ago

With this PR, it just doesn't block other processes and prefetches a token right after the server is created. There is no change in the behavior of goth. I think this should be set as the default.

wojtekmach commented 2 years ago

If we prefetch the token asynchronously it is possible we start accepting the traffic while the cache is cold. Imagine you deploy a new release, your new node joins the cluster, and is immediately hit with 1000s of web requests which in turns means we call Goth.fetch 1000s of times. Your solution handles this particular scenario better than what is on master because (to simplify, assuming there are no http failures) we'd only ever make one http request, the one from handle_continue as that is guaranteed do be the first message the server processes. All the other fetches will go through handle_call and would have had a cache hit by then. That being said, going through a GenServer.call for every token fetch is not scalable as a single GenServer can easily become a bottleneck. Removing that bottleneck was one of the major improvements we wanted to do for Goth v1.3. (Btw, if you're only ever accessing the state using GenServer.call, there's little reason to use ETS for storing the state, use your servers state instead.)

I think that an async prefetch is a good idea but I also think there's place for sync prefetch so this would need to be configurable. More importantly though, you need to make changes to Goth.Server.fetch to first read from ets and only if there's a cache miss, do a GenServer.call.

nallwhy commented 2 years ago

Oh, I miss understood Registry. I thought Registry can be a bottleneck but it is not.

I think that an async prefetch is a good idea but I also think there's place for sync prefetch so this would need to be configurable. More importantly though, you need to make changes to Goth.Server.fetch to first read from ets and only if there's a cache miss, do a GenServer.call.

That will be great. Thanks.