solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0

Support clustered redis and client-side sharding between extauth and redis for token storage #7896

Open bdecoste opened 1 year ago

bdecoste commented 1 year ago

Gloo Edge Version

1.13.x (latest stable)

Kubernetes Version

None

Describe the bug

If I have multiple Redis pods and OIDC is configured to store the tokens in Redis, then the tokens are written to a single, arbitrarily chosen Redis instance. A query from extauth to retrieve the tokens using the opaque token in the cookie also goes to a random Redis instance, which may or may not be the instance where the tokens were stored. A Redis miss results in a query back to the IdP to retrieve the tokens from the active session. This is inefficient and negates much of the value of caching the tokens locally in Redis.
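The failure mode can be sketched with a small stand-in simulation (plain Go, no real Redis; the key and values are illustrative): a write lands on one pod, but random load balancing sends each read to an arbitrary pod, while deterministic key hashing routes every read to the pod that holds the data.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// Each standalone Redis pod is modeled as an independent map.
type pod map[string]string

// shardFor deterministically maps a key to one of n pods
// (the essence of client-side sharding).
func shardFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

func main() {
	const n = 3
	pods := make([]pod, n)
	for i := range pods {
		pods[i] = pod{}
	}
	key := "opaque-session-token" // the cookie value used as lookup key

	// Today's behavior: the write goes to an arbitrary pod...
	rng := rand.New(rand.NewSource(1))
	pods[rng.Intn(n)][key] = "id/access/refresh tokens"

	// ...and each extauth lookup is load-balanced to a random pod.
	misses := 0
	for i := 0; i < 1000; i++ {
		if _, ok := pods[rng.Intn(n)][key]; !ok {
			misses++
		}
	}
	fmt.Printf("random routing: %d/1000 lookups missed\n", misses) // roughly (n-1)/n miss rate

	// With client-side sharding, writer and readers agree on the pod.
	for i := range pods {
		pods[i] = pod{}
	}
	pods[shardFor(key, n)][key] = "id/access/refresh tokens"
	_, hit := pods[shardFor(key, n)][key]
	fmt.Printf("sharded routing: hit=%v\n", hit) // always true
}
```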

Steps to reproduce the bug

  1. Create an OIDC AuthConfig using Redis token storage
  2. Create a Route with OIDC extauth enabled
  3. Scale Redis to multiple pods (the more replicas, the easier it is to reproduce)
  4. Go through the OIDC flow
  5. Delete/invalidate the session in the IdP
  6. Reload the browser. If there is a Redis miss, extauth will try to re-fetch the tokens using the active session and fail, which results in a new login. The valid tokens are still in Redis, but because extauth is load-balancing between the Redis pods, the chance of a miss is significant.

Expected Behavior

OIDC/extauth should support a clustered Redis client and client-side sharding, as rate limiting already does, to prevent cache misses:

https://github.com/solo-io/solo-projects/blob/master/install/helm/gloo-ee/templates/2-rate-limit-deployment.yaml#L173-L174

https://github.com/solo-io/solo-projects/blob/master/install/helm/gloo-ee/templates/2-rate-limit-deployment.yaml#L219-L237
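For reference, "client-side sharding" in the Redis Cluster sense means the client computes a hash slot per key (CRC-16/XMODEM of the key, or of its `{hash tag}` if present, modulo 16384) and routes the command to the node owning that slot, so writer and reader always agree. A minimal sketch of the slot computation (not Gloo code):

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 implements CRC-16/XMODEM (polynomial 0x1021, initial value 0),
// the checksum Redis Cluster uses for key hashing.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// hashSlot maps a key to one of Redis Cluster's 16384 slots. Hash tags
// are honored: if the key contains a non-empty "{...}" substring, only
// that part is hashed, so related keys land on the same shard.
func hashSlot(key string) int {
	if s := strings.IndexByte(key, '{'); s >= 0 {
		if e := strings.IndexByte(key[s+1:], '}'); e > 0 {
			key = key[s+1 : s+1+e]
		}
	}
	return int(crc16([]byte(key))) % 16384
}

func main() {
	// 0x31C3 is the documented check value in the Redis Cluster spec.
	fmt.Printf("crc16(\"123456789\") = 0x%04X\n", crc16([]byte("123456789")))
	// Same hash tag => same slot, regardless of the rest of the key.
	fmt.Println(hashSlot("{session:abc}.tokens") == hashSlot("{session:abc}.meta"))
}
```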

Additional Context

No response


DuncanDoyle commented 8 months ago

Question on this: what is the requirement for having multiple Redis pods? Is that for HA/failover, read performance, or horizontal storage scaling? In the first two cases, Redis clustered mode is not required; a multi-node setup in a single replication group would be sufficient. That way there is also no need for data sharding, as a single replication group consists of one shard.

anessi commented 8 months ago

Our main use case is HA/failover; however, scaling may be required as well. The current Gloo Edge version does not support this, but when storing the session in Redis via Gloo Edge OIDC it is a must. Otherwise sessions are lost if a single Redis pod/shard dies: the request for an authenticated end user will fail and re-login to the application will be required.

I'm of the opinion that you should leave the choice of which Redis setup works best to the customers/users of Gloo Edge, be it Redis Cluster (with or without sharding) or even Redis Sentinel. As I understand it, the client library you're using (https://github.com/redis/go-redis) supports all of these; Solo just needs to provide the configuration options to set up the library for the different scenarios.

Please also see the discussion in https://github.com/solo-io/gloo/issues/5647. We are using Redis Sentinel for our other apps (not Gloo Edge), but are fine with Redis Cluster, as rate limiting only supports Redis Cluster anyway. Also, this use case is for a cloud provider that offers Redis Cluster, but not Redis Sentinel, as a managed service.