helgi-reon opened this issue 2 years ago
hello @helgi-reon,
we currently do not support Redis Cluster. If you want to contribute and create a PR to support it, I would be more than happy to review it.
Otherwise, I am afraid this is going into the queue of new features.
Ok, good to know. I'll definitely contribute when and if I find a solution 😀
You will need to create a different client probably, and figure out what is the difference when using RedisCluster
client. I would have a look at tests in redis-py
.
maybe I am onto something... could you try again? maybe this helps? redis/redis-py#2189
I managed to get caching working in Django while using a Redis Cluster. It was achieved with a bit of a trial-and-error approach, so it's most likely not an optimal solution.
```python
from redis import RedisCluster


class CustomRedisClient(RedisCluster):
    def __init__(self, url, **kwargs):
        client = RedisCluster.from_url(url)
        kwargs["startup_nodes"] = client.get_nodes()
        del kwargs["connection_pool"]
        super().__init__(**kwargs)
```
```python
# settings.py

# REDIS_URL just needs to point to one of the nodes in the cluster,
# e.g. redis://localhost:6379/0
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": REDIS_URL,
        "OPTIONS": {
            "REDIS_CLIENT_CLASS": "common.redis_client.CustomRedisClient",
            "REDIS_CLIENT_KWARGS": {
                "url": REDIS_URL,
            },
            "PARSER_CLASS": "redis.cluster.ClusterParser",
        },
    },
}
```
The most interesting part is the deletion of the connection pool attribute. For some reason the connection pool created the correct number of node connections, but all with the same port. That meant it usually couldn't find a cached value, as it was stored on a different node. By removing the connection pool argument, the connection pool attribute is populated correctly, i.e. with connections for each of the available ports in the cluster.
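(For context on why connections pinned to a single port miss keys: Redis Cluster maps every key to one of 16384 hash slots using CRC16 in its XModem variant, and each slot is owned by exactly one primary node, so a value written via the correct node is simply not visible on a connection to the wrong one. Below is an illustrative sketch of the slot calculation per the Redis cluster specification; it is not part of django-redis or the code above.)

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM, the checksum Redis Cluster uses for key routing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: bytes) -> int:
    """Map a key to one of 16384 slots, honoring {hash tags} per the spec."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # only a non-empty tag replaces the key for hashing purposes
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key) % 16384
```

redis-py performs this same calculation internally when routing commands; the sketch is only to make the "key lives on one node" behavior concrete.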
nice! would you like to open a PR with cluster support?
Hi there! I've been working to add cluster support to our project based on @helgi-reon's approach. I figured `RedisCluster` should be initialized once per process to be able to have connection pooling, given that redis-py's `NodesManager` will initialize one connection pool per node: https://github.com/redis/redis-py/blob/6c708c2e0511364c2c3f21fa1259de05e590632d/redis/cluster.py#L1426.
This is the code I'm using for now (with settings pointing `REDIS_CLIENT_CLASS` to `redis_cluster_factory`):
```python
import threading

from redis import RedisCluster

_lock = threading.Lock()

# _cluster is a process-global, as otherwise _cluster is cleared every time
# redis_cluster_factory is called, as Django creates a new cache client
# instance for every request.
_cluster = None


def redis_cluster_factory(url, **kwargs):
    # django-redis will pass us a connection pool on kwargs, which we don't use
    # given that connection pools are handled per node on RedisCluster.
    global _cluster
    if _cluster is None:
        with _lock:
            if _cluster is None:
                _cluster = RedisCluster(url=url)
    return _cluster
```
@WisdomPill what do you think would be the best route to contribute the redis cluster support to the project? As far as I understood, I should:

- replace the `redis_cluster_factory` function with something like a `ClusterConnectionFactory` class
- pass `CONNECTION_POOL_CLASS` as the `connection_pool_class` kwarg, and spread `CONNECTION_POOL_KWARGS` as well.

I don't think we can support a different `REDIS_CLIENT_CLASS` per node, as `Redis` is hard-coded to be initialized here: https://github.com/redis/redis-py/blob/6c708c2e0511364c2c3f21fa1259de05e590632d/redis/cluster.py#L1426.
Another thing is that I would like to allow users to have cluster and non-cluster caches at the same time; for this I should move `DJANGO_REDIS_CONNECTION_FACTORY` to be part of `"OPTIONS"` instead of being taken from global settings (giving `OPTIONS.DJANGO_REDIS_CONNECTION_FACTORY` precedence over the settings one would be enough).
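That precedence rule is small enough to sketch. The helper name and the default path below are my own, not django-redis API; the point is only that a per-cache `OPTIONS` entry wins over the global setting:

```python
# Hypothetical sketch of OPTIONS-over-settings precedence for the
# connection factory path.
DEFAULT_FACTORY = "django_redis.pool.ConnectionFactory"


def resolve_connection_factory_path(options: dict, settings_dict: dict) -> str:
    # Per-cache OPTIONS entry wins; fall back to the global setting,
    # then to the library default.
    return (
        options.get("DJANGO_REDIS_CONNECTION_FACTORY")
        or settings_dict.get("DJANGO_REDIS_CONNECTION_FACTORY")
        or DEFAULT_FACTORY
    )
```

With that, a cluster cache could set its own factory in `OPTIONS` while a non-cluster cache in the same project keeps the default.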
We tried out the implementations above and ran into issues when one of our Redis nodes was replaced. We're operating a Redis cluster in Kubernetes, so we expect this to happen regularly as pods are moved between nodes.
Have you run into this, and do you have any thoughts on how to add failover support to the backend?
@nicootto I would add a subclass of `DefaultClient`, and maybe create a decorator or flag to query the primary, replicas, or a specific node. `ClusterClient` would use a different connection factory.
Hey friends, I've been picking around the corners of this as well while trying to migrate off of redis-py-cluster to instead use the `redis.cluster.RedisCluster` stuff that landed in redis-py...
I've been stealing/consolidating/learning from some of the ideas here; my current thoughts are over in this gist: https://gist.github.com/jonprindiville/f97084ca8f91501c17175a7b7a9578af
There's a connection factory there, because the way the base `django_redis.pool.ConnectionFactory` behaves doesn't jibe with `redis.cluster.RedisCluster` -- they both want to manage their own connection pools. Since we're doing this specifically for `RedisCluster`, we can set that as the default redis-client-class and save some config.
The base `django_redis.client.default.DefaultClient` mostly behaves fine for this application, but I have a client that overrides its choice of connection factory with our own, so we don't have to fiddle with the global `DJANGO_REDIS_CONNECTION_FACTORY` setting. I think this is what @WisdomPill was getting at in that comment?
(Aside: I do think that a pluggable connection factory would be neat, as suggested by @nicootto, but maybe that's a separate feature, idk.)
Given this kind of a setup, my Django settings would be like the following:
```python
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': '...',
        'OPTIONS': {
            'CLIENT_CLASS': 'cluster_client.DjangoRedisClusterClient',
        },
    },
}
```
If it helps, as a starting point, I could get a PR going for that stuff. There are probably going to be large holes that need filling, though, before that amounts to good cluster support.
Like, I'm not really sure how you'd want to go about adding cluster stuff to your CI processes, or what kind of coverage to add; I'm not sure about the `target_nodes` stuff that I think @WisdomPill was indicating; I feel like I'm pretty minimally exposed to advanced clustery use-cases, etc.
> ran into issues when one of our Redis nodes were replaced. We're operating a redis cluster in kubernetes, so we're expecting this to happen regularly as pods are moved between nodes. [...] do you have any thoughts on how to add failover support to the backend?
@sondrelg Hm, I would wonder if that might be more of a redis-py concern.
I know that the clients there seem to have some amount of error recovery, e.g. retrying based on `MOVED` responses, rediscovering nodes, etc.
Perhaps tweaking some of the parameters at that level helps you recover from some types of situations? Like `cluster_error_retry_attempts`, `retry`, `retry_on_timeout`, `retry_on_error`, or `reinitialize_steps`, perhaps? I'm a bit unsure.
Without understanding all of what's going on inside of redis-py, I wouldn't know what to suggest WRT adding failover on top at this layer.
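For anyone unfamiliar with those knobs, they all boil down to "retry the command some bounded number of times, with backoff, on retryable errors." A hand-rolled sketch of that general shape follows; this is not redis-py's actual implementation, just an illustration of the pattern its `Retry`/`cluster_error_retry_attempts` machinery provides:

```python
import random
import time


def call_with_retries(func, attempts=3, base_delay=0.05,
                      retryable=(ConnectionError, TimeoutError)):
    """Retry func() with jittered exponential backoff.

    Illustrative only: mirrors the spirit of redis-py's retry handling,
    not its code. The final failure is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retryable:
            if attempt == attempts - 1:
                raise
            # exponential backoff with jitter to avoid thundering herds
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
```

In redis-py itself you would configure this via the client kwargs rather than wrapping calls yourself.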
> I don't think we can support different REDIS_CLIENT_CLASS per node, as Redis is hardcoded initialized here https://github.com/redis/redis-py/blob/6c708c2e0511364c2c3f21fa1259de05e590632d/redis/cluster.py#L1426.
@nicootto True, yes. But! I think that from the perspective of django-redis the `REDIS_CLIENT_CLASS` is `redis.cluster.RedisCluster`.
The fact that inside of `redis.cluster.RedisCluster` there is a dependency on `redis.client.Redis` is a bit of an implementation detail of `RedisCluster`, IMO. If we wanted flexibility there, that's probably a feature request to the redis-py folks.
edit: it is quite possible this could be an issue on GCP's side, as the `internal server error (code -1)` is occurring in `read_response` immediately after an `AUTH` command here (GCP docs on this)
we're running this in production over in https://github.com/grafana/oncall against a GCP Memorystore managed Redis Cluster.
We've taken inspiration from this GitHub issue, as well as some conversations over in redis-py. Our setup mostly works; however, lately we've been seeing some rather cryptic/unexplainable exceptions popping up that I thought would be worthwhile posting here, in the event that anyone else is seeing the same thing. The unexplainable exception we see from time to time is:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/django_redis/client/default.py", line 258, in get
    value = client.get(key)
            ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/commands/core.py", line 1829, in get
    return self.execute_command("GET", name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/cluster.py", line 1115, in execute_command
    raise e
  File "/usr/local/lib/python3.11/site-packages/redis/cluster.py", line 1101, in execute_command
    res[node.name] = self._execute_command(node, *args, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/cluster.py", line 1144, in _execute_command
    connection = get_connection(redis_node, *args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/cluster.py", line 51, in get_connection
    return redis_node.connection or redis_node.connection_pool.get_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 1086, in get_connection
    connection.connect()
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 279, in connect
    self.redis_connect_func(self)
  File "/usr/local/lib/python3.11/site-packages/redis/cluster.py", line 672, in on_connect
    connection.on_connect()
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 342, in on_connect
    auth_response = self.read_response()
                    ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 524, in read_response
    raise response
redis.exceptions.ResponseError: Internal server error (code -1)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/django_redis/cache.py", line 29, in _decorator
    return method(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django_redis/cache.py", line 99, in _get
    return self.client.get(key, default=default, version=version, client=client)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django_redis/client/default.py", line 260, in get
    raise ConnectionInterrupted(connection=client) from e
django_redis.exceptions.ConnectionInterrupted: Redis ResponseError: Internal server error (code -1)
```
Our (simplified/abridged) setup is as follows:
`django_redis_client.py`:

```python
import threading
import typing
from copy import deepcopy

from django.core.exceptions import ImproperlyConfigured
from django_redis.client.default import DefaultClient
from django_redis.pool import ConnectionFactory
from redis.cluster import RedisCluster


class ClusterConnectionFactory(ConnectionFactory):
    """
    A connection factory compatible with `redis.cluster.RedisCluster`.

    The cluster client manages connection pools internally, so we don't want
    to do it at this level like the base `ConnectionFactory` does.
    """

    # A global cache of URL->client so that within a process, we will reuse a
    # single client, and therefore a single set of connection pools.
    _clients: typing.Dict[str, RedisCluster] = {}
    _clients_lock = threading.Lock()

    def connect(self, url: str) -> RedisCluster:
        """Given a connection url, return a client instance.

        Prefer to return from our cache, but if we don't yet have one, build
        it to populate the cache.
        """
        if url not in self._clients:
            with self._clients_lock:
                if url not in self._clients:
                    self._clients[url] = self._connect(url)
        return self._clients[url]

    def _connect(self, url: str) -> RedisCluster:
        """
        Given a connection url, return a new client instance.

        The basic `django-redis` `ConnectionFactory` manages a cache of
        connection pools and builds a fresh client each time. Because the
        cluster client manages its own connection pools, we will instead
        merge the "connection" and "client" kwargs and throw them all at the
        client to sort out.

        If we find conflicting client and connection kwargs, we'll raise an
        error.
        """
        # Get connection and client kwargs...
        connection_params = self.make_connection_params(url)
        client_cls_kwargs = deepcopy(self.redis_client_cls_kwargs)

        # ... and smash 'em together (crashing if there's conflicts)...
        for key, value in connection_params.items():
            if key in client_cls_kwargs:
                raise ImproperlyConfigured(
                    f"Found '{key}' in both the connection and the client kwargs"
                )
            client_cls_kwargs[key] = value

        # ... and then build and return the client
        return self.redis_client_cls(**client_cls_kwargs)

    def disconnect(self, connection: RedisCluster):
        connection.disconnect_connection_pools()


class RedisClient(DefaultClient):
    """
    A `django-redis` client compatible with `redis.cluster.RedisCluster`.

    We don't do much different here, except for using our own
    `ClusterConnectionFactory` (the base class would instead use the value of
    the `DJANGO_REDIS_CONNECTION_FACTORY` setting, but we don't care about
    that setting here).

    For non-clustered Redis, `django-redis`'s `ConnectionFactory` does some
    management of connection pools shared between client instances. The
    cluster client in `redis-py` doesn't accept a connection pool from
    outside; they're managed internally. To support that, we won't be caching
    connection pools and passing them into clients; we will instead be
    caching client instances.

    https://github.com/jazzband/django-redis/issues/606#issuecomment-1505615249
    https://gist.github.com/jonprindiville/f97084ca8f91501c17175a7b7a9578af
    """

    def __init__(self, server, params, backend) -> None:
        super().__init__(server, params, backend)
        self.connection_factory = ClusterConnectionFactory(options=self._options)
```
`redis_cluster_client.py`:

```python
import base64
import json
import logging
import typing

import cachetools.func
import redis.cluster
import redis.credentials
from django.conf import settings
from google.cloud import iam_credentials_v1

logger = logging.getLogger(__name__)


class GCPMemoryStoreCredentialsProvider(redis.credentials.CredentialProvider):
    """
    Credentials Provider to fetch an IAM Auth token to be able to
    authenticate w/ GCP Memorystore.

    Inspired by the following:
    https://redis-py.readthedocs.io/en/stable/examples/connection_examples.html#Connecting-to-a-redis-instance-with-AWS-Secrets-Manager-credential-provider
    https://cloud.google.com/memorystore/docs/cluster/client-library-connection#iam_authentication_client_library_code_sample
    """

    def __init__(self) -> None:
        service_account_json = json.loads(
            base64.b64decode(settings.GOOGLE_APPLICATION_CREDENTIALS_JSON_BASE64)
        )
        self.iam_client = iam_credentials_v1.IAMCredentialsClient.from_service_account_info(
            service_account_json
        )
        self.service_account_email = (
            self.iam_client._transport._credentials.service_account_email
        )

    def get_credentials(self) -> typing.Tuple[str]:
        """
        NOTE: Access tokens expire in one hour (by default), hence the 45min cache TTL
        https://cloud.google.com/memorystore/docs/cluster/about-iam-auth#iam_access_token_time_frame
        https://cloud.google.com/memorystore/docs/cluster/manage-iam-auth#connect_to_an_instance_that_uses_iam_authentication
        """

        @cachetools.func.ttl_cache(maxsize=128, ttl=45 * 60)  # 45mins
        def _get_iam_access_token() -> str:
            logger.info(
                "GCPMemoryStoreCredentialsProvider.get_credentials - "
                f"Generating IAM access token for {self.service_account_email}"
            )
            request = iam_credentials_v1.GenerateAccessTokenRequest(
                name=f"projects/-/serviceAccounts/{self.service_account_email}",
                # https://developers.google.com/identity/protocols/oauth2/scopes#redis
                scope=["https://www.googleapis.com/auth/cloud-platform"],
            )
            return self.iam_client.generate_access_token(request=request).access_token

        return (_get_iam_access_token(),)


# NOTE: this is a global singleton
credential_provider = GCPMemoryStoreCredentialsProvider()


class RedisClusterShim(redis.cluster.RedisCluster):
    def __init__(self, *args, **kwargs):
        kwargs["credential_provider"] = credential_provider
        super().__init__(*args, **kwargs)
```
`settings.py` (note the original snippet used `os.environ` without importing `os`; the import is added here):

```python
import os

REDIS_URI = "<uri_to_our_memorystore_discovery_ip>"

CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": REDIS_URI,
        "OPTIONS": {
            "PARSER_CLASS": "redis.connection._HiredisParser",
            "CONNECTION_POOL_CLASS": "redis.BlockingConnectionPool",
            "CONNECTION_POOL_CLASS_KWARGS": {
                "max_connections": 50,
                "timeout": 20,
            },
            "MAX_CONNECTIONS": 1000,
            "PICKLE_VERSION": -1,
            "SOCKET_CONNECT_TIMEOUT": 3,
            "SOCKET_TIMEOUT": 3,
            # Custom redis client to handle redis downtimes and reconnections
            "CLIENT_CLASS": "django_redis_client.RedisClient",
            # NOTE: REDIS_CLIENT_CLASS is different from the CLIENT_CLASS config. From the docs:
            #
            #   django-redis uses the Redis client redis.client.StrictClient by default. It is
            #   possible to use an alternative client. You can customize the client used by
            #   setting REDIS_CLIENT_CLASS in the CACHES setting. Optionally, you can provide
            #   arguments to this class by setting REDIS_CLIENT_KWARGS.
            #
            # https://github.com/jazzband/django-redis#pluggable-redis-client
            "REDIS_CLIENT_CLASS": "redis_cluster_client.RedisClusterShim",
            "REDIS_CLIENT_KWARGS": {
                "read_from_replicas": True,
                # the following setting is required by GCP Memorystore (and AWS ElastiCache AFAIK)
                #
                # from the docs
                # https://cloud.google.com/memorystore/docs/cluster/connect-cluster-instance#redis-py_client_best_practice
                #
                #   To connect to your Memorystore for Redis Cluster instance using the redis-py
                #   Python client, you must add skip_full_coverage_check=True when declaring a
                #   Redis Cluster:
                "skip_full_coverage_check": True,
                # The path to a file of concatenated CA certificates in PEM format
                # NOTE: needed because we have in-transit encryption enabled in GCP Memorystore
                # https://cloud.google.com/memorystore/docs/cluster/about-in-transit-encryption
                "ssl_ca_certs": os.environ.get("REDIS_SSL_CA_CERTS"),
            },
        },
    },
}
```
> we're running this in production over in https://github.com/grafana/oncall against a GCP Memorystore managed Redis Cluster. [...] however lately we've been seeing some rather cryptic/unexplainable exceptions popping up that I thought would be worthwhile posting here in the event that anyone else is seeing the same thing. [...] it is quite possible this could be an issue on GCP's side as the `internal server error (code -1)` is occurring in `read_response` immediately after an `AUTH` command
@joeyorlando Hm, I'm afraid that I don't have any relevant experiences like that to share at the moment, sorry.
I have given that code a bit of a run in a staging environment, but I have not yet given it extended time in production against real traffic.
Also: in my context I'm not interacting with GCP; I'm in AWS with an ElastiCache cluster (currently compatible with Redis 5.0.6, but I think soon we will move to a version compatible with Redis 7).
I think I may have to spend some time on this soon, though 😅 I think I'm blocked on something else until I can drop redis-py-cluster and update redis-py + django-redis.
I think it'd be relatively easy to get a PR together from the existing gist/example stuff, but the largest questions in my mind in the immediate future for django-redis to support redis-py's cluster client are WRT the django-redis test suite...
> We've taken inspiration from this GitHub issue, as well as some conversations over in redis-py

Oh, can you point at any relevant redis-py conversations? I'm not actively tracking anything on that side at the moment.
- how to incorporate a Redis Cluster into the django-redis CI process?
- do we need additional cluster-specific tests, or do we just run the existing non-cluster tests minus the multi-key or otherwise-incompatible commands?
if I can be of any help with a PR here, let me know!
On a side note: turns out my cryptic `Internal server error (code -1)` exceptions mentioned above were related to Memorystore's IAM authentication. GCP support's recommendation was to "add a retry" for `AUTH` commands, as the failures I am seeing "fall within their 99.9% SLA" (but this happens fairly deep in the redis-py internals, so adding a retry here would likely involve forking redis-py 🙄).
Just figured I'd throw this ☝️ here in case anyone else in the community runs into the same issue.
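One place a retry could conceivably be bolted on without forking redis-py is around the credential provider itself, though (as noted above) that would only retry fetching the token, not the `AUTH` command that actually failed on the wire. A hypothetical duck-typed sketch (a real implementation would subclass `redis.credentials.CredentialProvider`; the class and its parameters are my own invention, not an established workaround):

```python
import time


class RetryingCredentialProvider:
    """Hypothetical wrapper: retry an inner provider's get_credentials().

    Note the caveat: this retries token *fetching* only. A failure of the
    AUTH command itself still surfaces from deep inside redis-py.
    """

    def __init__(self, inner, attempts=3, delay=0.1):
        self.inner = inner
        self.attempts = attempts
        self.delay = delay

    def get_credentials(self):
        for attempt in range(self.attempts):
            try:
                return self.inner.get_credentials()
            except Exception:
                if attempt == self.attempts - 1:
                    raise
                time.sleep(self.delay)
```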
I would love to see a PR with support for redis cluster. I would use the same tactic we're currently using for sentinel (there's a script in tests/start_redis.sh), although I was trying to set everything up like it is done in redis-py, with no success, in #583.
**Problem Statement**

Use `django-redis` with redis clustering.

An error occurs when interacting with the cache. The error points to a pickle operation on an instance of the `ConnectionPool` class, where one of its attributes, a thread lock, cannot be serialized and results in the following error:

**To Reproduce**

Steps to reproduce the behavior:

1. [...] `REDIS_URL` points to.
2. [...] `CACHES` in Django settings file.
3. [...] `cache.get("somekey")`

**Expected behavior**

The value of the key from the Redis cluster.

**Stack trace**

**Environment:**

**Additional Context**

🚨 This is my first time using this library, so it is not unlikely that my configurations are wrong.