rq / django-rq

A simple app that provides Django integration for RQ (Redis Queue)
MIT License

Suggestions on how we can improve django-rq and add support for Django 4 #528

Open riclm opened 2 years ago

riclm commented 2 years ago

Django 4 has added support for caching with Redis via the new django.core.cache.backends.redis.RedisCache backend.

I know not everyone can take advantage of this right away, but this might be helpful for people like me who are starting a new project today.

So, after reading the source code django/core/cache/backends/redis.py, I can say that the django<->redis-py integration is quite elegant. I'm not 100% sure, which is why I'm hoping for feedback from you all, but this new cache backend seems to solve some existing issues.

For instance, in issue #210, get_redis_connection() creates a new Redis client every time it is called, and according to the redis-py documentation on connection pools:

By default, each Redis instance you create will in turn create its own connection pool.

I know we can avoid creating new connections, and keep control over client instances, by passing an explicit connection:

import redis
import django_rq

# Placeholder connection parameters; fill these in for your environment.
redis_cursor = redis.StrictRedis(host='', port='', db='', password='')
high_queue = django_rq.get_queue('high', connection=redis_cursor)
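The "create the client once, reuse it everywhere" idea can be sketched without a live Redis at all, using a memoized factory (make_client below is a hypothetical stand-in for redis.Redis(...)):

```python
from functools import lru_cache

def make_client():
    # Hypothetical stand-in for redis.Redis(host=..., port=..., db=...);
    # in redis-py, each new client gets its own connection pool by default.
    return object()

@lru_cache(maxsize=None)
def get_shared_client():
    # Every caller receives the same instance, so only one
    # connection pool ever exists for the whole process.
    return make_client()

client_a = get_shared_client()
client_b = get_shared_client()
assert client_a is client_b  # one shared client, one shared pool
```

Passing get_shared_client() as the connection argument to django_rq.get_queue() would then reuse the same pool across all queues.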

Small tangent:

Client Classes: Redis and StrictRedis

redis-py 3.0 drops support for the legacy Redis client class. StrictRedis has been renamed to Redis and an alias named StrictRedis is provided so that users previously using StrictRedis can continue to run unchanged.

However, we can make this easier by using Django's new backend:

from django.core.cache import caches

my_cache = caches["name_of_my_redis_backend"]
cache_client = my_cache._cache            # RedisCacheClient (a cached_property)
redis_client = cache_client.get_client()  # the underlying redis-py client
# ...

Note that _cache is decorated with cached_property, which, as far as I can tell, returns the same RedisCacheClient instance on every access. This seems nicer than managing a singleton ourselves or instantiating a Redis client directly with r = redis.Redis(...). Django already takes care of almost everything via the Cache API.

class RedisCache(BaseCache):
    def __init__(self, server, params):
        # ...
        self._class = RedisCacheClient
        # ...

    @cached_property
    def _cache(self):
        return self._class(self._servers, **self._options)
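That cached_property behavior is easy to verify with the stdlib alone; FakeClient and FakeRedisCache below are hypothetical stand-ins for Django's actual classes:

```python
from functools import cached_property

class FakeClient:
    # Hypothetical stand-in for Django's RedisCacheClient.
    pass

class FakeRedisCache:
    @cached_property
    def _cache(self):
        # The body runs once per instance; the result is stored on the
        # instance and returned unchanged on every later access.
        return FakeClient()

cache = FakeRedisCache()
assert cache._cache is cache._cache  # same instance every time
```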

Here's how my settings file looks:

# settings.py

CACHES = { 
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": REDIS_LOCATION,
        "OPTIONS": {
            "db": "0",
            "parser_class": "redis.connection.HiredisParser",
            "pool_class": "redis.BlockingConnectionPool",
        },
        "TIMEOUT": 300,
    }   
}

My point is: getting a client connection from Django's cache API is probably simpler. Most of the details (connection pools, redis client instances, etc.) have already been taken care of. Even when considering backwards compatibility, we can still change some stuff to mimic some parts of django's redis backend.

Also, the get_redis_connection() function from the queues module does not work with Django's new redis backend IF we try to use the USE_REDIS_CACHE config.

    if 'USE_REDIS_CACHE' in config.keys():
        # ...

        from django.core.cache import caches
        cache = caches[config['USE_REDIS_CACHE']]
        # We're using django-redis-cache
        try:
            return cache._client
        except AttributeError:
            # For django-redis-cache > 0.13.1
            return cache.get_master_client()

Fixing this to add support for Django 4 would be pretty simple since we can just do caches["name_of_my_redis_backend"]._cache.get_client(...). Everything else could remain the same to maintain compatibility.
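A rough, duck-typed sketch of what that branch could look like (the attribute names follow the snippets above; this is not django-rq's actual implementation):

```python
def client_from_cache(cache):
    """Return the underlying redis-py client for a configured cache.

    Tries, in order:
      * Django 4's built-in RedisCache   -> cache._cache.get_client()
      * django-redis-cache <= 0.13.1     -> cache._client
      * django-redis-cache  > 0.13.1     -> cache.get_master_client()
    """
    inner = getattr(cache, "_cache", None)
    if inner is not None and hasattr(inner, "get_client"):
        # Django 4 backend: _cache is a RedisCacheClient.
        return inner.get_client()
    try:
        return cache._client
    except AttributeError:
        return cache.get_master_client()
```

Because it only branches on attributes, the old django-redis-cache paths keep working unchanged.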

Another thing that caught my attention was this: persistent database connections? #228.

It seems django-rq uses the reset_db_connections() function to close all database connections via django.db.connections (see rqworker.py).

Why is this necessary? Say I'm running Docker containers, one each for Redis, Postgres, and Django: why must all database connections be closed? Why do Postgres connections affect Redis connections?

I don't fully understand everything that Django-rq (or even RQ) does, so I'm guessing that there's a good reason for that. Could you guys clue me in?
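My rough guess at the reason: os.fork() duplicates open file descriptors, so a forked worker would share the parent's database socket. A toy illustration, with a pipe standing in for that socket (assumes a POSIX system):

```python
import os

# A pipe stands in for a database socket: after fork, parent and
# child both hold the SAME underlying file descriptor.
read_fd, write_fd = os.pipe()

pid = os.fork()
if pid == 0:
    # Child process: writes to the inherited descriptor.
    os.write(write_fd, b"child")
    os._exit(0)

os.waitpid(pid, 0)             # wait for the child so write order is deterministic
os.write(write_fd, b"parent")
os.close(write_fd)
data = os.read(read_fd, 100)
os.close(read_fd)

# Both writes landed on one shared channel. With a real DB socket,
# interleaved protocol traffic like this would corrupt the connection,
# which would explain closing all connections before forking.
assert data == b"childparent"
```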

@selwin mentioned:

Not at the moment, Django-RQ explicitly closes all DB connections prior to calling os.fork() and executing jobs. Suggestions on how we can make this more efficient is welcome.

But if you guys could elaborate a bit.. it would be nice.

Django's Redis backend also manages connection pools, via the _get_connection_pool() method that runs whenever get_client() is called. So building on it seems like it could improve the overall architecture of django-rq.

Thank you!

DragosDumitrache commented 1 year ago

I'm just butting in, since I've been using this with Django 4: what kind of error would we be seeing? I ask because I keep hitting an ImproperlyConfigured database error saying I need to supply an ENGINE, despite my database being configured properly.

selwin commented 1 year ago

My point is: getting a client connection from Django's cache API is probably simpler. Most of the details (connection pools, redis client instances, etc.) have already been taken care of. Even when considering backwards compatibility, we can still change some stuff to mimic some parts of django's redis backend.

Yes, I'm all for this. If someone can open a PR to support using the same Redis connection from Django's cache backend, I'm happy to review and merge it in.