miketheman opened 1 year ago
I think Redis had limitations that RabbitMQ and SQS didn't have, but those might have been fixed in the intervening years (or we may not care about them anymore). I don't recall exactly, but I think it was something like: Redis relied on Celery requeuing failed tasks, whereas with SQS and RabbitMQ a task would automatically be requeued if the worker didn't report success within some amount of time.
That might have been the concept of `visibility_timeout`, which is only supported by SQS and Redis queues:
https://docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/redis.html#visibility-timeout
https://docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/redis.html#id1
RabbitMQ had more "smarts": the broker itself would decide how to handle workers that left and never came back.
The default is now 1 hour, and we don't override it for SQS as far as I can tell either - so we should be on par for that behavior.
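For reference, the knob in question looks roughly like this in Celery configuration (a minimal sketch; the broker URL below is illustrative, not our actual settings):

```python
from celery import Celery

# Illustrative Redis broker URL; warehouse's real value comes from config vars.
app = Celery("warehouse", broker="redis://localhost:6379/0")

# With the Redis (and SQS) transports, a task that isn't acknowledged within
# visibility_timeout seconds is redelivered to another worker. Celery's
# documented default is 1 hour.
app.conf.broker_transport_options = {"visibility_timeout": 3600}
```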
That might have been it, seems likely!
To be clear I don't have any real opinion on what broker implementation we use, I was just trying to remember the context the original decision was made in, to see if it still applied.
An additional datapoint: we include some packages that are specific to the SQS implementation, such as `pycurl`, as part of `kombu`'s extras:
https://github.com/celery/kombu/blob/1dfe4f3c86ab0fd2587a6fe8566bb3cef8c4a5d7/requirements/extras/sqs.txt#L2
We'd be able to drop that if we converted to using Redis.
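As a quick illustration of why that dependency exists: kombu picks its transport (and therefore its extra dependencies) from the broker URL scheme. A hedged sketch, with placeholder URLs:

```python
from kombu import Connection

# The URL scheme selects the transport: "sqs://" uses kombu's SQS transport,
# which needs pycurl and boto3, while "redis://" only needs the redis client.
print(Connection("redis://localhost:6379/0").transport_cls)  # -> "redis"
print(Connection("sqs://").transport_cls)                    # -> "sqs"
```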
Closing. We have no plans to replace SQS for now, but the investigation proved useful to show that it's possible, if we wanted to pick it up again. Before we do so, we should probably split apart the redis clients to use distinct databases for isolation.
I went back in the Slack history of #pypi-admin and didn't see any particular motivator, mostly just that SQS was available 🤷🏼.
I think this is worth pursuing and will put some effort in.
We're using Amazon SQS as our message broker today.
Celery support for SQS affords neither Monitoring nor Remote Control from within the context of Celery.
We can monitor items like queue depth for SQS via Cloudwatch Metrics (and possibly in Datadog) but we have little visibility as to what is in the queue itself.
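To make the gap concrete, this is roughly the remote-control/inspection API that the SQS transport can't support but Redis (and RabbitMQ) can; a minimal sketch with an illustrative broker URL:

```python
from celery import Celery

# Illustrative broker URL; against the SQS transport these inspect calls are
# unsupported because SQS has no broadcast mechanism.
app = Celery("warehouse", broker="redis://localhost:6379/0")

inspector = app.control.inspect()
print(inspector.active())    # tasks currently executing, per worker
print(inspector.reserved())  # tasks prefetched by workers but not yet started
print(inspector.stats())     # per-worker statistics
```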
@ewdurbin and I recently chatted about the notion of queue debouncing/superseding, and I raised the question of whether we might want to use Redis as the broker instead. We already use Redis as part of the stack, and we could leverage the same cluster/instance for the Celery broker. This would do a couple of things:
- `kombu`

We also discussed using SQLAlchemy and Postgres as a backend, and rejected it. The docs kinda support that decision.
The change "seems" easy, since most of our setup is already governed by a config var of
BROKER_URL
: https://github.com/pypi/warehouse/blob/67d5b04228bd401eb75d29c3efec92383e51c9ad/warehouse/config.py#L180 https://github.com/pypi/warehouse/blob/38e0e0400f8585e382aa5d48836ef08fcfde742a/warehouse/tasks.py#L184We already use Redis in aspects of our celery lifecycle: https://github.com/pypi/warehouse/blob/67d5b04228bd401eb75d29c3efec92383e51c9ad/warehouse/config.py#L181-L182
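In other words, the cutover could be as small as pointing that config var at a Redis URL; a rough sketch (the environment values here are assumptions, not our deployed config):

```python
import os
from celery import Celery

# Hypothetical values for illustration only:
#   today:    BROKER_URL = "sqs://..."             -> kombu's SQS transport
#   proposed: BROKER_URL = same value as REDIS_URL -> kombu's Redis transport
broker_url = os.environ.get("BROKER_URL", os.environ.get("REDIS_URL", ""))

app = Celery("warehouse", broker=broker_url)
```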
We could set `celery.broker_url` to `REDIS_URL`, but I'm noticing that this persists a pattern of dumping all of the keys into redis's default database `db:0`, which would make selecting specific keys harder, since everything would be intermingled. `KEYS *` can be an expensive operation (kinda like `SELECT * FROM <<EVERYTHING>>`).

We also share the same `db:0` with `oidc.jwk_cache_url`, `sessions.url`, `ratelimit.url`, and `warehouse.xmlrpc.cache.url`, so that's not 100% awesome either.
Here's some semi-structured thoughts, totally open to more:

- Split `REDIS_URL` apart and create db-specific config variables for differently-scoped keys, like `ratelimit`, `sessions`, et al. Not totally sure how to manage the data migration yet.
- Point `BROKER_URL` at a redis database to eat any new messages.
- Update the `web`/`web-uploads` config value of `BROKER_URL` to point to redis.

Obviously, I'm probably overlooking something, so would definitely want to hear other reasonings, opinions, thoughts, mistakes, etc.