The way that realtime messaging is implemented by `h.realtime` is not going to work with Amazon SQS. `h.realtime` relies on a feature of AMQP where "direct exchanges" can, despite their name, be used to send one message to multiple consumers: multiple queues are bound to the same routing key, and when a producer publishes a message with that routing key, the exchange delivers it to every queue bound to that key.
In our context, the web process publishes a message to a routing key which is a category of event (e.g. `realtime-annotation`), and each websocket process creates one queue per category bound to the same routing keys. When the web process publishes a realtime update message, each websocket process therefore receives it.
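To make the mechanism concrete, here is a minimal sketch of this pattern using Kombu against a RabbitMQ broker. The queue names, connection URL, and message body are illustrative; in h each websocket process would declare its own queue.

```python
from kombu import Connection, Exchange, Queue

# A "direct" exchange: RabbitMQ delivers a copy of each message to
# every queue bound to the message's routing key.
exchange = Exchange("realtime", type="direct")

# One queue per websocket process, both bound to the SAME routing key.
ws_queue_1 = Queue("ws-1.realtime-annotation", exchange=exchange,
                   routing_key="realtime-annotation")
ws_queue_2 = Queue("ws-2.realtime-annotation", exchange=exchange,
                   routing_key="realtime-annotation")

with Connection("amqp://guest:guest@localhost//") as conn:
    producer = conn.Producer()
    # Both queues receive a copy of this message, so every websocket
    # process sees the realtime update.
    producer.publish(
        {"action": "create", "annotation_id": "abc123"},
        exchange=exchange,
        routing_key="realtime-annotation",
        declare=[ws_queue_1, ws_queue_2],
    )
```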
Making this multicasting happen relies on a central entity (the RabbitMQ message broker) knowing about all the queues and their routing keys. When using Kombu's direct messaging with Amazon SQS, however, the "exchange" exists purely in memory in each client and the routing key is used as the SQS queue name. If my understanding is correct, this means that if multiple consumers try to pull from the "exchange" with the same routing key, one of the consumers (e.g. one of the websocket processes) will win and the others will never see the message.
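A rough sketch of that failure mode, assuming Kombu's SQS transport: because the virtual exchange maps the routing key straight to an SQS queue name, all the consumers end up draining one shared queue, and SQS hands each message to only one of them.

```python
from kombu import Connection, Exchange, Queue

exchange = Exchange("realtime", type="direct")
# With the SQS transport the routing key becomes the SQS queue name,
# so every consumer below competes over one shared queue.
queue = Queue("realtime-annotation", exchange=exchange,
              routing_key="realtime-annotation")

def consume(label):
    # AWS credentials and region are picked up from the environment
    # by boto3.
    with Connection("sqs://") as conn:
        def on_message(body, message):
            # Any given message reaches only ONE of the competing
            # consumers, not all of them.
            print(label, "received", body)
            message.ack()

        with conn.Consumer(queue, callbacks=[on_message]):
            conn.drain_events(timeout=5)
```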
Kombu also supports topic and fanout exchanges, which are more directly designed for this kind of multicast broadcast. The documentation states that Amazon SQS supports fanout exchanges by storing the routing table in SimpleDB, but it looks like this support was actually removed some time ago.
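For comparison, a fanout exchange makes the multicast intent explicit; with RabbitMQ no routing-key matching is involved at all. A sketch with illustrative names (this would not work on SQS, per the removed support noted above):

```python
from kombu import Connection, Exchange, Queue

# A fanout exchange copies every message to every bound queue,
# regardless of routing key.
fanout = Exchange("realtime-fanout", type="fanout")
ws_queue = Queue("ws-1.realtime", exchange=fanout)

with Connection("amqp://guest:guest@localhost//") as conn:
    conn.Producer().publish(
        {"action": "update"},
        exchange=fanout,
        declare=[ws_queue],
    )
```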
I've investigated our use of Celery for background tasks. Fortunately, the way we use Celery for tasks conforms to the limitations that come with SQS (such as the lack of support for remote control commands and worker events).
By adding the necessary runtime dependencies (the boto3 and pycurl Python packages, plus the curl-dev Alpine package) I was able to deploy a Docker image configured to use SQS as the broker instead of RabbitMQ, and trigger a successful background task execution on the deployed container.
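For reference, a minimal sketch of the Celery configuration involved. The region and queue prefix are assumptions for illustration; AWS credentials are picked up from the environment by boto3.

```python
from celery import Celery

# Equivalent to setting BROKER_URL=sqs:// in the environment.
app = Celery("h", broker="sqs://")

app.conf.broker_transport_options = {
    "region": "us-west-1",       # assumed region, for illustration
    "queue_name_prefix": "h-",   # assumed prefix to namespace h's queues
}

@app.task
def example_task():
    # Hypothetical stand-in for one of h's background tasks.
    print("executed via the SQS broker")
```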
I still need to verify that periodic tasks function correctly when using SQS.
The Celery docs state that the SQS broker support is stable; the note in the README appears to be out of date (fixed by https://github.com/celery/celery/pull/4756).
Our periodic tasks are executed in production by a separate "h-periodic" service. That will need some additional changes to support SQS.
PR for h - https://github.com/hypothesis/h/pull/5035
I've verified that SQS is working with Celery's background tasks executed via h-periodic as well. So to wrap this up: if `BROKER_URL` is configured to be `sqs://` then realtime updates are disabled in https://github.com/hypothesis/h/pull/5035.

I'm going to wrap this up for now. The Elasticsearch migration is taking priority this week, but we'll get the SQS PRs merged when people have sufficient bandwidth free.
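A hypothetical sketch of that guard; the function and setting names are illustrative, not the actual code in the PR:

```python
import os

def include_realtime(config):
    # Hypothetical setup hook; not the actual code in hypothesis/h#5035.
    broker_url = os.environ.get("BROKER_URL", "")
    if broker_url.startswith("sqs://"):
        # SQS cannot fan one message out to every websocket process
        # (see above), so realtime updates are disabled rather than
        # left silently broken.
        return
    config.include("h.realtime")  # assumed registration, for illustration
```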
A partner that we are working with has expressed a preference to use SQS rather than RabbitMQ for the message queue if possible, to avoid adding additional infrastructure (whether they run it themselves or use e.g. CloudAMQP) for their ops team to manage.
Celery supports SQS, as does the underlying Kombu library, which we use directly. The issue here is to research whether this would be possible to support and whether there are any issues with doing so.