mediacloud / story-indexer

The core pipeline used to ingest online news stories in the Media Cloud archive.
https://mediacloud.org
Apache License 2.0

RabbitMQ configuration #88

Open philbudne opened 1 year ago

philbudne commented 1 year ago

An umbrella issue to empty my mind of RabbitMQ configuration issues:

  1. Run under Docker or bare metal?
  2. RabbitMQ can block producers when disk space is low: https://www.rabbitmq.com/disk-alarms.html. The code I found runs the "df" command and parses its output, so it may NOT work if RabbitMQ runs under Docker with a disk usage quota applied. We might want or need to create a separate Linux filesystem for message storage.
  3. Run in cluster mode (need to configure all queues as "quorum queues")? Will multiple servers under Docker be load balanced by the "swarm routing mesh"? For multiple servers on bare metal, DNSRR (a DNS name with multiple A(ddr) records) MAY work, SO LONG as the server and client IP addresses have different prefixes, due to the RFC 3484 address ordering implemented by getaddrinfo.
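
As a rough illustration of item 2, the sketch below checks free space the way a filesystem-level tool sees it. This is not RabbitMQ's code: `below_free_limit`, the path, and the limit value are hypothetical names picked for the example. `shutil.disk_usage` reports figures for the whole filesystem that holds the path, so a quota layered on top of that filesystem may not show up in the result, which is one argument for a dedicated filesystem for message storage.

```python
import shutil


def below_free_limit(path: str, limit_bytes: int) -> bool:
    """Return True when the filesystem holding `path` has fewer than
    `limit_bytes` bytes free.

    The figure is filesystem-wide; a per-directory or per-container
    quota is not necessarily reflected in it.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return usage.free < limit_bytes


if __name__ == "__main__":
    # Hypothetical values: check the current directory against 2 GiB.
    path = "."
    limit = 2 * 1024 ** 3
    state = "LOW" if below_free_limit(path, limit) else "ok"
    print(f"{path}: free space is {state} relative to {limit} bytes")
```

Running this inside and outside the container against the message store path would show whether both views agree on the free space figure.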

rahulbot commented 1 year ago

@philbudne Does the fix for #102 and other recent changes resolve these issues or is it still a question to keep open?

philbudne commented 1 year ago

PR https://github.com/mediacloud/story-indexer/pull/102 contains the fix for Issue https://github.com/mediacloud/story-indexer/issues/97 (it looks like I accidentally reported data for that issue in this one, and I've just deleted it).

This can be tagged or milestoned as a long-term "survivability" issue; it's not critical right now (we're running one day at a time, and the queues are always clearing by end of day). If/as we move towards more state in the queues, the importance of queue robustitude is embiggened.

philbudne commented 3 weeks ago

NOTE: Quorum queues include better handling for messages that cause a worker to hang (go unacked): https://www.rabbitmq.com/docs/quorum-queues#poison-message-handling
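
A minimal sketch of how a queue might opt in to that handling, assuming a Python client such as pika (an assumption; this thread does not name the client library). `x-queue-type` and `x-delivery-limit` are queue arguments described in the RabbitMQ quorum queue docs; the helper name, the queue name, and the limit of 5 are hypothetical.

```python
def quorum_queue_arguments(delivery_limit: int = 5) -> dict:
    """Build the optional arguments for declaring a quorum queue that
    caps how many times a message may be redelivered.

    `delivery_limit` is an illustrative value, not a recommendation.
    """
    if delivery_limit < 1:
        raise ValueError("delivery_limit must be at least 1")
    return {
        "x-queue-type": "quorum",             # request a quorum queue
        "x-delivery-limit": delivery_limit,   # redelivery cap per message
    }


if __name__ == "__main__":
    print(quorum_queue_arguments())
    # With pika (assumed client), the declaration might look like:
    #   channel.queue_declare(
    #       queue="example-queue",   # hypothetical queue name
    #       durable=True,            # quorum queues must be durable
    #       arguments=quorum_queue_arguments(),
    #   )
```

What happens to a message that exceeds the limit (dropped or dead-lettered) depends on the queue's dead-letter configuration, which is left out here.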

philbudne commented 3 weeks ago

I originally assumed we would run a RabbitMQ cluster with all three ES/storage servers.

In a larger cluster with a limited number of master(*)-eligible (or master-only) nodes, maybe run RabbitMQ on just those??

(*) Is it too P.C. of me to wish ES would switch to some other term?