jhalterman / lyra

High availability RabbitMQ client
Apache License 2.0
263 stars 74 forks source link

Lyra recovers queues that have already been discarded from channel closures #48

Closed spatula75 closed 9 years ago

spatula75 commented 9 years ago

This might be a duplicate of #20 but I'm not certain.

To reproduce:

Make a bunch of channels, then for each channel, queueName = channel.queueDeclare(someQueue, false, true, true, null), then channel.queueBind(queueName, someExchange, someTopic).

Then channel.close() each channel, and then force a recovery to happen.

Observed behavior: Lyra attempts to recover every queue and binding, even though amqp-client long ago discarded those queues because the connections closed (they're marked exclusive and auto-delete, and it can be observed using rabbitmqctl list_queues that they came and went).

Expected behavior: Lyra forgets about queues that are gone because their channels got closed and they were marked for autodeletion.

jhalterman commented 9 years ago

The way this works currently is that queue recovery will occur for any queues that are created with autoDelete=true ordurable=false until they are explicitly deleted via a queueDelete call. One of the reasons for this is that queues are not really dependent on any given channel - so merely closing the channel that you created the queue through will not necessarily axe the queue. Auto-deleting queues will go away once all of their consumers have been closed (including at least one consumer), and those consumers could be attached to multiple channels, some of which may not even be through Lyra. So to be safe, we require explicit queueDelete calls to have queue recovery not take place.

Queue recovery is enabled when channel recovery is enabled, but can be disabled via Config.withQueueRecovery(false);.

Let me know if you think this doesn't make sense and if some other approach might be better. Lyra could, for instance, track which queues consumers are created for, and not recover those queues once the consumers, or the channels they were created through, are closed.

spatula75 commented 9 years ago

Oh, yeah, that's a good point... I hadn't considered that something else might be looking at the queue into which Lyra has no visibility. It also wouldn't make terrific sense to have Lyra continually checking to see if a queue still exists, and by the time the connection fails, it would be too late to check to see if it's there anymore such that you should try to recover it.

Maybe it's best resolved as a bit of documentation / caveat to the developer: if you KNOW you are no longer going to use your queue (eg, like a temporary reply queue), then make sure you actually delete it instead of relying on it going away because the consumer finished.

What about also checking after a consumer is canceled to see if the queue from which it was consuming also went away, and then dropping the queue from those that might be recovered if it did... of course the problem there is that the queue might not disappear instantly and you'd have some potential for a race with the server.

jhalterman commented 9 years ago

That last bit is not a bad idea, but I agree the actual queue deletion probably lags the channel/consumer closure. Plus I don't really want to have to hit the user's server in secret in order to determine something like that. So far Lyra is pretty hands off, which is good.

I would probably either assume that all channels/consumers are being managed by Lyra, or require explicit deletion. What do you think?

spatula75 commented 9 years ago

I'm in favor of explicitly deleting the queues, which is the approach I ended up taking for our project. I was just rather surprised the first time we had a recovery, and then had about 10,000 queues recovered in the logs, eventually leading to RejectedExecutionExceptions being thrown from somewhere and a general mess being created. What I did was just add some finally{}-style logic to make sure that when we use a temporary queue, we always explicitly delete it rather than relying on it going away by magic because the consumer canceled and nothing else cared about the queue (the lazy approach).

I think declarative, explicit deletion expresses a developer's intention in the most-clear way, so that would be the direction I'd lean. I think our code reads better with the explicit deletion-- there's no 'magic' happening, and it's easy to see what's going to happen, and when.

My suggestion would be adding a line to the docs somewhere as a caveat about automatic queue recovery, to the effect that if you know you're not going to come back to a queue, and you know nothing else should be listening to that queue, delete it explicitly if you're planning to use automatic queue recovery.