rabbitmq / rabbitmq-server

Open source RabbitMQ: core server and tier 1 (built-in) plugins
https://www.rabbitmq.com/

Block publishers when queue length limit is reached #499

Closed vgkiziakis closed 6 years ago

vgkiziakis commented 8 years ago

Currently, if a queue has a length limit configured:

Messages will be dropped or dead-lettered from the front of the queue to make room for new messages once the limit is reached.

In some cases (where publishers want to know that the messages they send are never discarded) it may be useful to block publishers that send messages to exchanges that have bound queues which have reached their length limit rather than discarding older messages in those queues. The publishers would only be able to send messages again once all queues bound to the target exchange have room.
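For context, the length limit discussed here is set per queue with the `x-max-length` argument (or via a policy). A minimal sketch with the RabbitMQ Java client, where the queue name and limit are illustrative:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.util.HashMap;
import java.util.Map;

public class MaxLengthQueue {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            // Cap the queue at 1000 messages. With the default overflow behaviour,
            // messages are dropped (or dead-lettered) from the head once the cap is hit.
            Map<String, Object> queueArgs = new HashMap<>();
            queueArgs.put("x-max-length", 1000);
            channel.queueDeclare("limited.queue", true, false, false, queueArgs);
        }
    }
}
```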

The above was briefly discussed last Wednesday in our meeting with @tsaleh and @jerenkrantz at the Pivotal offices in London.

michaelklishin commented 8 years ago

There is one important issue with blocking: we can only block the entire connection. RabbitMQ cannot know where the next message will be routed to.

So are you suggesting that we use the same mechanism as with resource-driven alarms (stop reading from connection socket) or use a protocol extension to send publishers notifications (similarly to how we notify them about connections being blocked due to resource alarms)?

How would this be configured for queues? How should this work for connections on remote nodes (any client can publish to any exchange, routing to any queue, from any node)?
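For reference, the resource-alarm notification mechanism mentioned above surfaces in clients as connection.blocked / connection.unblocked. In the Java client it looks roughly like the sketch below (host and handler bodies are illustrative):

```java
import com.rabbitmq.client.BlockedListener;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class BlockedNotifications {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection conn = factory.newConnection();
        // Invoked when the broker sends connection.blocked / connection.unblocked,
        // e.g. because a resource alarm went off.
        conn.addBlockedListener(new BlockedListener() {
            @Override
            public void handleBlocked(String reason) {
                System.out.println("Connection blocked: " + reason);
            }

            @Override
            public void handleUnblocked() {
                System.out.println("Connection unblocked");
            }
        });
    }
}
```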

vgkiziakis commented 8 years ago

In this case the publisher would effectively be declaring that they are happy for the entire connection to be blocked if any queue that is bound to any exchange they send a message to hits its length limit.

If this is hard/impossible to track, perhaps your suggestion of a protocol extension could work, similarly to the mandatory or immediate flags: a new type of message flag could indicate that the message should be returned to the sender if it ends up being routed to any queue that has hit its length limit. Is this feasible?
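To illustrate the analogy, this is how the existing mandatory flag and basic.return behave for unroutable messages in the Java client. A sketch only; the proposed "blocked queue" flag does not exist:

```java
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.ReturnListener;

public class MandatoryPublish {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            // basic.return for unroutable mandatory messages is delivered here.
            channel.addReturnListener(new ReturnListener() {
                @Override
                public void handleReturn(int replyCode, String replyText, String exchange,
                                         String routingKey, AMQP.BasicProperties properties,
                                         byte[] body) {
                    System.out.println("Returned: " + replyCode + " " + replyText);
                }
            });
            // mandatory=true: the broker returns the message if it cannot be routed to any queue.
            channel.basicPublish("amq.direct", "no.such.binding", true,
                    new AMQP.BasicProperties.Builder().build(), "hello".getBytes());
        }
    }
}
```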

michaelklishin commented 8 years ago

We already have protocol extensions: connection.[un]blocked. We need a publisher and/or queue to indicate that it should be blocked. Perhaps this can be a queue setting.

Existing methods such as basic.publish cannot be extended, unfortunately, but we can introduce new methods as needed. Every client also has an extensible table of "capabilities" which the server can check and act on accordingly.
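As a rough illustration of the capabilities idea: the Java client lets an application extend the client properties it presents to the server at connection time. The property name below is purely hypothetical and not something RabbitMQ recognises; the server side would have to be taught to check it:

```java
import com.rabbitmq.client.ConnectionFactory;

import java.util.Map;

public class CapabilityHint {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        // Extend the default client properties sent in connection.start-ok.
        // "block-on-full-queues" is hypothetical: the server would need to look
        // for it before deciding to block this connection on queue overflow.
        Map<String, Object> clientProps = factory.getClientProperties();
        clientProps.put("block-on-full-queues", true);
        factory.setClientProperties(clientProps);
        factory.newConnection().close();
    }
}
```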

vgkiziakis commented 8 years ago

So, if I've understood this correctly, it could work in the following way:

Both the publisher and the queue should indicate they want to be blocked. I think it's somewhat dangerous to only require this to be a queue setting because it would allow consumers to declare tiny blocking queues and effectively block unsuspecting publishers.

Do you think this makes sense?

michaelklishin commented 8 years ago

@vgkiziakis it definitely does, except that publishers will still use the regular basic.publish method. Smaller details of this need to be worked out but you got the general idea right.

hairyhum commented 8 years ago

Could it be implemented using the mandatory flag and a suitable response code in the return message?

michaelklishin commented 8 years ago

mandatory has different semantics, and so does basic.return.

hairyhum commented 8 years ago

A message from a publisher can go to multiple queues, and some of them can be full and support blocking. In that case the publisher will still receive connection.blocked and be blocked. If the publisher then retries the message, it will be routed to the "free" queues again while the connection is blocked by the same or another "busy" queue. To avoid that we would need to know that all queues are able to receive the message, but that would require some kind of multi-queue transaction and could be too hard. So using this feature requires accepting that messages can be enqueued multiple times in some routing scenarios.

hairyhum commented 8 years ago

When a queue blocks a publisher, it has to unblock it at some point. Doing so right after the first message is consumed can cause multiple channels to start republishing and get blocked again immediately. Some additional parameter could be attached to the blocking setting, such as the amount of free space in the queue at which publishers are unblocked; it could be relative to the queue's max-length or an absolute number of messages (or bytes).

Another concern is requeued messages. The default overflow strategy is to drop messages, and they are dropped on requeue as well; if we introduce a non-dropping strategy, requeues could overflow the queue, since they cannot be blocked. A possible solution is not to unblock publishers until there is room for the queued messages plus the unacked message count.
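If it helps to picture the unblock-threshold idea, the arguments below sketch how such a setting might have looked. Everything here other than x-max-length is hypothetical and was never implemented:

```java
import java.util.HashMap;
import java.util.Map;

public class HypotheticalBlockingArguments {
    public static void main(String[] args) {
        Map<String, Object> queueArgs = new HashMap<>();
        queueArgs.put("x-max-length", 10_000);
        // Hypothetical only, sketching the idea above; not real RabbitMQ arguments:
        queueArgs.put("x-overflow", "block-publishers");  // block instead of dropping
        queueArgs.put("x-unblock-free-slots", 1_000);     // unblock once this much room is free
    }
}
```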

hairyhum commented 8 years ago

Idea: refactor the blocking states in rabbit_reader. Currently there are running, blocking and blocked states plus a throttle field that also carries state information, so transitions indirectly depend on that extra state.
The proposal is to change this state machine: https://github.com/hairyhum/pics/blob/master/old.png into this: https://github.com/hairyhum/pics/blob/master/new.png

That gives saner transitions which do not depend as much on throttle values during a transition.

michaelklishin commented 8 years ago

@hairyhum what's the value proposition of this new state machine? It may be the transition and state names that confuse me, but I don't find it any simpler than what we have.

hairyhum commented 8 years ago

@michaelklishin currently the state is split between connection_status and throttle, and decisions are made based on throttle. Also, a connection can already be receiving publishes but will still be set to blocking on the next round (maybe that's by design). The decision to send connection.blocked is made by comparing the alerts in the old and new throttle, as is the decision to send connection.unblocked. So the state in which a connection has been blocked by flow or by alerts, with the corresponding method sent to the client, is not explicit. Basically the new machine behaves the same way as the old one, but I tried to move the conditional logic and the indirect state information out of the transitions and the throttle field and into the state machine itself.

michaelklishin commented 8 years ago

@hairyhum there are two separate state machines: a negotiation/running/closing/closed one and another for throttling. If we were to combine them into one, we'd need to come up with descriptive names for such states. This is not going to be easy, and can spill into the management UI. Are you sure the whole thing is worth doing? What does it really improve for this particular issue?

michaelklishin commented 8 years ago

I think we should stop discussing this refactoring idea in this issue as there seems to be little connection. Please submit a PR for what you have, and include the state diagrams above.

hairyhum commented 8 years ago

@michaelklishin it was related to this issue: I tried to send connection.[un]blocked from the channel because of message overflow, and I couldn't, because the blocking state machine is too implicit. It has gone too far now, I agree. I'll move it to a separate PR.

chadrik commented 7 years ago

I'd love to see this feature. @hairyhum did your state machine refactor pave the way for this change?

fake-name commented 7 years ago

I've been hitting some issues that'd also really benefit from this.

chadrik commented 7 years ago

Between this and #995, I think I currently prefer the latter, because A) the client may want to do something other than block and B) it could be used as the basis for implementing blocking on the client-side.

Curious what others think.

marksteward commented 7 years ago

#995 leaves it to the publisher to decide when to retry. If a publisher doesn't know when to throttle its messages, can it be expected to be able to do the opposite?

(To give some background, I imagined this behaviour as alarms, but more granular. It would be useful in situations where a badly-behaved app is at risk of suddenly filling queues and has no other back-off mechanism. Not something that should happen often, but ideally it wouldn't require additional coding when it does.)

ben-spiller commented 6 years ago

+1 for this. It's a big limitation that when the queue limit is reached, the only available options involve throwing messages away. Every other JMS provider I've seen either implements publisher blocking or throws an exception from send() to let the publisher know the send didn't succeed - either of those would be fine, but having send() complete successfully while messages are thrown away makes it very hard to build a reliable application. It's an even bigger issue with the JMS API, as some of the Rabbit-specific confirmation API options aren't available.

lukebakken commented 6 years ago

@ben-spiller

> It's a big limitation that when the queue limit is reached, the only available options involve throwing messages away

Further up in the discussion you'll find a link to #995, which shipped in 3.7.0. That feature gives you the option of having nacks returned when a queue's limit is reached (docs).
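Concretely, with the Java client this looks roughly like the sketch below: declare the queue with x-overflow set to reject-publish, enable publisher confirms, and treat a nack as "the queue is full". Names and numbers are illustrative:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.util.HashMap;
import java.util.Map;

public class RejectPublishExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            // A queue that nacks publishers instead of dropping from the head when full.
            Map<String, Object> queueArgs = new HashMap<>();
            queueArgs.put("x-max-length", 10);
            queueArgs.put("x-overflow", "reject-publish");
            channel.queueDeclare("limited.reject.queue", true, false, false, queueArgs);

            // Publisher confirms are required to observe the nack.
            channel.confirmSelect();
            channel.basicPublish("", "limited.reject.queue", null, "payload".getBytes());
            try {
                channel.waitForConfirmsOrDie(5_000);
            } catch (java.io.IOException e) {
                // With reject-publish and a full queue, the publish is nacked;
                // the publisher can back off, buffer locally, and retry here.
            }
        }
    }
}
```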

ben-spiller commented 6 years ago

@lukebakken Thanks for the follow-up. We're using RabbitMQ via the JMS API (which isn't mentioned in that section of the docs). Is there any way for the publisher to get an indication that a JMS send has failed in this case (i.e. an exception from the JMS send or commit method)?

lukebakken commented 6 years ago

@ben-spiller - have you tried doing a JMS send to a full queue while using the reject-publish option? RabbitMQ will send a basic.nack back to the JMS client.

@acogoluegnes would probably know off the top of his head what happens 😄

michaelklishin commented 6 years ago

I doubt that blocking publishers in this case is a good idea. For starters, this behavior would be counterintuitive to most, in particular beginners. So it has to be opt-in.

Empirical evidence suggests that most applications are not designed to handle such scenarios. Asking publishers to account for nacks is a better option because they kind of have to anyway if they use publisher confirms at all.

A JMS client can potentially use publisher confirms under the hood. Even if that's not the case, it is possible to rely on dead-lettering to collect and retry the messages that "did not fit". This trick has been used together with timed expiration for years and is possible with any client as long as queue arguments can be tweaked (e.g. using a policy).
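A rough sketch of that dead-lettering trick with the Java client: configure a dead-letter exchange on the limited queue so messages dropped from the head land in a holding queue for later retry. Names are illustrative, and the same arguments can be applied via a policy instead of queue declaration:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.util.HashMap;
import java.util.Map;

public class OverflowDeadLettering {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            // Messages dropped from the head of the full queue are dead-lettered
            // to this exchange instead of being lost, so they can be retried later.
            channel.exchangeDeclare("overflow.dlx", "fanout", true);
            channel.queueDeclare("overflow.holding", true, false, false, null);
            channel.queueBind("overflow.holding", "overflow.dlx", "");

            Map<String, Object> queueArgs = new HashMap<>();
            queueArgs.put("x-max-length", 1000);
            queueArgs.put("x-dead-letter-exchange", "overflow.dlx");
            channel.queueDeclare("limited.queue", true, false, false, queueArgs);
        }
    }
}
```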

So I'd rather close this and see if there's enough interest in this feature going forward, now that #995 has shipped.

radiantone commented 3 years ago

Has this been implemented? It's very useful.