There is one important issue with blocking: we can only block the entire connection. RabbitMQ cannot know where the next message will be routed to.
So are you suggesting that we use the same mechanism as with resource-driven alarms (stop reading from connection socket) or use a protocol extension to send publishers notifications (similarly to how we notify them about connections being blocked due to resource alarms)?
How would this be configured for queues? How should this work for connections on remote nodes (any client can publish to any exchange, routing to any queue, from any node)?
In this case the publisher would effectively be declaring that they are happy for the entire connection to be blocked if any queue that is bound to any exchange they send a message to hits its length limit.
If this is hard or impossible to track, perhaps your suggestion of a protocol extension could work. It could work similarly to the mandatory or immediate flags: a new type of message flag could indicate that the message should be returned to the sender if it ends up being routed to any queue that has hit its length limit. Is this feasible?
We already have protocol extensions: connection.[un]blocked. We need a publisher and/or queue to indicate that it should be blocked. Perhaps this can be a queue setting.
Existing methods such as basic.publish cannot be extended, unfortunately. We can introduce new methods as needed. Every client also has an extensible table of "capabilities" which the server can check and act on accordingly.
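For context, a minimal sketch (assuming the RabbitMQ Java client and a broker on localhost) of how a publisher observes the existing connection.[un]blocked notifications today:

```java
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class BlockedNotificationExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory(); // assumes a broker on localhost

        try (Connection conn = factory.newConnection()) {
            // The Java client advertises the connection.blocked capability by default,
            // so the broker sends these notifications when it blocks/unblocks the connection.
            conn.addBlockedListener(
                reason -> System.out.println("connection.blocked: " + reason),
                () -> System.out.println("connection.unblocked"));

            // ... publish as usual on channels created from conn ...
        }
    }
}
```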
So, if I've understood this correctly, it could work in the following way:
Both the publisher and the queue should indicate that they opt into blocking. I think it's somewhat dangerous to require only a queue setting, because it would allow consumers to declare tiny blocking queues and effectively block unsuspecting publishers.
Do you think this makes sense?
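A purely hypothetical sketch of that double opt-in; the "blocked_by_full_queues" capability and the "block-publisher" overflow value below do not exist in RabbitMQ and are invented here only to make the idea concrete:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.util.HashMap;
import java.util.Map;

public class DoubleOptInSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();

        // Publisher opt-in: advertise a capability in the client properties table the
        // server can check. "blocked_by_full_queues" is invented for this sketch;
        // real code would merge it with the client's default properties.
        Map<String, Object> capabilities = new HashMap<>();
        capabilities.put("blocked_by_full_queues", true);
        Map<String, Object> clientProps = new HashMap<>();
        clientProps.put("capabilities", capabilities);
        factory.setClientProperties(clientProps);

        try (Connection conn = factory.newConnection()) {
            Channel ch = conn.createChannel();

            // Queue opt-in: an invented overflow mode asking the broker to block
            // opted-in publishers instead of dropping messages when the limit is hit.
            Map<String, Object> queueArgs = new HashMap<>();
            queueArgs.put("x-max-length", 1000);
            queueArgs.put("x-overflow", "block-publisher"); // hypothetical, does not exist
            ch.queueDeclare("bounded.work", true, false, false, queueArgs);
        }
    }
}
```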
@vgkiziakis it definitely does, except that publishers will still use the regular basic.publish method. Smaller details of this need to be worked out, but you got the general idea right.
Can it be implemented using the mandatory flag and a proper response code in the return message?
mandatory has different semantics, and so does basic.return.
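To illustrate the difference, a minimal sketch (assuming the RabbitMQ Java client and a broker on localhost) of what mandatory and basic.return do today: a mandatory message is returned only when it is unroutable, not when a target queue is full.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

import java.nio.charset.StandardCharsets;

public class MandatoryReturnExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory(); // assumes a broker on localhost
        try (Connection conn = factory.newConnection()) {
            Channel ch = conn.createChannel();

            // basic.return is delivered here for unroutable mandatory messages only.
            ch.addReturnListener(ret ->
                System.out.printf("returned: %d %s (exchange=%s, rk=%s)%n",
                    ret.getReplyCode(), ret.getReplyText(),
                    ret.getExchange(), ret.getRoutingKey()));

            // Third argument is the mandatory flag; no queue is bound to this routing key,
            // so the broker returns the message instead of silently discarding it.
            ch.basicPublish("amq.direct", "no-such-binding", true,
                MessageProperties.PERSISTENT_BASIC,
                "hello".getBytes(StandardCharsets.UTF_8));

            Thread.sleep(1000); // give the asynchronous basic.return a moment to arrive
        }
    }
}
```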
A message from a publisher can be routed to multiple queues, and some of them can be full and support blocking. In that case the publisher will still receive connection.blocked and be blocked.
Then the publisher can retry sending the message, and it will be routed to the "free" queues again while still being blocked by the same or another "busy" queue.
To avoid this we would need to know that all target queues are able to receive the message, but that would require some kind of multi-queue transaction and may be too hard.
So using this feature will require understanding that messages can be enqueued multiple times in some routing scenarios.
When a queue is blocking a publisher it should unblock it at some point. Doing that right after the first message is consumed can cause multiple channels to start republishing again and immediately get blocked. Some additional parameters could be provided with the blocking argument, such as the amount of free space in the queue at which publishers are unblocked; this could be relative to the queue's max_length or an absolute number of messages (or bytes).
Another concern is requeued messages. The default overflow strategy is to drop messages, and they are dropped on requeue as well. If we introduce a non-dropping strategy, requeues will overflow the queue since they cannot be blocked. A possible solution is to not unblock publishers until there is room for the queued messages plus the unacked message count.
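A small hypothetical sketch of the unblocking rule described above; the class, its parameters, and the thresholds are invented here purely to illustrate the hysteresis idea (leave room for ready plus unacked messages before unblocking):

```java
// Invented for illustration only; nothing here exists in RabbitMQ.
public class UnblockPolicy {
    private final long maxLength;      // the queue's configured length limit
    private final double unblockRatio; // e.g. 0.5: unblock once the queue is at most half "full"

    public UnblockPolicy(long maxLength, double unblockRatio) {
        this.maxLength = maxLength;
        this.unblockRatio = unblockRatio;
    }

    // Count unacked messages as occupying space, since a requeue would put them back;
    // only unblock publishers when ready + unacked messages fit well under the limit.
    public boolean shouldUnblock(long readyMessages, long unackedMessages) {
        long effectiveLength = readyMessages + unackedMessages;
        return effectiveLength <= Math.round(maxLength * unblockRatio);
    }

    public static void main(String[] args) {
        UnblockPolicy policy = new UnblockPolicy(1000, 0.5);
        System.out.println(policy.shouldUnblock(400, 50));  // true: 450 <= 500
        System.out.println(policy.shouldUnblock(600, 200)); // false: 800 > 500
    }
}
```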
Idea to refactor blocking states in rabbit_reader
Currently there are running, blocking and blocked states, plus a throttle field, which contains state information. So transitions indirectly depend on this state information.
So the proposal is to change this state machine: https://github.com/hairyhum/pics/blob/master/old.png
into this: https://github.com/hairyhum/pics/blob/master/new.png
So there will be saner transitions which do not depend so much on throttle values during the transition.
@hairyhum what's the value proposition of this new state machine? It may be the transition and state names that confuse me, but I don't find it to be any simpler than what we have.
@michaelklishin currently the state is divided between connection_status and throttle, and decisions are made based on throttle.
Also, the connection could already be receiving publishes but will still be set to blocking on the next round (I don't know, maybe that's by design).
The decision to send connection.blocked is made by checking the alerts in the old and new throttle, as is the decision to send connection.unblocked. So the state in which the connection was blocked by flow or by alerts, and whether the method was sent to the client, is not explicit.
Basically this new machine behaves the same way as the old one, but I tried to move the conditional logic and indirect state information out of the transitions and the throttle and into the state machine itself.
@hairyhum there are two separate state machines: a negotiation/running/closing/closed one and another for throttling. If we were to combine them into one, we'd need to come up with descriptive names for such states. This is not going to be easy, and can spill into the management UI. Are you sure the whole thing is worth doing? What does it really improve for this particular issue?
I think we should stop discussing this refactoring idea in this issue as there seems to be little connection. Please submit a PR for what you have, and include the state diagrams above.
@michaelklishin it was related to this issue because I tried to send connection.[un]blocked from the channel on message overflow, and I could not because the blocking state machine is too implicit.
Right now it's gone too far, I agree. Will move to another PR.
I'd love to see this feature. @hairyhum did your state machine refactor pave the way for this change?
I've been hitting some issues that'd also really benefit from this.
Between this and #995, I think I currently prefer the latter, because A) the client may want to do something other than block and B) it could be used as the basis for implementing blocking on the client-side.
Curious what others think.
(To give some background, I imagined this behaviour as alarms but more granular. It would be useful in situations where a badly-behaved app is at risk of filling queues suddenly, and has no other back off mechanism. Not something that should happen often, but ideally wouldn't require additional coding when it does.)
+1 for this. It's a big limitation that when the queue limit is reached the only available options involve throwing messages away. Every other JMS provider I've seen either implements publisher blocking or throws an exception from send() to let the publisher know the send didn't succeed - either of those would be fine, but having send() complete successfully while messages are thrown away makes it very hard to build a reliable application. It's an even bigger issue with the JMS API, as some of the Rabbit-specific confirmation API options aren't available.
@ben-spiller
It's a big limitation that when the queue limit is reached the only available options involve throwing messages away
Further up in the discussion you'll find a link to #995, which shipped in 3.7.0. This feature gives you the option of having nacks returned when a queue's limit is reached (docs).
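A minimal sketch of that option (assuming the RabbitMQ Java client and a 3.7.0+ broker): a bounded queue declared with reject-publish plus publisher confirms, so the publisher sees a nack once the queue is full. The queue name and limits are made up for the example.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class RejectPublishExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        try (Connection conn = factory.newConnection()) {
            Channel ch = conn.createChannel();

            Map<String, Object> argsMap = new HashMap<>();
            argsMap.put("x-max-length", 5);              // queue length limit
            argsMap.put("x-overflow", "reject-publish"); // nack instead of dropping from the head
            ch.queueDeclare("bounded", true, false, false, argsMap);

            ch.confirmSelect(); // enable publisher confirms on this channel

            for (int i = 0; i < 10; i++) {
                ch.basicPublish("", "bounded", null,
                        ("msg-" + i).getBytes(StandardCharsets.UTF_8));
                // waitForConfirms() returns false if the broker nacked the message,
                // which is what happens once the queue is full.
                boolean acked = ch.waitForConfirms();
                System.out.println("msg-" + i + (acked ? " confirmed" : " rejected (queue full)"));
            }
        }
    }
}
```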
@lukebakken Thanks for the follow-up. We're using RabbitMQ via the JMS API (which isn't mentioned in that section of the docs). Is there any way for the publisher to get an indication that a JMS send has failed in this case (i.e. an exception from the JMS send or commit method)?
@ben-spiller - have you tried doing a JMS send to a full queue while using the reject-publish option? RabbitMQ will send a basic.nack back to the JMS client.
@acogoluegnes would probably know off the top of his head what happens 😄
I doubt that blocking publishers in this case is a good idea. For starters, this behavior would be counterintuitive to most, in particular beginners. So it has to be opt-in.
Empirical evidence suggests that most applications are not designed to handle such scenarios. Asking publishers to account for nacks is a better option because they kind of have to anyway if they use publisher confirms at all.
The JMS client can potentially use publisher confirms under the hood. Even if that's not the case, it is possible to rely on dead-lettering to collect and retry the messages that "did not fit". This trick has been used together with timed expiration for years and is possible with any client, as long as queue arguments can be tweaked (e.g. using a policy).
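A minimal sketch of that dead-lettering trick, assuming the RabbitMQ Java client; the topology, queue/exchange names ("work", "work.retry", "retry-dlx"), and the 30-second TTL are assumptions made for the example. Messages dropped from the bounded queue because of its length limit are dead-lettered to a retry queue, sit there for the TTL, and are then dead-lettered back to the original queue for another attempt.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.util.HashMap;
import java.util.Map;

public class DeadLetterRetrySketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        try (Connection conn = factory.newConnection()) {
            Channel ch = conn.createChannel();

            ch.exchangeDeclare("retry-dlx", "fanout", true);

            // Bounded work queue: with the default overflow behaviour, messages dropped
            // because of the length limit are dead-lettered to retry-dlx instead of lost.
            Map<String, Object> workArgs = new HashMap<>();
            workArgs.put("x-max-length", 1000);
            workArgs.put("x-dead-letter-exchange", "retry-dlx");
            ch.queueDeclare("work", true, false, false, workArgs);

            // Retry queue: holds dead-lettered messages for 30s, then dead-letters them
            // via the default exchange back to the "work" queue.
            Map<String, Object> retryArgs = new HashMap<>();
            retryArgs.put("x-message-ttl", 30000);
            retryArgs.put("x-dead-letter-exchange", "");
            retryArgs.put("x-dead-letter-routing-key", "work");
            ch.queueDeclare("work.retry", true, false, false, retryArgs);
            ch.queueBind("work.retry", "retry-dlx", "");
        }
    }
}
```

In practice the same arguments can be applied with a policy rather than queue declarations, which is what makes the trick usable from any client, including JMS.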
So I'd rather close this and see if there's enough interest in this feature going forward, now that #995 has shipped.
Has this been implemented? It's very useful.
Currently, if a queue has a length limit configured, older messages are discarded from the head of the queue once the limit is reached.
In some cases (where publishers want to know that the messages they send are never discarded) it may be useful to block publishers that send messages to exchanges that have bound queues which have reached their length limit rather than discarding older messages in those queues. The publishers would only be able to send messages again once all queues bound to the target exchange have room.
The above was briefly discussed last Wednesday in our meeting with @tsaleh and @jerenkrantz at the Pivotal offices in London.