RabbitMQ: 确认机制 - Githubissues

发送端如何确认交换机收到了消息

Networks can fail in less-than-obvious ways and detecting some failures takes time. Therefore a client that's written a protocol frame or a set of frames (e.g. a published message) to its socket cannot assume that the message has reached the server and was successfully processed. It could have been lost along the way or its delivery can be significantly delayed.

Using standard AMQP 0-9-1, the only way to guarantee that a message isn't lost is by using transactions -- make the channel transactional then for each message or set of messages publish, commit. In this case, transactions are unnecessarily heavyweight and decrease throughput by a factor of 250. To remedy this, a confirmation mechanism was introduced. It mimics the consumer acknowledgements mechanism already present in the protocol.

amqp协议中给出的解决方案是事务, 但是使用事务会造成250倍的性能差距, 所以rabbitmq使用了一个类似tcp的确认机制

To enable confirms, a client sends the confirm.select method. Depending on whether no-wait was set or not, the broker may respond with a confirm.select-ok. Once the confirm.select method is used on a channel, it is said to be in confirm mode. A transactional channel cannot be put into confirm mode and once a channel is in confirm mode, it cannot be made transactional.

首先在channelz中开启确认机制

Once a channel is in confirm mode, both the broker and the client count messages (counting starts at 1 on the first confirm.select). The broker then confirms messages as it handles them by sending a basic.ack on the same channel. The delivery-tag field contains the sequence number of the confirmed message. The broker may also set the multiple field in basic.ack to indicate that all messages up to and including the one with the sequence number have been handled.

客户端和服务端会对消息进行编号, 然后服务端会通知到客户端当前确认到第N个消息

Negative Acknowledgments for Publishes

In exceptional cases when the broker is unable to handle messages successfully, instead of a basic.ack, the broker will send a basic.nack. In this context, fields of the basic.nack have the same meaning as the corresponding ones in basic.ack and the requeue field should be ignored. By nack'ing one or more messages, the broker indicates that it was unable to process the messages and refuses responsibility for them; at that point, the client may choose to re-publish the messages.

After a channel is put into confirm mode, all subsequently published messages will be confirmed or nack'd once. No guarantees are made as to how soon a message is confirmed. No message will be both confirmed and nack'd.

basic.nack will only be delivered if an internal error occurs in the Erlang process responsible for a queue.

当然如果服务端无法处理消息, 那么会返回一个nack

什么情况下会发送ack

For unroutable messages, the broker will issue a confirm once the exchange verifies a message won't route to any queue (returns an empty list of queues). If the message is also published as mandatory, the basic.return is sent to the client before basic.ack. The same is true for negative acknowledgements (basic.nack).

For routable messages, the basic.ack is sent when a message has been accepted by all the queues. For persistent messages routed to durable queues, this means persisting to disk. For mirrored queues, this means that all mirrors have accepted the message.

可路由的消息
如果需要持久化, 那么成功保存到磁盘的

消费端确认

When a node delivers a message to a consumer, it has to decide whether the message should be considered handled (or at least received) by the consumer. Since multiple things (client connections, consumer apps, and so on) can fail, this decision is a data safety concern. Messaging protocols usually provide a confirmation mechanism that allows consumers to acknowledge deliveries to the node they are connected to. Whether the mechanism is used is decided at the time consumer subscribes.

Depending on the acknowledgement mode used, RabbitMQ can consider a message to be successfully delivered either immediately after it is sent out (written to a TCP socket) or when an explicit ("manual") client acknowledgement is received. Manually sent acknowledgements can be positive or negative and use one of the following protocol methods:

basic.ack is used for positive acknowledgements // 通知服务器删除这条消息
basic.nack is used for negative acknowledgements (note: this is a RabbitMQ extension to AMQP 0-9-1) // 通知服务器保留这条消息/删除这条消息
basic.reject is used for negative acknowledgements but has one limitation compared to basic.nack

How these methods are exposed in client library APIs will be discussed below.

Positive acknowledgements simply instruct RabbitMQ to record a message as delivered and can be discarded. Negative acknowledgements with basic.reject have the same effect. The difference is primarily in the semantics: positive acknowledgements assume a message was successfully processed while their negative counterpart suggests that a delivery wasn't processed but still should be deleted.

In automatic acknowledgement mode, a message is considered to be successfully delivered immediately after it is sent. This mode trades off higher throughput (as long as the consumers can keep up) for reduced safety of delivery and consumer processing. This mode is often referred to as "fire-and-forget". Unlike with manual acknowledgement model, if consumers's TCP connection or channel is closed before successful delivery, the message sent by the server will be lost. Therefore, automatic message acknowledgement should be considered unsafe and not suitable for all workloads.

如果你想要更高的性能, 可以使用自动确认模式, 这样服务器一旦把消息发送给消费者, 马上就会删除消息, 但是消息可能会丢失

Another thing that's important to consider when using automatic acknowledgement mode is consumer overload. Manual acknowledgement mode is typically used with a bounded channel prefetch which limits the number of outstanding ("in progress") deliveries on a channel. With automatic acknowledgements, however, there is no such limit by definition. Consumers therefore can be overwhelmed by the rate of deliveries, potentially accumulating a backlog in memory and running out of heap or getting their process terminated by the OS. Some client libraries will apply TCP back pressure (stop reading from the socket until the backlog of unprocessed deliveries drops beyond a certain limit). Automatic acknowledgement mode is therefore only recommended for consumers that can process deliveries efficiently and at a steady rate.

如果你使用自动确认, 那么客户端是没有办法做qos的, 这样的话可能会导致消费端被消息压垮

消费端QOS

Because messages are sent (pushed) to clients asynchronously, there is usually more than one message "in flight" on a channel at any given moment. In addition, manual acknowledgements from clients are also inherently asynchronous in nature. So there's a sliding window of delivery tags that are unacknowledged. Developers would often prefer to cap the size of this window to avoid the unbounded buffer problem on the consumer end. This is done by setting a "prefetch count" value using the basic.qos method. The value defines the max number of unacknowledged deliveries that are permitted on a channel. Once the number reaches the configured count, RabbitMQ will stop delivering more messages on the channel unless at least one of the outstanding ones is acknowledged.

For example, given that there are delivery tags 5, 6, 7, and 8 unacknowledged on channel Ch and channel Ch's prefetch count is set to 4, RabbitMQ will not push any more deliveries on Ch unless at least one of the outstanding deliveries is acknowledged. When an acknowledgement frame arrives on that channel with delivery_tag set to 5 (or 6, 7, or 8), RabbitMQ will notice and deliver one more message. Acknowledging multiple messages at once will make more than one message available for delivery.

It's worth reiterating that the flow of deliveries and manual client acknowledgements is entirely asynchronous. Therefore if the prefetch value is changed while there already are deliveries in flight, a natural race condition arises and there can temporarily be more than prefetch count unacknowledged messages on a channel.

Per-channel, Per-consumer and Global Prefetch

The QoS setting can be configured for a specific channel or a specific consumer. The Consumer Prefetch guide explains the effects of this scoping.

Prefetch and Polling Consumers

The QoS prefetch setting has no effect on messages fetched using the basic.get ("pull API"), even in manual confirmation mode.

Consumer Acknowledgement Modes, Prefetch and Throughput

Acknowledgement mode and QoS prefetch value have significant effect on consumer throughput. In general, increasing prefetch will improve the rate of message delivery to consumers. Automatic acknowledgement mode yields best possible rate of delivery. However, in both cases the number of delivered but not-yet-processed messages will also increase, thus increasing consumer RAM consumption.

Automatic acknowledgement mode or manual acknowledgement mode with unlimited prefetch should be used with care. Consumers that consume a lot of messages without acknowledging will lead to memory consumption growth on the node they are connected to. Finding a suitable prefetch value is a matter of trial and error and will vary from workload to workload. Values in the 100 through 300 range usually offer optimal throughput and do not run significant risk of overwhelming consumers. Higher values often run into the law of diminishing returns.

Prefetch value of 1 is the most conservative. It will significantly reduce throughput, in particular in environments where consumer connection latency is high. For many applications, a higher value would be appropriate and optimal.

lihongjie0209 / myblog

RabbitMQ: 确认机制 #220