zeromq / libzmq

ZeroMQ core engine in C++, implements ZMTP/3.1
https://www.zeromq.org
Mozilla Public License 2.0

Clarification on Subscriber high water mark #4469

Open nmccrea opened 1 year ago

Issue description

Hi there.

This library is awesome. Thanks to everyone who has worked on it. I am actually trying to understand why it seems NOT to have a problem I was expecting it to have, so that I can be confident my implementation is sound.

I am using Pub/Sub to push a stream of messages to a set of subscribers. In our case, all subscribers must receive all messages in order, without drops or duplicates. This is not a situation where a pattern like "Suicidal Snail" is appropriate, as all subscribers must succeed. Instead, it's a perfect case for "flow control" on the publisher's side. I set the "no drop" option on the publisher so that a send throws an error when the message cannot be sent because of a backlog in the message queue. When such an error is raised, the publisher simply retries until the message sends successfully.
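To make the retry behavior concrete, here is a minimal sketch of the publisher-side loop. It does not use a real ZeroMQ binding: `WouldBlock` is a hypothetical stand-in for whatever error the binding raises when a no-drop send hits the high water mark, and `send` is any callable supplied by the caller. The backoff values are arbitrary assumptions.

```python
import time


class WouldBlock(Exception):
    """Hypothetical stand-in for the 'queue is full' error of a no-drop send."""


def send_with_retry(send, message, delay=0.001, max_delay=0.1):
    """Call send(message) until it succeeds and return the number of attempts.

    `send` is assumed to raise WouldBlock when the message cannot be queued
    and to return normally once the message has been accepted.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            send(message)
            return attempts
        except WouldBlock:
            # Back off briefly so the retry loop does not spin at full speed.
            time.sleep(delay)
            delay = min(delay * 2, max_delay)
```

The loop retries the same message until it is accepted, so ordering is preserved as long as only one message is in flight at a time.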

Now I understand everything quite well when the Publisher's queue backs up and the SNDHWM is reached. What I am confused about is that this seems to work correctly even when the backlog is on the subscriber's side - i.e., when one of the subscribers' RCVHWM has been reached. Somehow, it appears the publisher is able to detect this backlog, and prevent sending the message even to the OTHER subscribers that have not fallen behind.

Example:

Assume all the connections have already been created correctly so there is no "slow joiner" issue.

Expected Behavior

What I assumed would happen was that the fast subscriber, B, would receive both attempts to send Message 2, and would have to detect and mitigate this:

Publisher's Log

sent message 1
sent message 2 (<-- First attempt. Expected to be received by Sub B but not by Sub A)
sent message 2 (<-- Retry)

Subscriber A's Log

received message 1
received message 2 (<-- This is the retried message. A never received the first attempt.)

Subscriber B's Log

received message 1
received message 2 (<-- First attempt. This message was not received by A)
received message 2 (<-- B receives the retry as well. Must detect that it has already received this message so that it can ignore it.)
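For reference, the kind of duplicate detection Subscriber B would need in this scenario could look roughly like the sketch below. It assumes each message carries a sequence number starting at 1, which is an assumption of this sketch and not something the messages above are known to contain; `SequenceFilter` is a hypothetical helper name.

```python
class SequenceFilter:
    """Drops messages whose sequence number has already been seen."""

    def __init__(self):
        self.last_seq = 0  # highest sequence number accepted so far

    def accept(self, seq):
        """Return True for a new message, False for a duplicate.

        Raises RuntimeError if a sequence number was skipped, since a gap
        would mean a message was lost rather than retried.
        """
        if seq <= self.last_seq:
            return False  # retry of a message that was already received
        if seq != self.last_seq + 1:
            raise RuntimeError(f"gap: expected {self.last_seq + 1}, got {seq}")
        self.last_seq = seq
        return True
```

Fed the sequence 1, 2, 2 from Subscriber B's expected log, the filter accepts the first two messages and rejects the retried copy of message 2.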

Actual Behavior

Instead, it appears that the publisher is able to detect the backlog on Subscriber A's end before it attempts the send, so that it knows not to send to Subscriber B either. Thus, I am seeing the following happy output even on B's side:

received message 1
received message 2

Now this is exactly what I want. But it has thrown me off because it is not what I expected, and I can't find a clear explanation in the docs. I can easily think of ways the publisher could accomplish this, but they all seem like they would require a great deal of "back chatter" from the subscribers and would thus destroy the performance benefits of the pub/sub pattern. So I would like to confirm that it does in fact work this way, and to get a better understanding of how ZeroMQ is able to achieve this.
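To pin down the behavior I am observing, here is a toy model of it. This is only a sketch of the all-or-nothing semantics, not a description of libzmq internals: `ToyPublisher`, its per-subscriber queues, and the `hwm` parameter are all hypothetical. The model checks every subscriber's outgoing queue before sending, and refuses the send for everyone if any queue is full.

```python
from collections import deque


class ToyPublisher:
    """Toy model: one bounded outgoing queue per subscriber."""

    def __init__(self, hwm, subscribers):
        self.hwm = hwm  # maximum number of queued messages per subscriber
        self.queues = {name: deque() for name in subscribers}

    def send(self, msg):
        """Queue msg for every subscriber, or for none of them.

        Returns False (the analogue of the send error) if any queue is full.
        """
        if any(len(q) >= self.hwm for q in self.queues.values()):
            return False
        for q in self.queues.values():
            q.append(msg)
        return True

    def drain(self, name):
        """Simulate subscriber `name` consuming its oldest queued message."""
        return self.queues[name].popleft()
```

In this model, with `hwm=1` and subscribers A and B, a second send is refused while A still holds message 1, even if B has already drained its queue, so B never sees a duplicate. The check only inspects state local to the publisher, which is why it would not need any extra per-message traffic from the subscribers.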

Environment

This is a general question about the protocol, but for what it's worth, I'm using zeromq.js v6.0.0 on the publisher side and czmq 4.2.0 on the subscriber side. This is running on macOS 13.0.1.