zeromq / libzmq

ZeroMQ core engine in C++, implements ZMTP/3.1
https://www.zeromq.org
Mozilla Public License 2.0
9.64k stars 2.35k forks source link

Be more specific about when DONTWAIT would return EAGAIN in multipart messages #3810

Open skrap opened 4 years ago

skrap commented 4 years ago

Issue description

In writing rust zmq bindings for tokio, the question came up about when zmq_send and zmq_recv might return EAGAIN when passed the ZMQ_DONTWAIT flag in a multipart message. The implication of the high water mark documentation is that it will only return EAGAIN on the first part of the multipart message, but I couldn't find it explicitly stated anywhere that this was the case.

I did find this mailing list exchange but it is 9+ years old.

On Thu, Apr 28, 2011 at 3:12 AM, MinRK wrote:

When sending or receiving multipart messages with NOBLOCK, is it possible for zmq_send or zmq_recv to raise EAGAIN on any part other than the first?

In principle, no. If this happens it's a bug. -Pieter

I also found #2269, which perhaps never came to a firm conclusion. In that issue, it was posited that:

So in summary, this would be the semantics I'd assume:

  • Upon receive, all parts of a multi-part message can be received without blocking.

  • Upon sending, encountering EAGAIN on a message part can happen, but it means the whole multipart message needs to be retried.

Is this a valid conclusion? If so, I'd try to prepare a documentation patch.

Is this still the design? If so, I'd be happy to post a pull request adding this to the docs.

Environment

sigiesec commented 4 years ago

Yes, this should be documented more clearly.

However, the current implementation details are one thing, and API guarantees are another. Even if the current implementation can only cause an EAGAIN error on the first message part, we might not want to guarantee this on the API, since it might change for internal reasons. If the want to guarantee it on the API, we need to be absolutely sure it will never need to change.

skrap commented 4 years ago

Yep, I totally understand the caution about locking in an API's semantics. However, the library is already returning EAGAIN to clients, so I feel there's a good case to be made that the docs should say more about what an application should infer from that.

From the perspective of an application, we just need to know what EAGAIN means we should to do to make progress. It seems like there's two different meanings which have been called EAGAIN:

In both cases, the app needs to know whether it should send the multipart again from the first message part, or whether it should continue from the part it was sending when it received EAGAIN. Perhaps in both cases above it means start over from the beginning of the multipart?

Right now, it seems that the best an application can do in the face of EAGAIN in the middle of a multipart is to close the socket and start with a new one, as we don't know what its internal state might be. Even if this is the best guarantee that libzmq wants to make, it would be a positive change to mention that in the documentation: "if EAGAIN is returned when sending a non-first part of a multipart message, the state of the socket is undefined, and the socket should be closed without further use."

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 56 days. Thank you for your contributions.

skrap commented 3 years ago

I would like to re-raise my suggestion in https://github.com/zeromq/libzmq/issues/3810#issuecomment-583424935 that it would be helpful to at least document the weakest guarantee which could be made, which is that a send returning EAGAIN on a non-first message leaves the socket in an undefined state where the only safe action is closing the socket.

Then, if it is discovered that a stronger guarantee could be made in the future, the guarantees can always be strengthened without risking the soundness of code relying on them.

Would you accept a pull request for such a change? I'd be happy to put one together.