zeromq / jeromq

JeroMQ is a pure Java implementation of the ZeroMQ messaging library, offering high-performance asynchronous messaging for distributed or concurrent applications.
https://zeromq.org
Mozilla Public License 2.0

How to terminate a congested TCP PAIR #515

Status: Open. gbonnefille opened this issue 6 years ago.

gbonnefille commented 6 years ago

In my context, I have to deliver multi-part messages using a TCP PAIR.

Everything works fine until I have to release resources in a degraded situation: the receiving peer stopped reading, and it probably closed its side of the pair. In that case, the sender may be blocked in a send. When I try to close the socket, depending on the situation, either the context refuses to terminate or many internal threads die in strange ways. I wrote a minimal test that reproduces the situation (https://github.com/gbonnefille/jeromq/tree/pair-congestion). The branch was created on 0.3.5 (the version I use) but was updated to 0.4.2 without significant change.

What is the right design for dealing with such a situation? What am I supposed to do to keep control of the termination (which could be called an abort here)? Furthermore, in my actual application I use a ZLoop, because I want to monitor other sockets and reduce concurrent access. In this situation, the ZLoop is blocked by the sender handler. And I feel I cannot use a non-blocking socket because I send multi-part messages, and I want to send either a whole message or nothing.

Any help would be really appreciated.

fredoboulo commented 6 years ago

Hi,

To close sockets, you may try setting the linger to 0 or another positive value, like

    ZMQ.setSocketOption(sc, ZMQ.ZMQ_LINGER, 0);

It may help, as some outbound messages may still be pending delivery to the remote pair when the local socket is closed.
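For context on why linger matters here: according to the ZeroMQ documentation, the linger period decides what happens to outbound messages still pending when a socket is closed. A value of -1 keeps them indefinitely, so terminating the context blocks until they reach a peer; 0 discards them immediately; a positive value is an upper bound in milliseconds. With a peer that has stopped reading, an infinite linger can therefore make context termination hang. The JDK-only model below is a sketch under that description: `LingerModel` is a hypothetical class, not JeroMQ internals, and it folds socket close and context termination into a single `close` call. With JeroMQ's high-level API, the equivalent of the call above is `socket.setLinger(0)` before `socket.close()`.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of ZMQ_LINGER semantics (not JeroMQ code). */
final class LingerModel {
    /** Outbound messages that the peer has not taken yet. */
    final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();

    /**
     * lingerMillis < 0 : wait until every pending message is drained
     *                    (never returns if the peer stopped reading);
     * lingerMillis == 0: discard pending messages immediately;
     * lingerMillis > 0 : wait at most that long, then discard the rest.
     * Returns how many messages were discarded.
     */
    int close(long lingerMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(lingerMillis);
        while (!pending.isEmpty()) {
            if (lingerMillis >= 0 && System.nanoTime() - deadline >= 0) {
                break; // linger period over: give up on what is left
            }
            try {
                Thread.sleep(1); // give the (modelled) peer a chance to drain
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        int discarded = pending.size();
        pending.clear();
        return discarded;
    }
}
```

With a stalled peer, `close(0)` returns at once and `close(50)` returns after about 50 ms; only a negative linger can block forever.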

gbonnefille commented 6 years ago

Thanks @fredoboulo for the tip: with it, the unit test passes.

But as I already have linger set to 0 in my production code, I still have to find out what is going wrong.

fredoboulo commented 6 years ago

Sorry @gbonnefille, it looks like the test is not very representative of the error... If you find a test that fails, we can work from that. Just remember that PAIR sockets are designed for use when the peers are architecturally stable.

gbonnefille commented 6 years ago

Thanks @fredoboulo for drawing my attention to this aspect. I probably misunderstood which pattern I can/should use. If you can offer me some expertise, I would be very grateful.

In my context, I have to:

And currently, we run into difficulties when the application falls into degraded situations:

Do you think I should design my application in another way?

fredoboulo commented 6 years ago

A slow consumer is quite a problem. Have you looked for inspiration in the Suicidal Snail pattern? It is related to PUB-SUB, but it may be useful. Another possibility would be to make the receivers more responsive, perhaps by fanning out the work (PUSH-PULL comes to mind).

It is difficult to give more precise advice without more details about what your code is meant to do, apart from noting that nothing you have described argues in favor of a PAIR socket.

gbonnefille commented 6 years ago

Concerning the Suicidal Snail pattern: I have to provide a communication layer that does not lose messages, so I cannot design something that kills itself because it is too slow. The slow code is not mine, but the user's. When the consumer is too slow, the expected behaviour is to tell the sender to slow down or stop.
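The "tell the sender to slow down or stop" behaviour is usually called credit-based flow control, and the ZeroMQ Guide discusses it in its section on transferring files. The receiver grants the sender a number of credits, the sender spends one per message and stops when it runs out, and the receiver grants more as it catches up, so nothing has to be dropped. The JDK-only sketch below shows only the bookkeeping and uses a hypothetical `CreditWindow` class; in a real design the grants would travel as messages from the receiver back to the sender, which requires a socket pattern that allows traffic in both directions (for example DEALER/ROUTER).

```java
import java.util.concurrent.Semaphore;

/** Hypothetical credit window: one credit = permission to send one message. */
final class CreditWindow {
    private final Semaphore credits;

    CreditWindow(int initialCredit) {
        credits = new Semaphore(initialCredit);
    }

    /** Sender side: consumes one credit if available. Never blocks. */
    boolean trySpend() {
        return credits.tryAcquire();
    }

    /** Receiver side: called after the consumer has processed n messages. */
    void grant(int n) {
        credits.release(n);
    }

    /** Credits the sender can still spend right now. */
    int available() {
        return credits.availablePermits();
    }
}
```

When `trySpend()` returns `false`, the sender knows the consumer is behind and can pause or report it upstream, instead of blocking inside a socket send.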

You are certainly right concerning the usage of the PAIR socket, but I'm not familiar enough with ZeroMQ to find the correct design.

I fear the most annoying part of my current design is the use of a single ZLoop. When the handler responsible for transmitting messages to the remote node is blocked in a send, the whole application is deadlocked: it cannot react to any new event, not even the notification that the remote node has disconnected.
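One common way out of the single-loop deadlock described above is to move the blocking send off the loop thread. The JDK-only sketch below assumes a hypothetical `SenderThread` helper: the event loop only calls the non-blocking `submit`, which queues a complete multi-part message or nothing (each queue slot holds all frames of one message), and a dedicated thread performs the possibly blocking send. ZeroMQ sockets must not be shared between threads, so in a real design that worker would create and own the socket, and `blockingSend` would wrap its send, preferably with a send timeout so the worker can notice an abort. The sketch does not assume that interrupting a thread unblocks a JeroMQ call.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical helper: a dedicated thread owns the blocking send. */
final class SenderThread {
    private final BlockingQueue<List<byte[]>> queue;
    private final Thread worker;

    SenderThread(int capacity, Consumer<List<byte[]>> blockingSend) {
        queue = new ArrayBlockingQueue<>(capacity);
        worker = new Thread(() -> {
            try {
                while (true) {
                    // Wait for the next complete message, then hand it to the
                    // transport; this call may block while the peer is slow.
                    blockingSend.accept(queue.take());
                }
            } catch (InterruptedException e) {
                // Abort requested while waiting for work: leave the loop.
            }
        }, "sender");
        worker.setDaemon(true); // a stuck worker must not keep the JVM alive
        worker.start();
    }

    /** Event-loop side: never blocks; false means "queue full, peer too slow". */
    boolean submit(List<byte[]> frames) {
        return queue.offer(List.copyOf(frames));
    }

    /** Interrupts the worker and waits at most millis ms; true if it exited. */
    boolean abort(long millis) {
        worker.interrupt();
        try {
            worker.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```

If the worker is stuck in an uninterruptible send, `abort` returns `false` after the grace period; the loop thread is still free to react to other events, such as the peer's disconnection, and report the failure.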

fredoboulo commented 6 years ago

I'm not very familiar with ZLoop and never use it (which explains the latest bug related to it... but I'm in self-pity mode when I say that).

Have you tried a PUSH/PULL pattern? A PUSH socket blocks on send when its high-water mark (HWM) is reached, so it basically handles backpressure (I'm doing some big name-dropping here; someone please correct me if this is wrong).

Have a look at the socket descriptions here: PUSH is not the only socket type that blocks when the HWM is reached; so do DEALER, REQ and ... PAIR.
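Blocking at the HWM only becomes a deadlock if the send waits forever. In JeroMQ, `setSndHWM(int)` caps the outbound queue and `setSendTimeOut(int)` bounds how long a blocking `send` waits once that cap is reached; when the timeout expires, `send` returns `false`, so the sender regains control instead of hanging on a stalled peer. The JDK-only model below (a hypothetical `HwmQueue`, not JeroMQ code) follows the `ZMQ_SNDTIMEO` convention for its timeout argument: -1 blocks forever, 0 fails at once, and a positive value waits at most that many milliseconds.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a sending socket whose queue is capped at hwm messages. */
final class HwmQueue<M> {
    private final ArrayBlockingQueue<M> out;

    HwmQueue(int hwm) {
        out = new ArrayBlockingQueue<>(hwm);
    }

    /** Returns false if the HWM is still reached when the timeout expires. */
    boolean send(M msg, long timeoutMillis) {
        try {
            if (timeoutMillis < 0) {
                out.put(msg); // infinite timeout: block until the peer drains
                return true;
            }
            return out.offer(msg, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Peer side: takes one queued message, or returns null if none is waiting. */
    M receive() {
        return out.poll();
    }
}
```

A `false` return is the backpressure signal: the caller can retry later, notify its user, or start an orderly abort, rather than being stuck inside `send`.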

daveyarwood commented 5 years ago

Blocked pending reply from @gbonnefille.