zeromq / libzmq

ZeroMQ core engine in C++, implements ZMTP/3.1
https://www.zeromq.org
Mozilla Public License 2.0
9.46k stars 2.34k forks source link

Fix zmq_proxy_steerable #4600

Closed nnog closed 9 months ago

nnog commented 9 months ago

Continuing Brett's work on zmq_proxy_steerable here: https://github.com/zeromq/libzmq/pull/4598

2 problems addressed (split into 2 commits):

  1. PAUSE/RESUME commands were seemingly mixed up and the test didn't cover it.
  2. The reimplementation changed behaviour by always replying for the simple commands PAUSE, RESUME, TERMINATE, so ZMQ_PAIR usage would have to change when migrating.

I agree that REP makes the most sense for the control socket type, but prior semantics of the control socket pretty much dictated the use ZMQ_PAIR, so this just restores backwards compatibility.

New test assertions with existing src/proxy.cpp:

steer: sending STATISTICS - post-pause                                             
stats: client pkts out: 15 worker pkts out: 32 { 30 345 64 736 64 736 30 345 }     
tests/test_proxy_steerable.cpp:465:test_proxy_steerable:FAIL: Expected 575 Was 1081
bluca commented 9 months ago
[  113s] tests/test_proxy_steerable.cpp: In function 'void test_proxy_steerable()':
[  113s] tests/test_proxy_steerable.cpp:461:23: error: unused variable 'bytes' [-Werror=unused-variable]
[  113s]      unsigned long int bytes = statistics (proxy_control, "post-pause");
nnog commented 9 months ago

bytes is referenced in test macros on lines 465, 472. How are you building/where are you seeing this? I can't find this in any CI logs. Thanks

bluca commented 9 months ago

bytes is referenced in test macros on lines 465, 472. How are you building/where are you seeing this? I can't find this in any CI logs. Thanks

It's in any of the failed OBS CI jobs listed below, they are all 32bit builds

bluca commented 9 months ago

please rebase and squash the fixup commit

emusgrave commented 9 months ago

For posterity, when the previous implementation used a sub socket for the control, it made it easy to publish one TERMINATE that would control many listeners. I have code that spawns worker threads which are subscribed to that same socket address and therefore can safely close down with the proxy. I believe this was from one of the original examples of the usage of the proxy.

emusgrave commented 9 months ago

I was just able to test this PR and things are looking good again for using a sub socket for the TERMINATE control. Thanks @nnog and @bluca. I was banging my head against the wall for a bit when I had upgraded to 4.3.4 and my proxies stopped working.

nnog commented 9 months ago

I was just able to test this PR and things are looking good again for using a sub socket for the TERMINATE control. Thanks @nnog and @bluca. I was banging my head against the wall for a bit when I had upgraded to 4.3.4 and my proxies stopped working.

Yes, SUB control socket will work now as long as you only send non-replying commands. It will still encounter ENOTSUP and exit the proxy call with -1 if you do send STATISTICS. Which from a developer's perspective is probably better than silently dropping the command.

bluca commented 9 months ago

The test is flacky and fails on slow architectures:

[ 2361s] FAIL: tests/test_proxy_steerable [ 2361s] ================================ [ 2361s] [ 2361s] tests/test_proxy_steerable.cpp:458:test_proxy_steerable:FAIL: Expression Evaluated To FALSE

https://build.opensuse.org/build/network:messaging:zeromq:git-stable/Fedora_Rawhide/s390x/libzmq/_log

swelborn commented 5 months ago

@bluca @nnog is there a release timeline for this? In 4.3.5, can you use PUB/SUB sockets? thanks for your work on it.