plied opened this issue 3 months ago
Hi @plied,
Just came across this issue. Have you tried `ZMQ_AFFINITY`, and setting `ZMQ_THREAD_SCHED_POLICY` and `ZMQ_THREAD_PRIORITY` with `zmq_ctx_set()`? There are several options there for binding a socket to a specific I/O thread with a defined priority.
Btw, `zmq_init()` is deprecated in favour of `zmq_ctx_new()`.
None of the above seem to make a big difference. I tried the following:
ZMQ_THREAD_SCHED_POLICY=SCHED_FIFO
ZMQ_THREAD_PRIORITY=99
ZMQ_AFFINITY=1
ZMQ_THREAD_SCHED_POLICY does improve the latency, but only minimally; it still does not get anywhere close to the expected latency.
Came across a similar report, which leads me to this issue. Could it be that the processes are put to sleep by the CPU scheduler and hence need to be re-scheduled for execution before the messages can be processed?
This can affect both the local and remote processes IMO.
One way to validate this hypothesis is to give the process high priority and use a real-time kernel.
Issue description
While building an ultra-low-latency application using ZeroMQ over IPC, I noticed that even though the benchmarks run with
perf/local_lat
and perf/remote_lat
achieve sub-50us latencies on the same host, these results are not reproducible in production. After hours of research I found that if there is any pause between calls to the zmq_sendmsg
method, the following message will be sent with a huge latency (>200us). This means that if the application sends messages back to back it behaves well; however, if there are breaks between messages (which is the more realistic use case), the latency of the next message increases dramatically.
This issue was partially identified in issues #3577 and #3560, but they didn't come up with a reliable way to reproduce it, so it wasn't solved.
I wonder if this could be caused by the sender thread losing priority if the
zmq_sendmsg
function is not called for some time? NOTE that I'm calling the deprecated zmq_sendmsg
method only because the original perf/remote_lat
also calls it. I was able to reproduce this same issue with the Python client, the Rust client, and C++ itself.
Environment
Minimal test code / Steps to reproduce the issue
Modify the `perf/remote_lat.cpp` file so that it measures the round-trip time of each message independently, and adds a delay after each round trip is finished and measured:

```cpp
#include "../include/zmq.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>   // Added this
#include <unistd.h> // Added this

int main (int argc, char *argv[])
{
    const char *connect_to;
    int roundtrip_count;
    size_t message_size;
    void *ctx;
    void *s;
    int rc;
    int i;
    zmq_msg_t msg;
    void *watch;
    unsigned long elapsed = 0; // Added this
    double latency;
    // ...
}
```
What's the actual result? (include assertion message & call stack if applicable)
We keep getting very high latencies:
What's the expected result?
The original code, which is exactly the same just without sleeping between messages, consistently resulted in average latencies below 50us. I would expect latencies to behave similarly whether we are sending messages often or not; otherwise the entire latency benchmark is misleading and useless.