zeromq / libzmq

ZeroMQ core engine in C++, implements ZMTP/3.1
https://www.zeromq.org
Mozilla Public License 2.0

I/O threads are not scheduled correctly in case of isolated CPUs on Linux #4457

Status: Open · sarkanyi opened this issue 1 year ago

sarkanyi commented 1 year ago

Issue description

Though it's mainly a problem when using isolated CPUs, I could somewhat reproduce it in other environments too (that's what the sleep is for when creating connections).

Even if one sets `zmq_setsockopt(input_socket, ZMQ_AFFINITY, &affinity, sizeof(uint64_t));` and adds the needed CPUs via `ZMQ_THREAD_AFFINITY_CPU_ADD`, the I/O threads will all run on the same core.

Environment

Minimal test code / Steps to reproduce the issue

```c
#include <zmq.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>

int main (void)
{
    const char *services[] = {"tcp://0.0.0.0:5554", "tcp://0.0.0.0:5555",
                              "tcp://0.0.0.0:5556", "tcp://0.0.0.0:5557"};
    void *context = zmq_ctx_new ();

    zmq_ctx_set (context, ZMQ_IO_THREADS, 4);
    zmq_ctx_set (context, ZMQ_THREAD_AFFINITY_CPU_ADD, 2);
    zmq_ctx_set (context, ZMQ_THREAD_AFFINITY_CPU_ADD, 3);
    zmq_ctx_set (context, ZMQ_THREAD_AFFINITY_CPU_ADD, 4);
    zmq_ctx_set (context, ZMQ_THREAD_AFFINITY_CPU_ADD, 5);
    zmq_ctx_set (context, ZMQ_THREAD_AFFINITY_CPU_ADD, 6);

    void *socket = zmq_socket (context, ZMQ_PULL);

    for (int i = 0; i < 4; i++) {
        sleep (1);
        uint64_t affinity = 1ULL << i;
        zmq_setsockopt (socket, ZMQ_AFFINITY, &affinity, sizeof (uint64_t));
        zmq_connect (socket, services[i]);
    }

    while (1) {
        char buffer[10];
        zmq_recv (socket, buffer, 10, 0);
        sleep (1);          //  Do some 'work'
    }
    return 0;
}
```

What's the actual result?

All ZMQbg/IO threads run on the first core from the set.

What's the expected result?

The ZMQbg/IO threads are evenly distributed. On normal systems the scheduler will shuffle them around at some point, but in the case of isolated cores the scheduler doesn't touch them. Also, using a realtime scheduler is often not an option, as it causes unwanted interrupts in the system.

I have an idea to fix this properly, but for that the internal thread creation should distribute threads in a round-robin fashion across the affine CPUs (this could be an option). For this, an atomic internal thread counter has to be passed from the context into the I/O threads, and the affinity would then be set with something like:

```cpp
//  Pick the (_thread_count mod N)-th CPU from the affinity set:
unsigned long i = 0;
for (std::set<int>::const_iterator it = _thread_affinity_cpus.begin (),
                                   end = _thread_affinity_cpus.end ();
     it != end; it++) {
    if (_thread_count % _thread_affinity_cpus.size () == i) {
        CPU_SET ((int) (*it), &cpuset);
        break;
    }
    i++;
}
```
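
Outside of libzmq, the same idea can be sketched as a standalone snippet. This is a minimal illustration only, assuming a hard-coded CPU set and a C11 atomic counter incremented by each worker thread as it starts; the names here (`affinity_cpus`, `worker`, `thread_counter`) are made up for illustration and are not libzmq internals:

```c
//  Sketch only: round-robin pinning of worker threads across an affinity set.
//  The CPU list, counter, and function names are illustrative, not libzmq API.
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static const int affinity_cpus[] = {2, 3, 4, 5, 6};
static const int n_cpus = sizeof (affinity_cpus) / sizeof (affinity_cpus[0]);
static atomic_int thread_counter;

//  Each worker pins itself to the (counter mod N)-th CPU of the set,
//  so consecutive threads land on consecutive CPUs of the set.
static void *worker (void *arg)
{
    int idx = atomic_fetch_add (&thread_counter, 1);
    cpu_set_t cpuset;
    CPU_ZERO (&cpuset);
    CPU_SET (affinity_cpus[idx % n_cpus], &cpuset);
    pthread_setaffinity_np (pthread_self (), sizeof (cpuset), &cpuset);
    //  ... I/O loop would go here ...
    return arg;
}
```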

In theory I could submit a PR, but only if there is a realistic chance of it landing eventually.

f18m commented 1 year ago

Hi @sarkanyi, I'm not a zmq core developer but, just like you, I'm deploying ZMQ-based applications on isolated cores (the isolcpus boot option)... and I'm getting the ZMQ background threads pinned to the CPUs I want... so I'm curious to see whether I can reproduce the problem with your example if I find the time.

However, in my case, at some point in the past I stopped relying on the pthread_setaffinity_np() call done internally by ZMQ and developed another in-application method that scans the /proc/<pid>/task filesystem to find the PID/TID of each background thread and overrides its affinity value with sched_setaffinity(). The reason is that there are other 3rd-party libs in my app that do not expose functionality similar to what ZMQ offers... I don't remember switching to this method because of any issue in ZMQ itself, but I might be wrong (it was a long time ago).
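
For reference, a minimal sketch of that approach, with the caveats that the "ZMQbg" thread-name prefix (visible earlier in this issue) and the single target CPU are assumptions for illustration, and that on Linux sched_setaffinity() accepts a TID as its first argument:

```c
//  Sketch: scan /proc/self/task, find threads named "ZMQbg*", and pin them.
//  The "ZMQbg" prefix and target_cpu are assumptions for illustration.
#define _GNU_SOURCE
#include <dirent.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

static int pin_zmq_bg_threads (int target_cpu)
{
    DIR *dir = opendir ("/proc/self/task");
    if (!dir)
        return -1;

    struct dirent *entry;
    while ((entry = readdir (dir)) != NULL) {
        if (entry->d_name[0] == '.')
            continue;

        //  Read the thread name from /proc/self/task/<tid>/comm.
        char path[64], comm[32] = {0};
        snprintf (path, sizeof (path), "/proc/self/task/%s/comm",
                  entry->d_name);
        FILE *f = fopen (path, "r");
        if (!f)
            continue;
        fgets (comm, sizeof (comm), f);
        fclose (f);

        if (strncmp (comm, "ZMQbg", 5) == 0) {
            cpu_set_t cpuset;
            CPU_ZERO (&cpuset);
            CPU_SET (target_cpu, &cpuset);
            //  sched_setaffinity() operates on a TID on Linux.
            sched_setaffinity ((pid_t) atoi (entry->d_name),
                               sizeof (cpuset), &cpuset);
        }
    }
    closedir (dir);
    return 0;
}
```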