zeromq / czmq

High-level C binding for ØMQ
czmq.zeromq.org
Mozilla Public License 2.0

Lingering broken TCP connections when using zproxy #2244

Closed: jonasdn closed this issue 1 year ago

jonasdn commented 1 year ago

Hello!

We are using czmq to build a broker on a Linux server that serves many devices over long periods of time. We set up a "pubsub" proxy to handle some of the requests from the devices, as follows:

zactor_t *
create_pubsub_proxy (int in_port, int out_port) {
    zactor_t *proxy=NULL;

    proxy = zactor_new (zproxy, NULL); assert (proxy);

    // Set the frontend (incoming messages)
    zstr_sendm (proxy, "FRONTEND");
    zstr_sendm (proxy, "XSUB");
    zstr_sendf(proxy, "tcp://*:%d", in_port);
    zsock_wait (proxy);

    // Set backend, (outgoing messages)
    zstr_sendm (proxy, "BACKEND");
    zstr_sendm (proxy, "XPUB");
    zstr_sendf(proxy, "tcp://*:%d", out_port);
    zsock_wait (proxy);

    return proxy;
}

We have noticed that the broker's memory usage grows constantly, and that the growth tracks the number of open file descriptors. Using libczmq from GitHub master, I can reproduce lingering TCP file descriptors by running the following command on a device:

$ (ip link set eth0 down; sleep 3; reboot) &
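To watch the descriptor count on the broker while running this reproduction, one Linux-only option is to count the entries under /proc/<pid>/fd. The sketch below assumes a Linux /proc filesystem; `count_open_fds` is a hypothetical helper, not part of czmq. Pass "self" for the calling process, or the broker's PID as a string when monitoring it from outside (which requires permission to read that process's /proc entry).

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper (Linux only): count the open file descriptors of a
// process by listing /proc/<pid>/fd. Returns -1 if the directory cannot be opened.
static int
count_open_fds (const char *pid)
{
    assert (pid != NULL);
    char path [64];
    snprintf (path, sizeof path, "/proc/%s/fd", pid);

    DIR *dir = opendir (path);
    if (!dir)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir (dir)) != NULL) {
        // Skip the "." and ".." directory entries
        if (strcmp (entry->d_name, ".") == 0 || strcmp (entry->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir (dir);

    // When listing our own process, the DIR stream itself held one
    // descriptor during the scan; don't count it
    if (strcmp (pid, "self") == 0)
        count--;
    return count;
}
```

Sampling this periodically on the broker (for example once per second) while the device drops its link should show whether the count returns to its baseline or stays elevated.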

I looked for similar issues and found https://github.com/zeromq/libzmq/issues/1453, which discusses enabling TCP_KEEPALIVE to fix this problem. However, there is no way for me to set keepalive options on the sockets that zproxy creates internally.

I then added the following diff to the libczmq code:

diff --git a/src/zproxy.c b/src/zproxy.c
index 90dd5a8f..fc0fd8d6 100644
--- a/src/zproxy.c
+++ b/src/zproxy.c
@@ -105,6 +105,11 @@ s_self_create_socket (self_t *self, char *type_name, char *endpoints, proxy_sock
     }
     zsock_t *sock = zsock_new (index);
     if (sock) {
+        printf("setting sockopts!\n");
+        zsock_set_tcp_keepalive(sock, 1);
+        zsock_set_tcp_keepalive_idle(sock, 30);
+        zsock_set_tcp_keepalive_intvl(sock, 3);
+        zsock_set_tcp_keepalive_cnt(sock, 10);
 #if (ZMQ_VERSION_MAJOR == 4)
         if (self->domain [selected_socket]) {
             // Apply authentication domain

With this change, the lingering socket went away after about 2 minutes.
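As a rough guide to the expected timing: on Linux, keepalive waits for the idle time, then sends up to the configured count of probes spaced by the interval, and drops the connection when all of them go unanswered. With the values from the patch, the estimated detection time after the connection goes idle is:

$$
t_{\text{detect}} \approx t_{\text{idle}} + n_{\text{cnt}} \cdot t_{\text{intvl}} = 30 + 10 \cdot 3 = 60\ \text{s}
$$

This is only an estimate. It assumes the connection has no unacknowledged data outstanding; if it does, TCP retransmission timeouts rather than keepalive probes govern when the kernel gives up, which can take longer.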

Is there something else I can tweak, or should I try to add a cleaner way to enable keepalive on zproxy sockets?