slsdetectorgroup / slsDetectorPackage

SLS Detector Package

Configuration of ZMQ Keepalive #955

Closed felix-engelmann closed 2 weeks ago

felix-engelmann commented 2 weeks ago
*Detector type:

All

*Software Package Version:

developer

Priority:

Low

*State the feature:

Set the ZMQ TCP keepalive socket options so that keepalive probes are sent on otherwise idle ZMQ connections.

Is your feature request related to a problem? Please describe:

In some networks, idle TCP flows are dropped if no packet is sent for an extended period.

Describe the solution you'd like:

An additional configuration option, e.g. rx_zmqkeepalive, which sets the following socket options:

diff --git a/slsSupportLib/src/ZmqSocket.cpp b/slsSupportLib/src/ZmqSocket.cpp
index c4c024179..3f2936f4f 100644
--- a/slsSupportLib/src/ZmqSocket.cpp
+++ b/slsSupportLib/src/ZmqSocket.cpp
@@ -76,6 +76,28 @@ ZmqSocket::ZmqSocket(const uint32_t portnumber, const char *ethip)
     sockfd.serverAddress = oss.str();
     LOG(logDEBUG) << "zmq address: " << sockfd.serverAddress;

+    // Socket Options for keepalive in k8s container
+    int keepalive = 1;
+    if (zmq_setsockopt(sockfd.socketDescriptor, ZMQ_TCP_KEEPALIVE, &keepalive, sizeof(keepalive))) {
+        PrintError();
+        throw ZmqSocketError("Could not set socket opt ZMQ_TCP_KEEPALIVE");
+    }
+    keepalive = 10;
+    if (zmq_setsockopt(sockfd.socketDescriptor, ZMQ_TCP_KEEPALIVE_CNT, &keepalive, sizeof(keepalive))) {
+        PrintError();
+        throw ZmqSocketError("Could not set socket opt ZMQ_TCP_KEEPALIVE_CNT");
+    }
+    keepalive = 60;
+    if (zmq_setsockopt(sockfd.socketDescriptor, ZMQ_TCP_KEEPALIVE_IDLE, &keepalive, sizeof(keepalive))) {
+        PrintError();
+        throw ZmqSocketError("Could not set socket opt ZMQ_TCP_KEEPALIVE_IDLE");
+    }
+    keepalive = 1;
+    if (zmq_setsockopt(sockfd.socketDescriptor, ZMQ_TCP_KEEPALIVE_INTVL, &keepalive, sizeof(keepalive))) {
+        PrintError();
+        throw ZmqSocketError("Could not set socket opt ZMQ_TCP_KEEPALIVE_INTVL");
+    }
+
     // bind address
     if (zmq_bind(sockfd.socketDescriptor, sockfd.serverAddress.c_str())) {
         PrintError();

The actual values can be hard-coded, since a network that drops idle flows within 1 minute is not usable anyway. So far we have not noticed any performance impact.
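To make the effect of the chosen values concrete, here is a minimal sketch of the timing they imply. It assumes the usual Linux TCP keepalive behaviour: the first probe is sent after the idle time, and the connection is declared dead after the configured number of unanswered probes, sent one interval apart. The struct and function names are hypothetical and not part of slsDetectorPackage; only the three numbers come from the patch.

```cpp
// Keepalive parameters as used in the proposed patch.
// Names here are illustrative only, not part of the package.
struct KeepaliveSettings {
    int idle_s;     // ZMQ_TCP_KEEPALIVE_IDLE: idle seconds before the first probe
    int interval_s; // ZMQ_TCP_KEEPALIVE_INTVL: seconds between unanswered probes
    int probes;     // ZMQ_TCP_KEEPALIVE_CNT: unanswered probes before giving up
};

// Longest silence an intermediate network device sees on a healthy but
// idle connection (assumption: each answered probe restarts the idle timer).
constexpr int maxIdleGapSeconds(const KeepaliveSettings &k) {
    return k.idle_s;
}

// Approximate time until a dead peer is detected (assumed Linux behaviour):
// idle time plus one interval per unanswered probe.
constexpr int deadPeerDetectionSeconds(const KeepaliveSettings &k) {
    return k.idle_s + k.probes * k.interval_s;
}

// Values from the patch: idle = 60 s, interval = 1 s, count = 10.
constexpr KeepaliveSettings kProposed{60, 1, 10};

static_assert(maxIdleGapSeconds(kProposed) == 60,
              "network must tolerate 60 s of silence");
static_assert(deadPeerDetectionSeconds(kProposed) == 70,
              "dead peer detected after roughly 70 s");
```

Under these assumptions, an idle connection carries a packet at least once a minute, and a vanished peer would be noticed after roughly 70 seconds.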

Describe alternatives you've considered:

Always set the options, as they shouldn't interfere with anything existing.

Additional context:

This is an issue at MAX IV when running in a Docker container.

thattil commented 2 weeks ago

We will try the alternative: make it the default.