sipwise / rtpengine

The Sipwise media proxy for Kamailio
GNU General Public License v3.0
765 stars 362 forks source link

DTMF event sending does not recover from failure #1784

Closed gaaf closed 6 months ago

gaaf commented 6 months ago

rtpengine version the issue has been seen with

ngcp-rtpengine 11.4.1.3+0

Used distribution and its version

No response

Linux kernel version used

No response

CPU architecture issue was seen on (see uname -m)

x86_64

Expected behaviour you didn't see

Recovering from the error below.

Error sending DTMF event info to UDP destination x.x.x.x:pppp: Bad file descriptor

This keeps on repeating for each newly detected DTMF event. I don't know why the file descriptor got bad, but the odds of it becoming good again are very slim. The code only reports, but does not try to recover in any way:

if (udp_dst)
        if (socket_sendto(&dtmf_log_sock, buf->str, buf->len, udp_dst) < 0)
            ilog(LOG_ERR, "Error sending DTMF event info to UDP destination %s: %s",
                    endpoint_print_buf(udp_dst),
                    strerror(errno));

When an error occurs during sending the event, IMHO it would be beneficial to close the current socket and open a new one.

Unexpected behaviour you saw

No response

Steps to reproduce the problem

No response

Additional program output to the terminal or logs illustrating the issue

No response

Anything else?

No response

rfuchs commented 6 months ago

FIle descriptors don't just suddenly go bad (and they also don't suddenly become good again), and this is a connectionless UDP socket, so there's no reason why the file descriptor would get closed or become unusable. You need to find out what happened to it. Trying to work around it by just reopening it is not a solution.

gaaf commented 6 months ago

At least found the cause of the bad fs: ERR: [core] Failed to open/connect DTMF logging socket: Address family not supported by protocol

The dtmf_init() function failed. Unfortunately, none of the init functions in init_everything(), including this one, are checked for failures, so the startup continued.

I'm guessing the error code is because of IPv6 as I have no further issues with IPv4. This machine does not do IPv6. I'm pretty sure the DTMF socket functioned prior to the update form 11.0 to 11.4.

Suspect commit: 711d43646f0

Maybe it should not be attempted to create an IPv6 socket with an IPv4 address.

rfuchs commented 6 months ago

Maybe it should not be attempted to create an IPv6 socket with an IPv4 address.

No, this does work if the kernel supports IPv6 sockets, and is needed to be able to send to both IPv4 and IPv6 from the same socket.

gaaf commented 6 months ago

I fixed it with:

diff --git a/lib/socket.c b/lib/socket.c
index 4f0d1740..68d034ae 100644
--- a/lib/socket.c
+++ b/lib/socket.c
@@ -803,8 +803,14 @@ int open_v46_socket(socket_t *r, int type) {
        sockfamily_t *fam = &__socket_families[SF_IP6];

        if (__socket(r, type, fam)) {
-               __C_DBG("open socket fail, fd=%d", r->fd);
-               return -1;
+               __C_DBG("open v46 socket fail, fd=%d", r->fd);
+               if (errno == EAFNOSUPPORT) {
+                       fam = &__socket_families[SF_IP4];
+                       if (__socket(r, type, fam)) {
+                               __C_DBG("open v4 socket fail, fd=%d", r->fd);
+                               return -1;
+                       }
+               }
        }

        nonblock(r->fd);

Would you like a PR for that?

rfuchs commented 6 months ago

I've already got an (almost identical) patch pending internal review, but thanks.

gaaf commented 6 months ago

I'm still a bit concerned that the startup wasn't aborted even though the initialization failed. I'd rather have it not running at all than in some half-functioning state.