sipwise / rtpengine

The Sipwise media proxy for Kamailio
GNU General Public License v3.0

Re-resolve Redis ip-address when RE-Establishing connection for Redis server #1865

Open NikolayShakin opened 1 month ago

NikolayShakin commented 1 month ago

Is your feature request related to a problem? Please describe

When rtpengine is used with a Redis cluster and the Redis master node changes, rtpengine loses its Redis connection, even when an FQDN is used as the Redis server address and the DNS record has been updated.

Describe the solution you'd like

When we try to re-establish the connection to Redis, we can re-resolve the IP address after every N failed connection attempts. This will not be too expensive, as we have lost the connection anyway. Assuming that the DNS record was updated when the Redis master node changed, this will allow rtpengine to switch automatically to the new Redis master node.
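A minimal sketch of that retry loop, written in Python for brevity rather than in rtpengine's C. The names `reconnect`, `resolve_host` and `reresolve_every` are hypothetical and not taken from rtpengine; the `connect` callback stands in for whatever actually opens the Redis connection, and a real loop would also wait between attempts.

```python
import socket
from typing import Callable, Optional


def resolve_host(host: str, port: int) -> str:
    """Ask the system resolver for the host and return the first address."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return infos[0][4][0]


def reconnect(
    host: str,
    port: int,
    connect: Callable[[str, int], bool],
    cached_ip: str,
    resolve: Callable[[str, int], str] = resolve_host,
    reresolve_every: int = 3,   # the "N" from the proposal (assumed default)
    max_attempts: int = 10,     # assumed upper bound for this sketch
) -> Optional[str]:
    """Retry `connect`, refreshing the address after every N failures.

    Returns the address that finally worked, or None if all attempts failed.
    """
    ip = cached_ip
    for attempt in range(1, max_attempts + 1):
        if connect(ip, port):
            return ip
        if attempt % reresolve_every == 0:
            try:
                ip = resolve(host, port)
            except OSError:
                pass  # lookup failed: keep the last known address
    return None
```

With `reresolve_every=3`, the first three attempts go to the stale address, the name is looked up again, and the fourth attempt uses whatever the resolver returns at that point.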

Describe alternatives you've considered

Restarting rtpengine after each Redis master node change.

The rtpengine version you checked that didn't have the feature you are asking for

Version: 12.5.1.5-1~bpo11+1

zenichev commented 1 day ago

@NikolayShakin it's quite important that the system on which rtpengine is running has an up-to-date DNS record for the FQDN in question as soon as possible. That can sometimes be cumbersome to achieve in time when a switchover takes just a few seconds. It also gets harder when NAPTR/SRV records are used for the name. So it will be useless to force rtpengine to re-resolve the FQDN if the record is still the same during the switchover/failover of the Redis master. Hence the must is that the system provides an actual record just in-time.

However, I think it should be feasible to re-resolve the FQDN each time rtpengine reconnects to Redis. I will take a look in the coming weeks.

guss77 commented 1 day ago

@zenichev I'm a bit confused about your statement - can you please expand on the difference between "force rtpengine to re-resolve the FQDN" and "the system provides an actual record just in-time"?

The way I understand the situation, the libnss hosts driver (or another mechanism that uses the local system configuration in /etc/nsswitch.conf or /etc/resolv.conf directly) is queried for the IP address of the host defined in the configuration. The problem the OP describes (which is very similar to the issue I have) is that first the DNS records - and hence the local system configuration (not the RTPEngine configuration file) has been updated with a new IP address for the existing host name, and only then the older server - whose IP address RTPEngine has used to connect - is dropped.

The request is that when the redis connection drops, only then is RTPEngine expected to refresh its cache of the DNS results - under the assumption that the rest of the system is working correctly.
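To illustrate the request, here is a rough Python sketch in which the cached address is thrown away only when the connection drops. `RedisEndpoint`, `system_lookup` and `on_connection_lost` are made-up names for illustration and do not correspond to rtpengine code. The lookup uses `getaddrinfo`, which on a typical glibc-based Linux system follows the local resolver configuration mentioned above.

```python
import socket
from typing import Callable, Optional


def system_lookup(host: str, port: int) -> str:
    """Resolve via getaddrinfo, i.e. through the local resolver configuration."""
    return socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4][0]


class RedisEndpoint:
    """Hypothetical holder for a configured host name and its cached address."""

    def __init__(
        self,
        host: str,
        port: int,
        lookup: Callable[[str, int], str] = system_lookup,
    ) -> None:
        self.host = host
        self.port = port
        self._lookup = lookup
        self._cached_ip: Optional[str] = None

    def address(self) -> str:
        """Return the cached address, resolving only if nothing is cached."""
        if self._cached_ip is None:
            self._cached_ip = self._lookup(self.host, self.port)
        return self._cached_ip

    def on_connection_lost(self) -> None:
        """Forget the cached address so the next connect resolves again."""
        self._cached_ip = None
```

While the connection is healthy no extra lookups happen; the resolver is consulted again only after `on_connection_lost()` has been called.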

zenichev commented 1 day ago

@guss77

can you please expand on the difference between "force rtpengine to re-resolve the FQDN" and "the system provides an actual record just in-time"?

Maybe wasn't too much clear, but important is to have a correctly behaving system, where rtpengine is running, in regards of host names resolution.

The problem the OP describes (which is very similar to the issue I have) is that first the DNS records - and hence the local system configuration (not the RTPEngine configuration file) has been updated with a new IP address for the existing host name

This wasn't clear from the original request. Updated where ? On the DNS server, or on the system where rtpengine is running? As I said, we should assume that the system has a correct, up-to-date resolution for the requested hostname in time. Then we can act in rtpengine and, upon losing the connection to the Redis server, try to resolve the name again.

It smells like that should be configurable option. I will try to get to it when I find some free time in the coming weeks. Not promising anything.

guss77 commented 1 day ago

Maybe wasn't too much clear, but important is to have a correctly behaving system, where rtpengine is running, in regards of host names resolution.

Agreed.

This wasn't clear from the original request. Updated where ?

The OP said "[…] and DNS record was updated", which I read as mapping well onto my situation: the DNS configuration on the local system can resolve the same FQDN to the new IP address as soon as it makes sense to see the change.

It smells like that should be configurable option.

I disagree - it should be the default and only behavior: do not cache the DNS result, and when a new connection needs to be opened, run the resolver again. I don't think there is even a performance consideration here: it is up to the system administrator to make sure that the DNS lookup is prompt (and there are many, many ways to do so), and if it isn't, it's not RTPEngine's problem.
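As a sketch of what "no cache, resolve on every new connection" could look like (in Python for illustration only; `open_redis_connection` is a hypothetical helper, not an rtpengine or Redis client function):

```python
import socket


def open_redis_connection(host: str, port: int, timeout: float = 2.0) -> socket.socket:
    """Open a TCP connection, resolving `host` afresh on every call."""
    last_error = None
    # getaddrinfo runs on each call, so no address is kept between connections.
    candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canon, sockaddr in candidates:
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as exc:
            last_error = exc
            continue
        try:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()
    raise last_error or OSError("no usable address for " + host)
```

Only the host name is stored; every reconnect sees whatever the system resolver answers at that moment, and any lookup latency is left to the resolver setup on the host.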

NikolayShakin commented 1 day ago

The thing is, even when the system that runs rtpengine can resolve the IP address correctly, rtpengine doesn't try to do so when it fails to connect to Redis (at the old, incorrect address); it keeps trying to connect to the IP address it resolved at startup.