thkukuk / ypbind-mt

Multithreaded daemon maintaining the NIS binding information.
GNU General Public License v2.0

Rebind interval #5

Closed: digitalcabbage closed this issue 2 years ago.

digitalcabbage commented 2 years ago

We are upgrading our HPC from CentOS 7 to Alma Linux. For a large number of reasons we are still using NIS, though authentication is via Kerberos against the university's AD, and NIS runs only on the HPC's internal network, so the security issues with NIS are moot.

We have two NIS servers: nis1, the master, and nis2, the slave. By changing the ping interval and rebind interval (setting them to 2 and 3 respectively), we have for many years been able to apply security patches and reboot the two NIS servers at will without impacting users.

With Alma Linux, the option to change the rebind interval has been removed from ypbind. This happened upstream, so it has nothing to do with Red Hat. However, examining the changelogs here on GitHub, I can find no mention of its removal or the reason for it. Some testing shows that failover is back to taking 900 s (i.e. 15 minutes), which is completely impractical.

The only explanation I can think of is that the latest version, by supporting v3 of the NIS protocol, no longer needs hacks to lower the rebind interval to make failover happen in a timely manner? If that is the case, I need to upgrade the NIS servers to Alma quickly. If that won't help, then removing the option to configure the rebind interval is a major regression in our view.

thkukuk commented 2 years ago

Looks like you are mixing some things up. The rebind interval checked whether there is a faster NIS server, not whether the current one is working. This option depended on special features of the SunRPC implementation in glibc and thus was never part of ypbind-mt 2.x, which uses libtirpc to support IPv6. The ping interval proactively checks every 20 seconds whether the NIS server is still alive. If a NIS query arrives in between and the NIS server has failed during that time, ypbind automatically retriggers the search for a new ypserv. There should be no impact on users, and I never saw any on the networks I administered or on those where I use NIS.

So if you see a 900-second timeout, you screwed up your optimizations and configuration. There is no 900-second timeout anywhere in the code; the longest period is 20 seconds.
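As a rough illustration of the behaviour described in this comment, here is a toy Python model. It is not ypbind-mt's code, and every name in it is hypothetical. It assumes the client keeps an ordered list of servers, binds to the first one that answers, re-checks the bound server every 20 seconds, and starts a new search immediately when a query finds the bound server dead. Under those assumptions, failover with no user traffic happens within one ping interval, and a query triggers it at once.

```python
"""Toy model of the ping/rebind behaviour described above.

Hypothetical illustration only; none of these names exist in ypbind-mt.
Assumptions: the client binds to the first live server in list order,
re-checks the bound server every PING_INTERVAL seconds, and searches
again immediately when a query finds the bound server dead.
"""
from __future__ import annotations

from dataclasses import dataclass, field

PING_INTERVAL = 20  # seconds, the figure quoted in the comment


@dataclass
class ToyBinder:
    servers: list[str]            # candidate servers, in configured order
    alive: dict[str, bool]        # simulated up/down state of each server
    bound: str | None = None      # server currently bound to, if any
    log: list[str] = field(default_factory=list)

    def search(self) -> None:
        """Bind to the first server in the list that is alive."""
        self.bound = next((s for s in self.servers if self.alive.get(s)), None)
        self.log.append(f"bound to {self.bound}")

    def _bound_is_dead(self) -> bool:
        return self.bound is None or not self.alive.get(self.bound, False)

    def ping(self) -> None:
        """Periodic liveness check: rebind only if the bound server died."""
        if self._bound_is_dead():
            self.search()

    def query(self) -> str | None:
        """A lookup: a dead bound server triggers a new search right away."""
        if self._bound_is_dead():
            self.search()
        return self.bound


def seconds_until_rebind(servers: list[str], fail_at: int, horizon: int = 120) -> int | None:
    """Kill the initially bound server at t=fail_at, send no queries, and
    return how many seconds pass before a ping moves the binding to
    another live server (None if that never happens within horizon)."""
    alive = {s: True for s in servers}
    binder = ToyBinder(list(servers), alive)
    binder.search()
    failed = binder.bound
    for t in range(horizon + 1):
        if t == fail_at:
            alive[failed] = False
        if t % PING_INTERVAL == 0:
            binder.ping()
        if t >= fail_at and binder.bound not in (failed, None):
            return t - fail_at
    return None


if __name__ == "__main__":
    b = ToyBinder(["nis1", "nis2"], {"nis1": True, "nis2": True})
    b.search()
    b.alive["nis1"] = False
    print(b.query())  # nis2, without waiting for the next ping
    print(seconds_until_rebind(["nis1", "nis2"], fail_at=1))  # 19
```

The model deliberately ignores the time each liveness check itself spends waiting for an RPC timeout, so real figures will be somewhat longer; it only shows why the expected delay is on the order of the ping interval rather than many minutes.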

jabuzz commented 5 months ago

Right, I am going to come back to this, because something is badly wrong with ypbind-mt. I need to migrate my CentOS 7 master NIS server to something newer, and without working failover that is not going to be pretty. The slave server has been migrated to Alma8 fine.

I basically cannot make a RHEL8 or Ubuntu 24.04 LTS client fail over to the slave NIS server. Not sure where I got the 900 seconds from previously; I was clearly on the crackpipe, as it doesn't fail over at all. If I block the client on the master NIS server with firewall-cmd --zone=drop --add-source=x.x.x.x to simulate the server being unavailable, then on a CentOS 7 client with the old-fashioned ypbind, failover to the Alma8-based NIS server works.

Do the same on an Alma8 or Ubuntu 24.04 client and you basically get:

root@ububtu-nis:~# ypcat passwd
yp_bind_client_create_v3: RPC: Timed out
yp_bind_client_create_v3: RPC: Timed out
No such map passwd.byname. Reason: Can't bind to server which serves this domain

Note that if I start the Ubuntu client blocked on the master server, then unblock it and block it on the slave server, it will fail back to the master server. However, it simply won't fail over from the master server. I have tried making the client IPv4-only by passing ipv6.disable=1 to the kernel at boot, but it made no difference.
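One way to double-check this test setup from the client side is a small Python probe of each server's rpcbind/portmapper port (111/tcp), which a NIS client contacts to find where ypserv is listening. The probe is a hypothetical helper, not part of ypbind-mt or firewalld, and a TCP connect is only a rough proxy since much NIS traffic uses UDP. firewalld's drop zone discards packets without replying, so a blocked server would be expected to show up as "timeout", in line with the RPC timeouts above, whereas a host that actively rejects the connection shows "refused".

```python
"""Hypothetical client-side reachability probe for NIS servers.

Checks the rpcbind/portmapper port (111/tcp) on each host given on the
command line. This is a rough proxy only: much NIS traffic is UDP.
"""
import socket
import sys

RPCBIND_PORT = 111  # standard rpcbind/portmapper port


def probe(host: str, port: int = RPCBIND_PORT, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port.

    Returns "open", "refused", "timeout", "unresolved", or "error: ...".
    A firewalld 'drop' zone silently discards packets, so a blocked host
    is expected to show "timeout" rather than "refused".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:                  # hostname did not resolve
        return "unresolved"
    except ConnectionRefusedError:           # host refused the connection
        return "refused"
    except (socket.timeout, TimeoutError):   # no answer within `timeout`
        return "timeout"
    except OSError as exc:                   # e.g. no route to host
        return f"error: {exc}"


if __name__ == "__main__":
    # Usage (hypothetical file name): python3 probe.py HOST [HOST ...]
    for host in sys.argv[1:]:
        print(f"{host}: {probe(host)}")
```

Running it on the Ubuntu client while the master is blocked, and again after unblocking, confirms whether the block behaves as intended; it cannot by itself show why ypbind does not switch servers.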

Basically a stock install of 24.04 with the following NIS configuration:

root@ububtu-nis:~# more /etc/yp.conf
domain hpc server nis1.hpc.myuni.edu
ypserver nis2.hpc.myuni.edi
root@ububtu-nis:~# more /etc/defaultdomain 
hpc
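To make explicit which servers this configuration offers for the domain, here is a minimal Python reader for the three line forms described in the yp.conf(5) man page: domain NISDOMAIN server HOSTNAME, domain NISDOMAIN broadcast, and ypserver HOSTNAME, the last of which applies to the local domain (here the one from /etc/defaultdomain). The function is a hypothetical helper, not ypbind-mt's own parser, and it simply skips lines it does not recognise. Under these assumptions, both nis1 and nis2 end up as candidate servers for domain hpc.

```python
"""Minimal yp.conf reader (hypothetical helper, not ypbind-mt's parser).

Recognised forms, as described in yp.conf(5):
    domain NISDOMAIN server HOSTNAME
    domain NISDOMAIN broadcast
    ypserver HOSTNAME          (server for the local/default domain)
Comments after '#' and blank lines are ignored; so are unknown lines.
"""
from __future__ import annotations


def parse_yp_conf(text: str, default_domain: str) -> dict[str, list[str]]:
    """Map each NIS domain to its server entries, in file order."""
    servers: dict[str, list[str]] = {}
    for raw in text.splitlines():
        words = raw.split("#", 1)[0].split()
        if not words:
            continue
        if len(words) == 4 and words[0] == "domain" and words[2] == "server":
            servers.setdefault(words[1], []).append(words[3])
        elif len(words) == 3 and words[0] == "domain" and words[2] == "broadcast":
            servers.setdefault(words[1], []).append("<broadcast>")
        elif len(words) == 2 and words[0] == "ypserver":
            servers.setdefault(default_domain, []).append(words[1])
        # anything else is skipped in this sketch
    return servers


if __name__ == "__main__":
    conf = "domain hpc server nis1.hpc.myuni.edu\nypserver nis2.hpc.myuni.edi\n"
    domain = "hpc"  # contents of /etc/defaultdomain
    print(parse_yp_conf(conf, domain))
```

Printing the parsed list is also a quick way to read each hostname back exactly as it is written in the file.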

To my mind, something is clearly broken in ypbind-mt when it comes to failing over to a slave NIS server when the master server is unavailable for whatever reason.