xcat2 / xcat-core

Code repo for xCAT core packages
Eclipse Public License 1.0

makedns fails with unable to find an IP when using nics table. #6003

Open paddyoneill opened 5 years ago

paddyoneill commented 5 years ago

I am running a cluster that only uses DNS on the headnode for name resolution and not /etc/hosts. Following the documentation for defining additional nics using the nics table, I configured IP addresses for the infiniband network.

[root@nvme01 ~]# lsdef nvme02 --nics
Object name: nvme02
    nicaliases.ib0=nvme02-ib0
    nichostnamesuffixes.ib0=-ib0
    nicips.ib0=192.168.100.2
    nicnetworks.ib0=infiniband
    nictypes.ib0=infiniband

When I try to run makedns nvme02 to update the DNS configuration I get the following error.

Error: [nvme01]: Unable to find an IP for nvme02-ib0 in hosts table or via system lookup (i.e. /etc/hosts)

Since the cluster isn't using /etc/hosts, I have not added the alias to /etc/hosts. I have checked the hosts table and there is nothing additional configured for the alias there.

[root@nvme01 ~]# tabdump hosts
#node,ip,hostnames,otherinterfaces,comments,disable
"nvme02","192.168.0.2",,,,

From reading the documentation I got the impression that otherinterfaces was deprecated in favour of the nics table, but the nics table doesn't seem to work with the makedns script. Am I mistaken in believing that makedns should work with entries in the nics table?

This is on the following OS and xCAT versions:

[root@nvme01 ~]# cat /etc/centos-release
CentOS Linux release 7.5.1804 (Core)
[root@nvme01 ~]# lsxcatd -a
Version 2.14.5 (git commit fc0fb3fca198aa298a114f6124749275e7d81f8c, built Thu Dec  6 22:20:43 EST 2018)
immarvin commented 5 years ago

Hi @bybai, would you please take a look at this issue? Thanks.

Hi @paddyoneill, as a workaround, you can add nvme02-ib0 to the hosts table like this:

#tabdump hosts
"nvme02-ib0","192.168.100.2",,,,

then try makedns nvme02

bybai commented 5 years ago

Hi @paddyoneill, I think you hit two problems:

  1. The nics definition for node nvme02 is confused. See my example below.
  2. Before you run makedns nvme02, you should run makehosts nvme02 first.

Here is my example:

1.

]# lsdef bybc0605 --nics
Object name: bybc0605
    nicips.ib0=10.20.100.9
    nicips.ib1=10.11.100.9
    nicnetworks.ib0=mgtnetwork
    nicnetworks.ib1=mgtnetwork
    nictypes.ib0=Infiniband
    nictypes.ib1=Infiniband

2.

]# makehosts bybc0605

]# cat /etc/hosts|grep bybc0605
10.5.106.5 bybc0605 bybc0605.cluster.com
10.20.100.9 bybc0605-ib0 bybc0605-ib0.cluster.com
10.11.100.9 bybc0605-ib1 bybc0605-ib1.cluster.com

]# makedns bybc0605
Handling bybc0605-ib0 in /etc/hosts.
Handling bybc0605 in /etc/hosts.
Handling bybc0605-ib1 in /etc/hosts.
Getting reverse zones, this may take several minutes for a large cluster.
Completed getting reverse zones.
Updating zones.
Completed updating zones.
Updating DNS records, this may take several minutes for a large cluster.
Completed updating DNS records.
DNS setup is completed
3.

]# nslookup bybc0605-ib0
Server:     10.5.106.2
Address:    10.5.106.2#53

Name:    bybc0605-ib0.cluster.com
Address: 10.20.100.9

immarvin commented 5 years ago

Hi @bybai, note that @paddyoneill "only uses DNS on the headnode for name resolution and not /etc/hosts". If you change the hosts line in /etc/nsswitch.conf to hosts: dns, makedns will fail even if the entries exist in /etc/hosts. Your test completed successfully because you are still using /etc/hosts to resolve the hostname.
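To check which case a given management node falls into, a small helper along these lines could be used. This is only a sketch and not part of xCAT: the function names `hosts_sources` and `resolves_via_files` are made up for illustration, and the demo writes a temporary file rather than reading the real /etc/nsswitch.conf.

```shell
#!/bin/sh
# Hypothetical helpers (not xCAT commands), for illustration only.

# Print the lookup sources listed on the "hosts:" line of an
# nsswitch.conf-style file whose path is given as $1.
hosts_sources() {
    awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print; exit }' "$1"
}

# Succeed if "files" (that is, /etc/hosts) is one of those sources.
resolves_via_files() {
    case " $(hosts_sources "$1") " in
        *" files "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Demo against a throwaway file that mimics a DNS-only setup.
conf=$(mktemp)
printf 'passwd: files\nhosts: dns\n' > "$conf"
if resolves_via_files "$conf"; then
    echo "hosts lookups consult /etc/hosts"
else
    echo "hosts lookups skip /etc/hosts (DNS-only setup)"
fi
rm -f "$conf"
```

On a real system the same check would be `resolves_via_files /etc/nsswitch.conf`. If it fails, entries written by makehosts are not visible to system lookups, which matches the behaviour described above.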

paddyoneill commented 5 years ago

Ignore the closing and opening, my mistake.

Thanks @bybai for the suggestions so far, I have changed the nics table to be similar to the example you provided.

As @immarvin mentioned, since this environment is set up to use only DNS and not the /etc/hosts file, makedns still fails to resolve the nvme02-ib0 hostname even after running makehosts first.

immarvin commented 5 years ago

Did you try this?

as a workaround, you can add nvme02-ib0 to the hosts table like this:

#tabdump hosts
"nvme02-ib0","192.168.100.2",,,,

then try makedns nvme02

bybai commented 5 years ago

Hi @paddyoneill and @immarvin,

If you want to use the hosts table but not the /etc/hosts file, -ib0 should be configured in otherinterfaces, but that does not work well. The following example works around it:

1. Create a new node named nvme02-ib0. This node exists only for DNS.

]# nslookup nvme02-ib0
Server:     10.5.106.2
Address:    10.5.106.2#53

** server can't find nvme02-ib0: NXDOMAIN

]# chdef nvme02-ib0 ip=10.60.100.9 groups=all
1 object definitions have been created or modified.
New object definitions 'nvme02-ib0' have been created.

[root@bybc0602 ~]# lsdef nvme02-ib0
Object name: nvme02-ib0
    groups=all
    ip=10.60.100.9
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles

2.

]# makedns nvme02-ib0
Handling nvme02-ib0 in /etc/hosts.
Getting reverse zones, this may take several minutes for a large cluster.
Completed getting reverse zones.
Updating zones.
Completed updating zones.
Updating DNS records, this may take several minutes for a large cluster.
Completed updating DNS records.
DNS setup is completed

3.

]# nslookup nvme02-ib0
Server:     10.5.106.2
Address:    10.5.106.2#53

Name:    nvme02-ib0.cluster.com
Address: 10.60.100.9

paddyoneill commented 5 years ago

Hi @bybai, the proposed workaround works, but it also means that each node needs a separate definition for each interface it has, so it would become cumbersome to manage at scale.

I will try the otherinterfaces option to see if it works and let you know.
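One way to reduce the per-interface bookkeeping might be to generate the chdef commands from the nics data instead of typing them by hand. The following is a rough sketch, not an xCAT feature. `nics_to_chdef` is a hypothetical helper that assumes `lsdef --nics` output in the format shown earlier in this thread and a single IP per NIC. When no nichostnamesuffixes value is set it falls back to `-<nic>`, an assumption based on the names makehosts produced in the earlier example. It only prints commands and does not run them.

```shell
#!/bin/sh
# Hypothetical helper (not an xCAT command): read `lsdef <node> --nics`
# style output on stdin and print one chdef command per NIC, in the
# form used by the workaround in this thread.
nics_to_chdef() {
    awk '
        # Print the commands collected for the current node, then reset.
        function emit(   i, nic, suffix) {
            for (i = 1; i <= n; i++) {
                nic = nics[i]
                # Assumption: default suffix is "-<nic>" when none is set.
                suffix = (nic in sfx) ? sfx[nic] : "-" nic
                printf "chdef %s%s ip=%s groups=all\n", node, suffix, ip[nic]
            }
            n = 0; split("", nics); split("", ip); split("", sfx)
        }
        /^Object name:/ { emit(); node = $3; next }
        {
            line = $0
            sub(/^[ \t]+/, "", line)
            eq = index(line, "=")
            if (eq == 0) next
            key = substr(line, 1, eq - 1)
            val = substr(line, eq + 1)
            if (key ~ /^nicips\./) {
                sub(/^nicips\./, "", key)
                nics[++n] = key
                ip[key] = val
            } else if (key ~ /^nichostnamesuffixes\./) {
                sub(/^nichostnamesuffixes\./, "", key)
                sfx[key] = val
            }
        }
        END { emit() }
    '
}

# Demo with the nics definition posted at the top of this issue.
nics_to_chdef <<'EOF'
Object name: nvme02
    nicaliases.ib0=nvme02-ib0
    nichostnamesuffixes.ib0=-ib0
    nicips.ib0=192.168.100.2
    nicnetworks.ib0=infiniband
    nictypes.ib0=infiniband
EOF
# prints: chdef nvme02-ib0 ip=192.168.100.2 groups=all
```

On a management node the input would come from something like `lsdef nvme02 --nics | nics_to_chdef`. The printed commands should be reviewed before being run, followed by makedns for the new names. This still creates one extra node object per interface, so it automates the workaround rather than removing the underlying limitation.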

bybai commented 5 years ago

@paddyoneill, thanks for your feedback. Since the workaround works, this is not a blocking issue. We will plan the fix for 2.15.

marseaplage commented 2 years ago

Hi guys! I would like to know whether this problem has been solved, because it is still present in xCAT 2.16.4.

marseaplage commented 1 year ago

Hi @bybai, the proposed workaround works, but it also means that each node needs a separate definition for each interface it has, so it would become cumbersome to manage at scale.

I will try the otherinterfaces option to see if it works and let you know.

Hi, please confirm whether the problem is solved, because I am still facing the same problem with this xCAT version:

lsxcatd -a
Version 2.16.4 (git commit bb7a4bbbc8bde7e6613558d8d039fe43d49d2079, built Mon Jun 13 08:53:10 EDT 2022)
This is a Management Node
dbengine=SQLite