zhaojinhong / pdsh-ops_tools

Automatically exported from code.google.com/p/pdsh
GNU General Public License v2.0

2.26-1 on Fedora 16 netgroups #45

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Whenever I try to use pdsh against a NIS netgroup I get an error saying "no 
remote hosts specified".

[root@host ~]# pdsh -g nisgroup1
pdsh@host: no remote hosts specified

[root@host ~]# pdsh -L
3 modules loaded:

Module: rcmd/exec
Author: Mark Grondona <mgrondona@llnl.gov>
Descr:  arbitrary command rcmd connect method
Active: yes

Module: misc/netgroup
Author: Mark Grondona <mgrondona@llnl.gov>
Descr:  Target netgroups from pdsh
Active: yes
Options:
-g groupname      target hosts in netgroup "groupname"
-X groupname      exclude hosts in netgroup "groupname"

Module: rcmd/ssh
Author: Jim Garlick <garlick@llnl.gov>
Descr:  ssh based rcmd connect method
Active: yes

[root@host ~]# yum list installed \*pdsh\*
pdsh.x86_64                2.26-1.fc16
pdsh-mod-netgroup.x86_64   2.26-1.fc16
pdsh-rcmd-ssh.x86_64       2.26-1.fc16

Using "pdsh -w host1,host2,host3" works with no problems.  I'm not sure whether 
this is a pdsh problem, a NIS problem, or a combination of the two.

Original issue reported on code.google.com by BiloxiG...@gmail.com on 13 Dec 2011 at 1:18

GoogleCodeExporter commented 9 years ago
I'll try to take a look at this today.

Original comment by mark.gro...@gmail.com on 13 Dec 2011 at 2:20

GoogleCodeExporter commented 9 years ago
Unfortunately, I don't have a place to set up NIS to test this right now.
I assume NIS is working fine on "host" and other netgroups functionality
is present?

Does

 getent netgroup nisgroup1

work?

I believe pdsh is just calling getnetgrent(3), so as long as /etc/nsswitch.conf
is set up to use NIS, this should work (I think).

Original comment by mark.gro...@gmail.com on 15 Dec 2011 at 2:26
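For comparing what getent reports with what pdsh ends up targeting, a small parser can help. This is only a sketch: it assumes `getent netgroup NAME` prints the group name followed by `(host,user,domain)` triples on one line, and the function names are hypothetical, not part of pdsh or glibc.

```python
import re
import subprocess

# Matches one "(host,user,domain)" triple; any field may be empty or "-".
TRIPLE_RE = re.compile(
    r"\(\s*([^,()\s]*)\s*,\s*([^,()\s]*)\s*,\s*([^,()\s]*)\s*\)"
)


def parse_netgroup_line(line):
    """Return a list of (host, user, domain) triples found in one getent line."""
    return TRIPLE_RE.findall(line)


def netgroup_hosts(line):
    """Return the host fields in order, skipping empty and "-" entries."""
    return [host for host, _user, _domain in parse_netgroup_line(line)
            if host and host != "-"]


def getent_netgroup(name):
    """Run `getent netgroup NAME`; return its stdout, or "" if the lookup fails.

    Assumes the getent binary is on PATH (hypothetical helper).
    """
    result = subprocess.run(["getent", "netgroup", name],
                            capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else ""


if __name__ == "__main__":
    # Illustrative input only; real output depends on the NIS maps.
    sample = "nisgroup1 (host1,-,-) (host2,-,-) (host3,-,-)"
    print(netgroup_hosts(sample))
```

If `netgroup_hosts(getent_netgroup("nisgroup1"))` lists hosts while `pdsh -g nisgroup1 -q` shows no target nodes, the two lookups disagree on the same machine, which narrows down where to look.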

GoogleCodeExporter commented 9 years ago
Running getent netgroup nisgroup1 works just fine.  I checked /etc/nsswitch.conf 
and it had:

netgroup:   files nis

I changed it to:
netgroup:   nis files

And things work again.  Shouldn't it have failed when checking files but then 
succeeded via NIS?  Anyway, changing the order has fixed the issue for now.

Original comment by BiloxiG...@gmail.com on 16 Dec 2011 at 3:50

GoogleCodeExporter commented 9 years ago
I checked another system that I pdsh from often.  It runs Fedora 15, pdsh-2.22 
and has "netgroup   file nis" in nsswitch.conf.  Using netgroups from it works 
just fine.  So it appears as though the second entry for netgroup in 
nsswitch.conf is never being checked if the first one fails to return a valid 
result.

Original comment by BiloxiG...@gmail.com on 16 Dec 2011 at 3:57
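To compare the lookup order between two machines, the netgroup line can be pulled out of nsswitch.conf programmatically. The sketch below is a hypothetical helper, not anything pdsh does itself. It keeps bracketed action items such as `[NOTFOUND=return]` in place, because those change whether later sources are consulted. Without such an item, glibc's documented default is to continue to the next source when the previous one reports not-found.

```python
import re

# A token is either a bracketed action item or a bare source name.
TOKEN_RE = re.compile(r"\[[^\]]*\]|[^\s\[]+")


def nsswitch_entries(nsswitch_text, database="netgroup"):
    """Return the ordered entries configured for one database.

    Example result: ["files", "nis"]. Returns [] when the database
    has no line in the given text.
    """
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line or ":" not in line:
            continue
        name, _, rest = line.partition(":")
        if name.strip() == database:
            return TOKEN_RE.findall(rest)
    return []


def source_names(entries):
    """Return only the source names, leaving out bracketed action items."""
    return [entry for entry in entries if not entry.startswith("[")]


if __name__ == "__main__":
    with open("/etc/nsswitch.conf") as conf:
        print(nsswitch_entries(conf.read()))
```

Running this on both the Fedora 15 and Fedora 16 hosts would confirm whether the two configurations really differ only in the order of the sources.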

GoogleCodeExporter commented 9 years ago
Interesting. The code that gathers the netgroup hosts with getnetgrent_r(3) hasn't
changed in pdsh since it was introduced. However, that is not to say that something
else in pdsh hasn't changed this behavior. The nss code should be hidden beneath the
setnetgrent/getnetgrent calls, so I'm not sure what I could be doing wrong here,
but it would be interesting to find out why the code is failing with the different
nsswitch order.

I wonder if running pdsh -g nisgroup1 -q under ltrace will show anything 
interesting.

I might also try writing up a testcase later today, if you'd be willing to run 
it.

Original comment by mark.gro...@gmail.com on 16 Dec 2011 at 4:49

GoogleCodeExporter commented 9 years ago
Well, the behavior is getting confusing now.  I tried reverting 
nsswitch.conf back to the way it was before, and pdsh works just fine now.  So 
what seemed to have fixed it may not have been the fix at all, which would mean it 
fixed itself right when I thought I had fixed it.

Here's what I get with things working nicely.  The "Target nodes" at the end is 
exactly what I expect to see, all 16 of the nodes listed in the group I used.  
I tried the same with a few different netgroups and all seemed fine.

-------------------------------------------
[root@myhost ~]# pdsh -g group -q
-- DSH-specific options --
Separate stderr/stdout  Yes
Path prepended to cmd   none
Appended to cmd         none
Command:        none
Full program pathname   /usr/bin/pdsh
Remote program path /usr/bin/pdsh

-- Generic options --
Local username      root
Local uid           0
Remote username     root
Rcmd type       ssh
one ^C will kill pdsh   No
Connect timeout (secs)  10
Command timeout (secs)  0
Fanout          32
Display hostname labels Yes
Debugging           No

-- Target nodes --
node[101-116]
-------------------------------------------

Original comment by BiloxiG...@gmail.com on 20 Dec 2011 at 1:20

GoogleCodeExporter commented 9 years ago
That is confusing. The next time you are able to reproduce the issue,
I'll make a debug version of the netgroups module and we can see
exactly which call is failing.

Do you have nscd or similar running?

Original comment by mark.gro...@gmail.com on 20 Dec 2011 at 2:20

GoogleCodeExporter commented 9 years ago
Yes, nscd-2.14.90-14.x86_64 is installed and running.

Original comment by BiloxiG...@gmail.com on 22 Dec 2011 at 3:46

GoogleCodeExporter commented 9 years ago
You probably know more about NIS than I, but I wonder if you have to refresh
the netgroup cache in nscd. The fact that getent worked seems to indicate
that nscd didn't have stale data, but next time this happens, maybe try
  nscd -i netgroup

just in case?

Original comment by mark.gro...@gmail.com on 22 Dec 2011 at 5:25
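Before invalidating the cache, it may be worth checking whether nscd is configured to cache netgroups at all. The sketch below assumes the usual `enable-cache SERVICE yes|no` line format of /etc/nscd.conf and treats the first matching line as authoritative. The function name is hypothetical, and the thread does not establish whether this particular nscd build has a netgroup cache.

```python
def nscd_cache_enabled(conf_text, service="netgroup"):
    """Return True or False from an `enable-cache SERVICE yes|no` line.

    Returns None when the service has no enable-cache line in the text.
    Assumption: the first matching line decides the result.
    """
    for raw in conf_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # strip comments, split on whitespace
        if (len(fields) == 3
                and fields[0] == "enable-cache"
                and fields[1] == service):
            return fields[2].lower() == "yes"
    return None


if __name__ == "__main__":
    with open("/etc/nscd.conf") as conf:
        print(nscd_cache_enabled(conf.read()))
```

A result of `None` or `False` would suggest that stale nscd data is an unlikely explanation, while `True` makes `nscd -i netgroup` a reasonable first step the next time the problem appears.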