Open GoogleCodeExporter opened 9 years ago
I'll try to take a look at this today.
Original comment by mark.gro...@gmail.com
on 13 Dec 2011 at 2:20
Unfortunately, I don't have a place to set up NIS to test this right now.
I assume NIS is working fine on "host" and other netgroups functionality
is present?
Does
getent netgroup nisgroup1
work?
I believe pdsh is just calling getnetgrent(3), so as long as /etc/nsswitch.conf
is set up to use nis, this should work (I think)
Original comment by mark.gro...@gmail.com
on 15 Dec 2011 at 2:26
The getent netgroup nisgroup1 works just fine. I checked /etc/nsswitch.conf
and it had:
netgroup: files nis
I changed it to:
netgroup: nis files
And things work again. Shouldn't the lookup have failed when checking files
and then succeeded via NIS? Anyway, changing the order has fixed the issue
for now.
Original comment by BiloxiG...@gmail.com
on 16 Dec 2011 at 3:50
I checked another system that I often run pdsh from. It runs Fedora 15 and
pdsh-2.22, and has "netgroup: files nis" in nsswitch.conf. Using netgroups
from it works just fine. So it appears that the second entry for netgroup in
nsswitch.conf is never checked if the first one fails to return a valid
result.
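One way to probe that suspicion outside of pdsh is to ask the C library
directly whether a host that is defined only in NIS is a member of the
netgroup. The following is a minimal sketch, assuming glibc and its innetgr(3)
interface; the helper name host_in_netgroup is hypothetical and is not part of
pdsh.

```c
#define _DEFAULT_SOURCE
#include <netdb.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from pdsh): returns 1 if `host` is listed in
 * netgroup `group` according to whatever sources nsswitch.conf names for
 * "netgroup", and 0 otherwise. The user and domain fields are left
 * unconstrained by passing NULL.
 */
int host_in_netgroup(const char *group, const char *host)
{
    return innetgr(group, host, NULL, NULL) == 1;
}
```

If this returns 0 for a NIS-only host with "netgroup: files nis" but 1 with
"netgroup: nis files", the problem would be below pdsh rather than in it.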
Original comment by BiloxiG...@gmail.com
on 16 Dec 2011 at 3:57
Interesting. The code that gathers the netgroup hosts with getnetgrent_r(3)
hasn't changed in pdsh since it was introduced. However, that is not to say
nothing else in pdsh has changed this behavior. The nss code should be hidden
beneath the setnetgrent/getnetgrent calls, so I'm not sure what I could be
doing wrong here, but it would be interesting to find out why the code is
failing with the different nsswitch order.
I wonder if running pdsh -g nisgroup1 -q under ltrace will show anything
interesting.
I might also try writing up a testcase later today, if you'd be willing to run
it.
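For illustration only, a standalone check along these lines might look like
the sketch below. It is not the testcase mentioned above and is not taken from
the pdsh source; it assumes glibc, the helper name dump_netgroup is
hypothetical, and the 4096-byte buffer size is an arbitrary choice.

```c
#define _DEFAULT_SOURCE
#include <netdb.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from pdsh): walks netgroup `group` with
 * setnetgrent/getnetgrent_r/endnetgrent and prints each triple to `out`.
 * Returns -1 if setnetgrent() reports failure, otherwise the number of
 * triples that carry a host field.
 */
int dump_netgroup(const char *group, FILE *out)
{
    char buf[4096];               /* arbitrary scratch space for getnetgrent_r */
    char *host, *user, *domain;
    int count = 0;

    if (setnetgrent(group) != 1) {
        fprintf(out, "setnetgrent(\"%s\") failed\n", group);
        return -1;
    }

    /* getnetgrent_r returns 1 per triple and 0 when there are no more. */
    while (getnetgrent_r(&host, &user, &domain, buf, sizeof buf) == 1) {
        fprintf(out, "(%s,%s,%s)\n",
                host ? host : "", user ? user : "", domain ? domain : "");
        if (host != NULL)
            count++;
    }

    endnetgrent();
    return count;
}
```

Calling dump_netgroup("nisgroup1", stdout) from a small main() under each
nsswitch.conf ordering would show whether setnetgrent itself fails or whether
the enumeration simply comes back empty.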
Original comment by mark.gro...@gmail.com
on 16 Dec 2011 at 4:49
Well the behavior is getting to be confusing now. I tried reverting
nsswitch.conf back to the way it was before and pdsh works just fine now. So
what seemed to have fixed it maybe didn't fix it, but that would mean it fixed
itself magically right when I thought I fixed it.
Here's what I get with things working nicely. The "Target nodes" at the end is
exactly what I expect to see, all 16 of the nodes listed in the group I used.
I tried the same with a few different netgroups and all seemed fine.
-------------------------------------------
[root@myhost ~]# pdsh -g group -q
-- DSH-specific options --
Separate stderr/stdout   Yes
Path prepended to cmd    none
Appended to cmd          none
Command:                 none
Full program pathname    /usr/bin/pdsh
Remote program path      /usr/bin/pdsh
-- Generic options --
Local username           root
Local uid                0
Remote username          root
Rcmd type                ssh
one ^C will kill pdsh    No
Connect timeout (secs)   10
Command timeout (secs)   0
Fanout                   32
Display hostname labels  Yes
Debugging                No
-- Target nodes --
node[101-116]
-------------------------------------------
Original comment by BiloxiG...@gmail.com
on 20 Dec 2011 at 1:20
That is confusing. When you next are able to reproduce the issue,
I'll make a debug version of the netgroups module and we can see
exactly which call is failing.
Do you have nscd or similar running?
Original comment by mark.gro...@gmail.com
on 20 Dec 2011 at 2:20
Yes nscd-2.14.90-14.x86_64 is installed and running.
Original comment by BiloxiG...@gmail.com
on 22 Dec 2011 at 3:46
You probably know more about NIS than I, but I wonder if you have to refresh
the netgroup cache in nscd. The fact that getent worked seems to indicate
that nscd didn't have stale data, but next time this happens, maybe try
nscd -i netgroup
just in case?
Original comment by mark.gro...@gmail.com
on 22 Dec 2011 at 5:25
Original issue reported on code.google.com by
BiloxiG...@gmail.com
on 13 Dec 2011 at 1:18