pikhq / musl-nscd

Musl-nscd is an implementation of the NSCD protocol, suitable for use with musl and with standard NSS modules

Doesn’t work with winbind and even segfaults #5

Open jirutka opened 6 years ago

jirutka commented 6 years ago

I can’t get it to work with winbind (Samba 4 DC) on Alpine Linux.

$ getent passwd kflynn; echo $?
2

strace of musl-nscd:

accept(4, NULL, NULL)                   = 5
rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0
mmap(NULL, 94208, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff18f466000
mprotect(0x7ff18f468000, 86016, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7ff18f47ca88, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|0x400000, parent_tidptr=0x7ff18f47cae8, tls=0x7ff18f47cab0, child_tidptr=0x7ff18f47cae8) = 3546
read(5, "\2\0\0\0\0\0\0\0\7\0\0\0", 12) = 12
read(5, "kflynn\0", 7)                  = 7
brk(0x7ff191657000)                     = 0x7ff191657000
getpid()                                = 3542
getpid()                                = 3542
getpid()                                = 3542
lstat("/var/run/samba/winbindd", {st_mode=S_IFDIR|0755, st_size=60, ...}) = 0
lstat("/var/run/samba/winbindd/pipe", {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
socket(AF_UNIX, SOCK_STREAM, 0)         = 6
fcntl(6, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
fcntl(6, F_GETFD)                       = 0
fcntl(6, F_SETFD, FD_CLOEXEC)           = 0
connect(6, {sa_family=AF_UNIX, sun_path="/var/run/samba/winbindd/pipe"}, 110) = 0
getpid()                                = 3542
getpid()                                = 3542
poll([{fd=6, events=POLLIN|POLLOUT|POLLHUP}], 1, -1) = 1 ([{fd=6, revents=POLLOUT}])
write(6, "0\10\0\0\0\0\0\0\0\0\0\0\326\r\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2096) = 2096
getpid()                                = 3542
poll([{fd=6, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=6, revents=POLLIN}])
read(6, "\250\r\0\0\2\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3496) = 3496
getpid()                                = 3542
getpid()                                = 3542
poll([{fd=6, events=POLLIN|POLLOUT|POLLHUP}], 1, -1) = 1 ([{fd=6, revents=POLLOUT}])
write(6, "0\10\0\0,\0\0\0\0\0\0\0\326\r\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2096) = 2096
getpid()                                = 3542
poll([{fd=6, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=6, revents=POLLIN}])
read(6, "\313\r\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3496) = 3496
getpid()                                = 3542
poll([{fd=6, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=6, revents=POLLIN}])
read(6, "/var/lib/samba/winbindd_privileg"..., 35) = 35
lstat("/var/lib/samba/winbindd_privileged", {st_mode=S_IFDIR|0750, st_size=8, ...}) = 0
lstat("/var/lib/samba/winbindd_privileged/pipe", {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
socket(AF_UNIX, SOCK_STREAM, 0)         = 7
fcntl(7, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(7, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
fcntl(7, F_GETFD)                       = 0
fcntl(7, F_SETFD, FD_CLOEXEC)           = 0
connect(7, {sa_family=AF_UNIX, sun_path="/var/lib/samba/winbindd_privileged/pipe"}, 110) = 0
close(6)                                = 0
poll([{fd=7, events=POLLIN|POLLOUT|POLLHUP}], 1, -1) = 1 ([{fd=7, revents=POLLOUT}])
write(7, "0\10\0\0\1\0\0\0\0\0\0\0\326\r\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2096) = 2096
getpid()                                = 3542
poll([{fd=7, events=POLLIN|POLLHUP}], 1, 5000) = 1 ([{fd=7, revents=POLLIN}])
read(7, "\250\r\0\0\2\0\0\0ENCOMCOM\\kflynn\0\0\0\0\0\0\0\0\0"..., 3496) = 3496
write(5, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 36) = 36
close(5)                                = 0
poll([{fd=4, events=POLLIN}], 1, -1

And when I query a user who exists both in /etc/passwd and in Samba (winbind fails on it, according to the documentation):

$ id samflynn
uid=1020(samflynn) gid=1000(encomcom) groups=1000(encomcom)

strace of musl-nscd:

)    = 1 ([{fd=4, revents=POLLIN}])
accept(4, NULL, NULL)                   = 5
rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0
mmap(NULL, 94208, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f027a24f000
mprotect(0x7f027a251000, 86016, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7f027a265a88, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|0x400000, parent_tidptr=0x7f027a265ae8, tls=0x7f027a265ab0, child_tidptr=0x7f027a265ae8) = 3662
read(5, "\2\0\0\0\17\0\0\0\t\0\0\0", 12) = 12
read(5, "samflynn\0", 9)                = 9
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f02} ---
+++ killed by SIGSEGV +++
Segmentation fault
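
For readers decoding the traces: the 12 bytes read from fd 5 match the layout of an nscd request header (three 32-bit integers: version, request type, key length), as declared in glibc's nscd-client.h and musl's nscd.h. Below is a minimal sketch of a decoder, assuming the traced host is little-endian. The struct and function names are hypothetical and are not taken from musl-nscd.

```c
#include <stdint.h>

/* Request type numbers as found in glibc's nscd-client.h and musl's nscd.h
   (only the two that appear in the traces). Names here are illustrative. */
enum { REQ_GETPWBYNAME = 0, REQ_INITGROUPS = 15 };

/* Hypothetical mirror of the 12-byte request header. */
struct nscd_req_header {
    int32_t version;  /* protocol version, 2 in both traces */
    int32_t type;     /* request type */
    int32_t key_len;  /* length of the key that follows, including the NUL */
};

/* Read one little-endian 32-bit value (assumption: little-endian sender). */
static int32_t le32(const unsigned char *p)
{
    return (int32_t)((uint32_t)p[0] | (uint32_t)p[1] << 8 |
                     (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
}

/* Decode the 12 header bytes exactly as strace printed them. */
static struct nscd_req_header decode_req_header(const unsigned char buf[12])
{
    struct nscd_req_header h;
    h.version = le32(buf);
    h.type    = le32(buf + 4);
    h.key_len = le32(buf + 8);
    return h;
}
```

Applied to the two traces, this gives type 0 (GETPWBYNAME) with a 7-byte key for the getent call, and type 15 (INITGROUPS in glibc, GETINITGR in musl) with a 9-byte key for the id call, which is the request that precedes the SIGSEGV.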

What nss modules do you use with musl-nscd?

the-maldridge commented 5 years ago

@jirutka I do not have an environment handy to test winbind in. The modules I've currently tried are ldap and nsscache. Both have worked, though I have only extensively tested nsscache.

ncopa commented 5 years ago

I suspect the reason is the mismatch between https://github.com/pikhq/musl-nscd/blob/master/include/nss.h

enum nss_status
{
    NSS_STATUS_TRYAGAIN = -2,
    NSS_STATUS_UNAVAIL = -1,
    NSS_STATUS_NOTFOUND = 0,
    NSS_STATUS_SUCCESS = 1,
    NSS_STATUS_RETURN = 2
};

and https://github.com/samba-team/samba/blob/master/nsswitch/winbind_nss.h#L70

typedef enum
{
  NSS_STATUS_SUCCESS=0,
  NSS_STATUS_NOTFOUND=1,
  NSS_STATUS_UNAVAIL=2,
  NSS_STATUS_TRYAGAIN=3
} NSS_STATUS;

the-maldridge commented 5 years ago

@ncopa I agree this was a weird mismatch; however, after applying a patch to match the enums, the segfaults persist. I am waiting on an strace from one organization that is running winbind, as I'm not set up to debug that, but I suspect that either musl-nscd is handing off some uninitialized memory or samba is performing some unchecked operations.

the-maldridge commented 5 years ago

For those following along at home, the problem turned out to be a combination of uninitialized memory and the mismatched enum. The full fix will be on 1.0.3 when it is released.
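
To make the enum half of that concrete, here is a minimal sketch of how the two numberings quoted earlier in the thread disagree, and what an explicit translation could look like. The prefixed names (NSCD_*, WB_*) and the function wb_to_nscd are hypothetical, chosen only so both enums fit in one file. This is not the patch that went into 1.0.3.

```c
/* Values as quoted from musl-nscd's include/nss.h. */
enum nscd_status {
    NSCD_TRYAGAIN = -2,
    NSCD_UNAVAIL  = -1,
    NSCD_NOTFOUND = 0,
    NSCD_SUCCESS  = 1,
    NSCD_RETURN   = 2
};

/* Values as quoted from samba's nsswitch/winbind_nss.h. */
enum wb_status {
    WB_SUCCESS  = 0,
    WB_NOTFOUND = 1,
    WB_UNAVAIL  = 2,
    WB_TRYAGAIN = 3
};

/* Hypothetical translation from the second numbering to the first.
   Unknown values are treated as "unavailable" (an assumption). */
static enum nscd_status wb_to_nscd(enum wb_status s)
{
    switch (s) {
    case WB_SUCCESS:  return NSCD_SUCCESS;
    case WB_NOTFOUND: return NSCD_NOTFOUND;
    case WB_UNAVAIL:  return NSCD_UNAVAIL;
    case WB_TRYAGAIN: return NSCD_TRYAGAIN;
    default:          return NSCD_UNAVAIL;
    }
}
```

Without such a translation, a module built against the second definition that reports "not found" (1) would be read as "success" (1) under the first, and a genuine "success" (0) would be read as "not found" (0). The caller could then go on to use a result buffer the module never filled in, which would fit the uninitialized-memory half of the diagnosis.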

georglizzard commented 4 years ago

> For those following along at home, the problem turned out to be a combination of uninitialized memory and the mismatched enum. The full fix will be on 1.0.3 when it is released.

Hello, is there any chance of a 1.0.3 release anytime soon? Winbind support is crucial for running a Samba domain member as a file server on distros that use musl. There is no alternative to nsswitch for winbind when setting up a file server with Windows ACLs. Maybe this stay-at-home time will give you some free time to update this project...

the-maldridge commented 4 years ago

@georglizzard 1.0.3 has been tagged since November of last year. This issue remains open because there are still some other problems with winbind. As I do not have winbind in my environment, I cannot analyze the issue.

1.0.3 release: https://github.com/pikhq/musl-nscd/releases/tag/v1.0.3

janmg commented 10 months ago

On Alpine Linux with samba-dc, I have managed to get winbind to authenticate against my on-site Active Directory by using musl-nscd. I need musl-nscd because winbind depends on nsswitch.conf, which Alpine Linux otherwise ignores.

The problem, however, is that I can connect to the share while nscd is running, but the share only shows up when I stop nscd.

It's the final puzzle piece of an otherwise interesting journey in interoperability between Windows Server and Samba, especially when using a Linux distribution based on musl and not using sssd. I don't know if I'll be able to understand the interactions between the AD, the Windows client, the Samba domain member, and winbind, or get the usermap to work with Linux using nscd, but it sure has been an educational experience. Insights are appreciated; otherwise it will take me some extra years to grasp the problem and propose fixes.

midasi commented 4 months ago

@janmg I don't think this is a musl-nscd issue. We use Alpine Linux in multiple AD environments with many Samba DCs and member servers (file/print) without major issues. All member servers have musl-nscd installed and retrieve user and group information for the domain accounts over libnss and winbind. On the client side, we mostly use Windows.

Your problem sounds more like a permission issue. Please make sure you follow the official samba guide about "Setting up a Share Using Windows ACLs". Make sure as well that all AD users and groups have the uidNumber/gidNumber attribute set correctly. You can do this automatically with the adman pkg.

janmg commented 4 months ago

I meant to state that I use Samba4 as a domain member on Alpine Linux against an AD running on Windows Server 2022, with Windows 11 clients. Alpine Linux doesn't use glibc's libnss, so I'm using musl-nscd. I use winbind to do the user lookup, which hangs during user authentication. But if I kill nscd, the share appears and I can use it. Somewhere between winbind and nscd, the NSS_STATUS_SUCCESS response gets stuck; I just can't put my finger on why it works if I shut down nscd mid-authentication. This is just for my own personal shares; I'm not running anything business critical.

If I have some time I'll do some debugging. I haven't yet identified where nscd is stuck. https://github.com/pikhq/musl-nscd/blob/master/include/cache_query.h#L39

ericonr commented 3 months ago

Hi @janmg!

Have you had time to do any further debugging? The fact that winbind's progress is stalled by musl-nscd makes it sound like winbind does some passwd/group lookup when it's starting up, which would get you in a deadlock situation...

winbind startup -> winbind queries musl-nscd -> musl-nscd queries winbind (one of its configured backends) -> winbind isn't ready to answer a request -> musl-nscd never returns

Killing musl-nscd would allow winbind to make forward progress (musl gets a response error and probably returns an error to winbind, which can then proceed).

If you could attach gdb to either (ideally, both) of winbind and musl-nscd when this is happening, and capture a backtrace (thread apply all backtrace), I think that would give us a great idea of what's going on here.