rackslab / Slurm-web

Open source web dashboard for Slurm HPC clusters
https://slurm-web.com
GNU General Public License v3.0

LDAP error Unable to extract user primary group with gidNumber attribute from user entries with ActiveDirectory #342

Closed rezib closed 2 months ago

rezib commented 3 months ago

I went ahead and stood up a Rocky 9 server for the gateway and I'm still having issues with authentication against Active Directory. For the sake of troubleshooting, I statically set sAMAccountName in place of uid in ldap.py. This got me a new error: [ERROR] ⸬slurmweb.apps.ldap:45 ↦ LDAP error: Unable to extract user primary group with gidNumber attribute from user entries

Navigating to the login page and entering credentials, I get:

[Two screenshots: "Screenshot 2024-08-23 at 1 23 57 PM" and "Screenshot 2024-08-23 at 1 24 09 PM"]

Thanks!

Originally posted by @rseaman2016 in https://github.com/rackslab/Slurm-web/issues/340#issuecomment-2307605432

rezib commented 3 months ago

@rseaman2016 The error "Unable to extract user primary group with gidNumber attribute from user entries" happens because Slurm-web/RFL tries to read the user's primary group number from the gidNumber attribute of the LDAP user entry. This primary group number is then used to resolve the user's group membership.

I must say I am not very familiar with Active Directory… Is there any other attribute that holds the user primary group? I don't even know if the notion of primary group is actually relevant in Active Directory. Do you know more than me on this matter?

rseaman2016 commented 3 months ago

Hi @rezib, There is the GID attribute, but that accepts a group name, not a number. "Domain Users" is the default primary group, but is not populated in gidNumber by default. Microsoft even has a note about it: PrimaryGroup

Usage of the gidNumber attribute is definitely configured per-application. For example, our primary storage appliance has never relied on the gidNumber attribute to determine the primary group. However, we recently added the gidNumber attribute to all users because a secondary storage appliance we recently deployed requires the gidNumber attribute to be populated in order to complete the ldap query.

rezib commented 3 months ago

I also read about the primaryGroupID user attribute, which is a group number. According to the documentation I found, the gidNumber is a group attribute.

Don't you have this primaryGroupID attribute in your user entries?

rseaman2016 commented 3 months ago

Thanks for the link. As the documentation mentions, a user's primaryGroupID is set to the default RID for Domain Users, but my understanding is that gidNumber is used for POSIX-compatible applications, similar to the uidNumber attribute.

rezib commented 3 months ago

My understanding is that the issue could be fixed by making the gid attribute name configurable in RFL LDAPAuthentifier._get_user_info(), so that users can set primaryGroupID instead of gidNumber to retrieve a user's primary group number with Active Directory. Do you think that could work for you?
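As an illustration of the proposal above, here is a minimal sketch of a primary group lookup with a configurable attribute name. It is not the RFL implementation: the function `extract_primary_gid`, its `gid_attribute` parameter and the sample entries are hypothetical. The only assumption about real behavior is that entries look like the attribute dictionaries python-ldap returns, with attribute names mapped to lists of bytes values.

```python
from typing import Dict, List


def extract_primary_gid(
    attrs: Dict[str, List[bytes]], gid_attribute: str = "gidNumber"
) -> int:
    """Return the primary group number stored under gid_attribute.

    attrs mimics the attribute dict python-ldap returns for one entry.
    Hypothetical helper, not part of RFL.
    """
    # LDAP attribute names are case-insensitive, so compare lowercased names.
    wanted = gid_attribute.lower()
    for name, values in attrs.items():
        if name.lower() == wanted and values:
            return int(values[0].decode())
    # Mirrors the failure mode discussed in this issue: attribute missing.
    raise KeyError(gid_attribute)


# Hypothetical sample entries: a POSIX-style user and an Active Directory user.
posix_user = {"uid": [b"alice"], "gidNumber": [b"10000"]}
ad_user = {"sAMAccountName": [b"bob"], "primaryGroupID": [b"513"]}

print(extract_primary_gid(posix_user))
print(extract_primary_gid(ad_user, gid_attribute="primaryGroupId"))
```

With the default attribute, the Active Directory entry in this sketch raises `KeyError`, which is the situation reported in this issue. Note that primaryGroupID holds a RID (513 is the well-known RID of Domain Users), not a POSIX gid, so the value would still need to be matched against groups in an AD-specific way.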

rseaman2016 commented 3 months ago

I'm not sure, especially since doing a bit more testing actually came back with some interesting results. I changed the user_base variable to an OU further down where there happens to only be one user, and was able to return the gidNumber attribute successfully. However, I then ran into this Python error:

Traceback (most recent call last):
  File "/usr/libexec/slurm-web/slurm-web-ldap-check", line 8, in <module>
    sys.exit(SlurmwebExecLDAPCheck.run())
  File "/usr/lib/python3.9/site-packages/slurmweb/exec/ldap.py", line 66, in run
    application.run()
  File "/usr/lib/python3.9/site-packages/slurmweb/apps/ldap.py", line 37, in run
    users = self.authentifier.users(with_groups=True)
  File "/usr/lib/python3.9/site-packages/rfl/authentication/ldap.py", line 300, in users
    groups = self._get_groups(connection, user, user_dn, gidNumber)
  File "/usr/lib/python3.9/site-packages/rfl/authentication/ldap.py", line 163, in _get_groups
    results = connection.search_s(
  File "/usr/lib64/python3.9/site-packages/ldap/ldapobject.py", line 631, in search_s
    return self.search_ext_s(base,scope,filterstr,attrlist,attrsonly,None,None,timeout=self.timeout)
  File "/usr/lib64/python3.9/site-packages/ldap/ldapobject.py", line 625, in search_ext_s
    return self.result(msgid,all=1,timeout=timeout)[1]
  File "/usr/lib64/python3.9/site-packages/ldap/ldapobject.py", line 535, in result
    resp_type, resp_data, resp_msgid = self.result2(msgid,all,timeout)
  File "/usr/lib64/python3.9/site-packages/ldap/ldapobject.py", line 539, in result2
    resp_type, resp_data, resp_msgid, resp_ctrls = self.result3(msgid,all,timeout)
  File "/usr/lib64/python3.9/site-packages/ldap/ldapobject.py", line 543, in result3
    resp_type, resp_data, resp_msgid, decoded_resp_ctrls, retoid, retval = self.result4(
  File "/usr/lib64/python3.9/site-packages/ldap/ldapobject.py", line 553, in result4
    ldap_result = self._ldap_call(self._l.result4,msgid,all,timeout,add_ctrls,add_intermediates,add_extop)
  File "/usr/lib64/python3.9/site-packages/ldap/ldapobject.py", line 128, in _ldap_call
    result = func(*args,**kwargs)
ldap.OPERATIONS_ERROR: {'msgtype': 115, 'msgid': 4, 'result': 1, 'desc': 'Operations error', 'ctrls': [], 'info': '000004DC: LdapErr: DSID-0C090C77, comment: In order to perform this operation a successful bind must be completed on the connection., data 0, v4563'}

I then updated the user_base variable to an OU where there are 7 users and was able to return the gidNumber attribute for only the first user, then got the same error as above. So querying the gidNumber attribute seems to be working, just not for all users. The Python error is curious since we've clearly already authenticated against the domain controllers.

rseaman2016 commented 3 months ago

Looking back at the output when using the original user_base, I realized that the user that comes up in the query before the "Unable to extract user primary..." error is a former employee, who therefore didn't get the gidNumber attribute populated when we recently added it to all (enabled) accounts. I'm not quite sure how that fits into the Python error, but I'm guessing it's returned because the query only processes one user before bombing out, and since that one user happens not to have the gidNumber attribute populated, we get the exception from ldap.py:

            raise LDAPAuthenticationError(
                "Unable to extract user primary group with gidNumber attribute from "
                "user entries"
            ) from err

rezib commented 3 months ago

Then I guess RFL and Slurm-web should make it possible to configure a custom primary group attribute in user entries AND support user entries without a primary group attribute. It should not fail in the second case, just keep on searching for other (secondary) groups. What do you think about this proposal?
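The second half of this proposal can be sketched as follows. This is only an illustration under stated assumptions, not the RFL code: `primary_gid_or_none`, `group_names` and the sample data are hypothetical, and group matching is done over an in-memory list of entries instead of an LDAP search so that the example stays self-contained.

```python
from typing import Dict, List, Optional

Entry = Dict[str, List[bytes]]


def primary_gid_or_none(attrs: Entry, gid_attribute: str = "gidNumber") -> Optional[int]:
    """Return the primary group number, or None when the attribute is absent."""
    values = attrs.get(gid_attribute)
    if not values:
        return None
    return int(values[0].decode())


def group_names(
    user: str, user_attrs: Entry, groups: List[Entry], gid_attribute: str = "gidNumber"
) -> List[str]:
    """List the groups of a user without failing on a missing primary group."""
    gid = primary_gid_or_none(user_attrs, gid_attribute)
    names = []
    for group in groups:
        # The primary group only matches when the user entry defines one.
        is_primary = gid is not None and group.get("gidNumber") == [str(gid).encode()]
        # Secondary groups are matched on membership, whatever the primary group.
        is_member = user.encode() in group.get("memberUid", [])
        if is_primary or is_member:
            names.append(group["cn"][0].decode())
    return names


# Hypothetical data: an active user, and a former employee without gidNumber.
groups = [
    {"cn": [b"staff"], "gidNumber": [b"10000"], "memberUid": []},
    {"cn": [b"alumni"], "gidNumber": [b"10001"], "memberUid": [b"carol"]},
]
active = {"uid": [b"alice"], "gidNumber": [b"10000"]}
former = {"uid": [b"carol"]}

print(group_names("alice", active, groups))
print(group_names("carol", former, groups))
```

In this sketch the entry without gidNumber no longer aborts the whole listing; the user simply ends up with secondary groups only.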

rseaman2016 commented 3 months ago

I don't think we necessarily need to have a custom primary group attribute, as (assuming we are shooting for POSIX-style attributes), gidNumber seems to be the correct attribute to use, based on a couple of examples below:

https://cloud.google.com/architecture/partners/netapp-cloud-volumes/ad-ldap-posix
https://access.redhat.com/articles/3023821

The ability to support user entries without the gidNumber attribute would be important, as I can imagine plenty of AD environments don't have that set (we didn't until last week).

I'd have to test once we get the fixes for this current issue in place, but we might also allow customization of the group_class at ldap.py, line 159, as AD groups have an objectClass of group. That's something I've had to change on some LDAP-backed applications before, when the default is posixGroup.

Thanks for all your help with this!
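To make the group_class suggestion above concrete, here is a hedged sketch of a group search filter whose object class is a parameter. The filter layout (matching on memberUid, member and gidNumber) is an assumption for illustration and may differ from what RFL actually sends; `group_filter` and `escape_filter_value` are hypothetical names. The escaping follows the special characters listed in RFC 4515.

```python
from typing import Optional


def escape_filter_value(value: str) -> str:
    """Escape the characters that are special in LDAP filters (RFC 4515)."""
    # The backslash must be replaced first to avoid double escaping.
    for char, escaped in (
        ("\\", r"\5c"),
        ("*", r"\2a"),
        ("(", r"\28"),
        (")", r"\29"),
        ("\x00", r"\00"),
    ):
        value = value.replace(char, escaped)
    return value


def group_filter(
    user: str,
    user_dn: str,
    gid: Optional[int] = None,
    group_class: str = "posixGroup",
) -> str:
    """Build a group search filter for a configurable group object class."""
    clauses = [
        f"(memberUid={escape_filter_value(user)})",
        f"(member={escape_filter_value(user_dn)})",
    ]
    # Only match on the primary group when the user entry provided one.
    if gid is not None:
        clauses.append(f"(gidNumber={gid})")
    return f"(&(objectClass={escape_filter_value(group_class)})(|{''.join(clauses)}))"


# Default POSIX-style directory, then an Active Directory style setup.
print(group_filter("alice", "uid=alice,ou=people,dc=example,dc=org", gid=10000))
print(group_filter("bob", "CN=Bob,OU=Staff,DC=example,DC=org", group_class="group"))
```

Keeping posixGroup as the default would leave existing deployments unchanged, while Active Directory sites could set the class to group.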

rezib commented 3 months ago

OK, I see, thanks for your insightful feedback!

I just opened 3 RFEs for RFL:

The current issue will track addition of corresponding parameters in Slurm-web gateway configuration files and documentation update.