Open Salvoxia opened 4 months ago
Tried in both 25.0.1 the same steps and I wasn't able to reproduce the problem. The user is a member of both parent and a sub-group and deleting one by one, starting from the sub-group, works as expected.
Thank you for investing your time trying to reproduce. I checked again with a clean installation and indeed could not reproduce it the first time around, but I think I found the missing piece: the way the federated user is assigned the `admin` role matters. If the role is assigned to the user directly, everything works fine. However, if the user gets the role via a group role assignment, the error is reliably reproducible.
So the new steps to reproduce are as follows:
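1. Configure the `ldap-group` mapper in `LDAP_ONLY` mode
2. Create a user `TestUser` that gets propagated to LDAP
3. Create a group `AdminGroup` in Keycloak, add `TestUser` as a member and assign the `admin` role to the group
4. Create a group `TestGroup` in Keycloak and add `TestUser` to that group as well
5. Log in as `TestUser`
6. Delete `TestGroup`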
I was able to reproduce the issue following the steps described in the last comment.
The issue seems to happen because the admin console sends two requests to the groups endpoint almost simultaneously after deleting the group:
DELETE | http://localhost:8080/admin/realms/master/groups/f96c7a63-51c3-4b3b-82b8-608eece1fa5e (200 OK)
GET | http://localhost:8080/admin/realms/master/groups?first=0&max=21&exact=false&global=false (409 error)
GET | http://localhost:8080/admin/realms/master/groups?max=11 (200 OK)
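The request sequence above can be replayed outside the admin console with a small client. This is only a sketch: the URLs and query strings are copied from the log lines above, while the class name `GroupDeleteReplay`, the helper methods, and the way the bearer token is obtained are assumptions made for illustration. It sends the DELETE, then fires both GETs without waiting for the first to finish.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper, not part of Keycloak: replays the three requests from the log.
public class GroupDeleteReplay {
    // Base URL and realm as they appear in the logged requests.
    static final String BASE = "http://localhost:8080/admin/realms/master";

    // Starts a request for BASE + path carrying the caller's access token.
    public static HttpRequest.Builder authorized(String path, String token) {
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .header("Authorization", "Bearer " + token);
    }

    // Builds the DELETE followed by the two GETs, in the order they were logged.
    public static List<HttpRequest> buildRequests(String groupId, String token) {
        return List.of(
                authorized("/groups/" + groupId, token).DELETE().build(),
                authorized("/groups?first=0&max=21&exact=false&global=false", token).GET().build(),
                authorized("/groups?max=11", token).GET().build());
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: GroupDeleteReplay <groupId> <accessToken>");
            return;
        }
        List<HttpRequest> requests = buildRequests(args[0], args[1]);
        HttpClient client = HttpClient.newHttpClient();

        // Delete the group and wait for the response.
        HttpResponse<String> deleted =
                client.send(requests.get(0), HttpResponse.BodyHandlers.ofString());
        System.out.println("DELETE -> " + deleted.statusCode());

        // Fire both list requests before waiting on either, as the console does.
        CompletableFuture<HttpResponse<String>> paged =
                client.sendAsync(requests.get(1), HttpResponse.BodyHandlers.ofString());
        CompletableFuture<HttpResponse<String>> small =
                client.sendAsync(requests.get(2), HttpResponse.BodyHandlers.ofString());
        System.out.println("GET first=0&max=21 -> " + paged.join().statusCode());
        System.out.println("GET max=11 -> " + small.join().statusCode());
    }
}
```

The token has to belong to the federated user who gets the `admin` role through a group, otherwise the permission check that triggers the sync is not expected to take the same path.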
In the `GroupsResource` endpoint, we check if the user has permissions to list groups. This leads to the stack trace seen above, where we inspect the groups for assigned roles, which triggers the `GroupLdapStorageMapper` to sync the groups. Ultimately, this leads to the creation of any groups that exist in LDAP but not in Keycloak (our `TestGroup` in this example). One of the two GET requests successfully recreates the group, and the other fails as it tries to create the same group again. It looks like a race condition caused by the simultaneous requests to the endpoint, but curiously it is always the first request that fails for me.
The whole thing is related to the fact that groups are synced automatically to LDAP when created, but not when deleted. If group deletion were properly synced, the GET requests wouldn't try to load the group again from LDAP because it would be gone by then.
@pedroigor I think we need to accept this one. One way to fix this would be to investigate why the admin console issues those two separate requests at the same time when loading groups. Another way to fix this is to sync group deletion (I think there's a PR opened for https://github.com/keycloak/keycloak/issues/29099). It has been requested quite a few times already.
@Salvoxia @sguilhen The second request is because of how the UI is rendered: one request renders the tree and the other renders the group on the right side. I also noticed this when working on the scalability of groups.
Glad to know there is a contribution. Let's review the PR then.
I've talked with @edewit about it and we discussed changing the UI to only render the right side of the page (group details) after clicking on a group on the left side. But that won't solve the original problem.
@sguilhen But https://github.com/keycloak/keycloak/issues/29099 seems to be actually adding support to propagate group deletion when removing local groups, isn't it?
The problem here should still happen because we are not really fixing the race condition?
@pedroigor IMO if the group is properly deleted on LDAP, then it won't be re-synced to Keycloak in the next request.
@Salvoxia @sguilhen Please see https://github.com/keycloak/keycloak/pull/31090#issuecomment-2248932845. We cannot solve this issue by automatically removing groups in LDAP.
As an alternative, we could avoid the error you reported by making sure synchronization does not run twice in parallel, so that concurrent synchronizations cannot fail while trying to create the same group in the database. But still, the root problem is not solved.
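To illustrate the idea of not creating the same group twice when two synchronizations overlap, here is a minimal in-memory sketch. It is not Keycloak code: the class `IdempotentGroupImport`, its methods, and the use of a map as a stand-in for the group table are all assumptions for illustration. It relies on `ConcurrentHashMap.computeIfAbsent`, which applies the mapping function at most once per key, so concurrent callers get the same group instead of a duplicate insert.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for "import a group from LDAP if it is missing locally".
public class IdempotentGroupImport {
    // Plays the role of the local group table, keyed by the unique group name.
    private final Map<String, String> groupIdsByName = new ConcurrentHashMap<>();
    // Counts how many times a group was actually created.
    private final AtomicInteger inserts = new AtomicInteger();

    // Returns the id of the group, creating it only if no caller has done so yet.
    public String importGroup(String name) {
        return groupIdsByName.computeIfAbsent(name, n -> {
            inserts.incrementAndGet();
            return UUID.randomUUID().toString();
        });
    }

    public int insertCount() {
        return inserts.get();
    }

    public static void main(String[] args) {
        IdempotentGroupImport importer = new IdempotentGroupImport();
        // Two overlapping "requests" both try to import the same LDAP group.
        CompletableFuture<String> a = CompletableFuture.supplyAsync(() -> importer.importGroup("TestGroup"));
        CompletableFuture<String> b = CompletableFuture.supplyAsync(() -> importer.importGroup("TestGroup"));
        System.out.println("same id: " + a.join().equals(b.join()));
        System.out.println("inserts: " + importer.insertCount());
    }
}
```

An in-process map only covers a single JVM. A real fix would presumably need the equivalent guarantee at the database or cluster level, and as noted above it would still not address when synchronization should run in the first place.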
The root problem here is that we don't have a clear policy on when synchronization should happen. The root cause of this issue is that we are synchronizing groups when evaluating the permissions to access the API. As you are using a user who is a member of an LDAP group, that will force synchronization, and the error will happen. If you try to delete a group where the user is not a member, even if there are other members there, then you will be able to successfully delete the group.
I'll move this issue to our backlog because we need to sort out not only this one but other issues related to LDAP and specifically with the group mapper.
Due to the amount of issues reported by the community we are not able to prioritise resolving this issue at the moment.
If you are affected by this issue, upvote it by adding a :thumbsup: to the description. We would also welcome a contribution to fix the issue.
Before reporting an issue
Area
ldap
Describe the bug
If a federated user has the roles to delete groups in a realm and they try to delete a group they are a member of while using an `ldap-group` mapper with mode `LDAP_ONLY`, the admin console shows an error: `Network response was not OK. Press here to refresh and continue`.
Version
25.0.1
Regression
Expected behavior
Keycloak should delete the group and propagate the deletion to LDAP (related to #29099), effectively removing the user from the no-longer existing group in LDAP and Keycloak.
Actual behavior
The admin console shows an error message. When following the link, the group has not been deleted. A workaround is for the user trying to delete the group to leave the group first.
The Keycloak logs show a unique key constraint violation:
How to Reproduce?
… `admin` role of the `master` realm for testing)
Anything else?
No response