lsc-project / lsc

LSC engine
http://lsc-project.org/wiki/documentation/latest/start
Other
105 stars 40 forks source link

Optimization for multi-valued LDAP attributes such as groups uniqueMember #255

Closed FlagonC closed 1 month ago

FlagonC commented 6 months ago

When configuring LSC synchronization, it is currently not possible to optimize modifications to be able to make add and delete modifications for multivalued attributes containing many entries, such as groups with uniqueMember. This can lead to slowness when used with attributes containing thousands of entries.

For exemple, i have a test group contenning multiples users:

dn: cn=test,ou=group,dc=test uniqueMember: uid=test01,ou=people,dc=test uniqueMember: uid=test02,ou=people,dc=test uniqueMember: uid=test03,ou=people,dc=test

If i add the user test04, LSC will make the modification like this:

dn: cn=test,ou=group,dc=test changetype: modify replace: uniqueMember uniqueMember: uid=test01,ou=people,dc=test uniqueMember: uid=test02,ou=people,dc=test uniqueMember: uid=test03,ou=people,dc=test uniqueMember: uid=test04,ou=people,dc=test

The most optimized solution would be to do the following for large groups

dn: cn=test,ou=group,dc=test changetype: modify add: uniqueMember uniqueMember: uid=test04,ou=people,dc=test

https://github.com/lsc-project/lsc/issues/70 We can do add by following this function, but not for delete

LiquidFenrir commented 6 months ago

Hello @coudot, I have fixes/improvements for this implemented but not uploaded/PR'd yet, I'd like to know which one would be preferred as there are two, each with pros and cons.

  1. "optimizing" the FORCE policyType in the case of a REPLACE_VALUES operation in BeanComparator.java to use smaller ADD and DELETE modifications instead of a complete REPLACE. Leverages SetUtils.findMissingNeedles in the opposite direction of MERGE to find the "excess" elements in the destination compared to the source.

  2. Involves adding two policyTypes in the XSD schema, DIFFERENCE and KEEP_SOURCE (names unimportant, but tries to make the meaning obvious), also in case of a REPLACE_VALUES operation:

    • DIFFERENCE is the set difference operation: output = destination - source. Applying the DELETE modification to this output element set, makes this the opposite of the MERGE policyType: it removes excess elements in the destination (whereas MERGE adds the missing elements in the destination).
    • KEEP_SOURCE is the opposite of KEEP (which keeps the destination untouched, as I understand it): it makes the destination a synchronization of the source, working like the optimized FORCE (or like a combination of MERGE and DIFFERENCE).

Both are tested to work the same, at least for the uniqueMember example provided.

I like the first one less because it changes the current expected behaviour, and may break some people's usage, but it's a smaller code change. The second one seems better, it adds features that might be useful and can't risk affecting users' workflows.

I can publish both branches to compare/see the changes, and PR the accepted one. Or I can wait until the community/maintainers decide one idea is better than the other and publish only that branch. Up to you, thanks in advance!

coudot commented 6 months ago

I've no strong opinion on this but we should avoid to have a too complex configuration. Maybe there is no need to keep the old behavior (replace all values) and just use the new way (delete/add). Or keep the replace mode only it there is only one value in source and destination.

Maybe @davidcoutadeur @soisik @rouazana can comment on this.

LiquidFenrir commented 6 months ago

Thanks for the quick reply, I've pushed the changes for the optimized FORCE policy at https://github.com/INTM-Group/lsc/tree/opti-force-policy, you can review them and make a PR directly from our branch, afaik. Otherwise, let me know and it can be done on our side easily, or maybe we should wait for the other maintainers' comments. And it is a good idea to keep the replace mode when it is better than a ton of deletes and adds, which resulted in the second commit. Thanks for the insight.

rouazana commented 6 months ago

Hello,

This looks great, thank you for your contribution!

Two remarks:

I opened the MR for more discussion here: #259

SchaffnerMi commented 5 months ago

Hello,

Thank you for your work. Would it be possible to test this as well? Is it possible to obtain a package for installation under ubuntu?

Thanks, Michel

SchaffnerMi commented 5 months ago

Hello,

We tried to package the application and tested it on our test platform. Unfortunately this doesn't seem to work. The contents of the member "attribute" is completely replaced as before. We set the policy to FORCE for the attribute, do we need to do something more?

Thanks for your feedback, Michel

davidcoutadeur commented 5 months ago

Thanks for the quick reply, I've pushed the changes for the optimized FORCE policy at https://github.com/INTM-Group/lsc/tree/opti-force-policy, you can review them and make a PR directly from our branch, afaik. Otherwise, let me know and it can be done on our side easily, or maybe we should wait for the other maintainers' comments. And it is a good idea to keep the replace mode when it is better than a ton of deletes and adds, which resulted in the second commit. Thanks for the insight.

Hello,

Thanks again for your contribution!

I have read the PR "1. optimizing" the FORCE policyType". This optimization seems really interesting. I have no particular remark on the code itself. I have no such use case currently, but maybe there might be some use cases where we want to force replacement. For example if we want to force a memberOf or member integrity process by our directory? If we want to have this option, this requires to have an option to force the (missingValues.size() + extraValues.size()) >= toSetAttrValues.size() condition.

I didn't understand the way the second solution is intended to work ("2. Involves adding two policyTypes in the XSD schema, DIFFERENCE and KEEP_SOURCE"). Could you show the source code or explain how the DIFFERENCE and KEEP_SOURCE are used?

davidcoutadeur commented 5 months ago

Hello,

We tried to package the application and tested it on our test platform. Unfortunately this doesn't seem to work. The contents of the member "attribute" is completely replaced as before. We set the policy to FORCE for the attribute, do we need to do something more?

Thanks for your feedback, Michel

Did you check you had less modifications than the number of values in the source attribute?

SchaffnerMi commented 5 months ago

In fact we tested on a group of more than 100,000 members where there are less than 10 changes. Maybe we have a problem with our version of LSC which we compiled with some difficulty.

LiquidFenrir commented 5 months ago

Hello,

I didn't understand the way the second solution is intended to work ("2. Involves adding two policyTypes in the XSD schema, DIFFERENCE and KEEP_SOURCE"). Could you show the source code or explain how the DIFFERENCE and KEEP_SOURCE are used?

The second solution was to add a policyType DIFFERENCE that does delete operations on elements in the destination but not the source (like the opposite of MERGE), and a policyType KEEP_SOURCE would be the equivalent of a DIFFERENCE and a MERGE in one operation (acting just like the optimized FORCE of the first commit of the PR). I still have the patch locally and can upload the diff file if you wish to see the code, or even make a separate branch.

They would be used for granularity/more control, but I'm not sure it's that big of a use.

I have no such use case currently, but maybe there might be some use cases where we want to force replacement. For example if we want to force a memberOf or member integrity process by our directory?

I have to admit I'm not well versed in LDAP/AD workings at all, I kind of just made the changes and tested them on a few minimal examples (with a main groupOfUniqueNames object and uniqueMembers getting add/delete/replace'd to be as close to the issue as possible). It's highly probable that there are edge cases where this isn't advisable, or even break compatibility, I'll resort to your (the main contributors') advice/knowledge on this.

SchaffnerMi commented 5 months ago

Hello,

According to the logs the problem seems to come from that the new value and the old value are the same:

Jan 11 11:00:22 - DEBUG - In object "CN=GG_ARC,ou=structures,ou=demo,ou=groups,dc=ad-dev,dc=demo,dc=org": Replacing attribute "member": source values are [uid=toto,ou=demo,ou=people,o=annuaire, uid=alice,ou=demo,ou=people,o=annuaire], old values were [CN=alice,OU=employee,OU=demo,OU=people,DC=ad-dev,DC=demo,DC=org, CN=toto,OU=employee,OU=demo,OU=people,DC=ad-dev,DC=demo,DC=org], new values are [CN=toto,OU=employee,ou=demo,ou=people,dc=ad-dev,dc=demo,dc=org, CN=alice,OU=employee,ou=demo,ou=people,dc=ad-dev,dc=demo,dc=org]

We use code from this page to update members: https://lsc-project.org/documentation/latest/synchronizegroups.html

when we delete a value in member in the LDAP and we do a new sync all value are rewrite into de AD with the exception of the deleted entry like:

old values were (42 entries) a.castagna adelaunay alexandraweber amandineperret amarchierjamet ...

new values are (41 entries) adelaunay alexandraweber amandineperret amarchierjamet ...

also the behavior of not returning everything when there is only one modification on a list of 42 entries does not seem to work

The structure of the source values (LDAP) are not the same with the AD structure, is this why it doesn't work? source value are uid=XXXX,,ou=demo,ou=people,o=annuaire and New value are= CN=XXXX,OU=employee,ou=demo,ou=people,dc=ad-dev,dc=demo,dc=org

davidcoutadeur commented 1 month ago

This PR is replaced by #285