jp-gouin / helm-openldap

Helm chart of Openldap in High availability with multi-master replication and PhpLdapAdmin and Ltb-Passwd
Apache License 2.0

Random breaking of one of the replicas #143

Closed (parak closed this 4 months ago)

parak commented 5 months ago

I'm occasionally running into a strange issue when deploying a standard configuration with 3 replicas. Most of the time everything works as expected, but in some cases one of the pods comes up with an empty main database and an olcRootPW value that doesn't match the other two pods. The admin user's password, which is populated from an environment variable, is identical on all three pods.

As a result, the affected pod can't replicate from the other two pods and fails to give correct responses to queries.
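One caveat when comparing olcRootPW across pods: if the value is stored with the salted {SSHA} scheme, each pod that hashes the same password on its own will end up with a different string, because the salt differs. A more reliable check is to verify each stored value against the known admin password. The following is a minimal sketch of that check, assuming the value uses {SSHA} (SHA-1 digest of password plus salt, followed by the salt, base64 encoded). The helper names `verify_ssha` and `make_ssha` are hypothetical and not part of the chart or of OpenLDAP.

```python
import base64
import hashlib
import hmac


def make_ssha(password: str, salt: bytes) -> str:
    """Build an {SSHA} value from a password and an explicit salt (illustration only)."""
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return "{SSHA}" + base64.b64encode(digest + salt).decode("ascii")


def verify_ssha(password: str, stored: str) -> bool:
    """Return True if `stored` ({SSHA} format) was derived from `password`."""
    prefix = "{SSHA}"
    if not stored.startswith(prefix):
        # Assumption: only the {SSHA} scheme is handled in this sketch.
        raise ValueError("only {SSHA} values are handled in this sketch")
    raw = base64.b64decode(stored[len(prefix):])
    # A SHA-1 digest is 20 bytes; everything after it is the salt.
    digest, salt = raw[:20], raw[20:]
    expected = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return hmac.compare_digest(digest, expected)


# Two values for the same password with different salts look different,
# yet both verify against that password.
pod_a = make_ssha("admin-password", b"salt-a")
pod_b = make_ssha("admin-password", b"salt-b")
print(pod_a != pod_b, verify_ssha("admin-password", pod_a), verify_ssha("admin-password", pod_b))
```

If a pod's value fails this check while the others pass, that pod really was initialized with a different password; if all three pass, the differing strings alone do not explain the replication failure.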

The logs look nearly identical across the three pods and nothing stands out, except that after the initial configuration and container restart (which I assume is normal), the following log lines are repeated:

TLS certificate verification: Error, self signed certificate
syncrepl_message_to_entry: rid=001 mods check (objectClass: value #1 invalid per syntax)
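To see whether the repeated syncrepl errors all come from the same replication ID and attribute, the log can be tallied. This is a rough sketch that assumes the log lines have exactly the shape shown above; the function name `summarize_syncrepl_errors` is hypothetical.

```python
import re
from collections import Counter

# Matches lines like:
# syncrepl_message_to_entry: rid=001 mods check (objectClass: value #1 invalid per syntax)
SYNCREPL_RE = re.compile(
    r"syncrepl_message_to_entry: rid=(?P<rid>\d+) mods check "
    r"\((?P<attr>[^:]+): (?P<reason>.+)\)"
)


def summarize_syncrepl_errors(log_text: str) -> Counter:
    """Count 'mods check' failures per (rid, attribute) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SYNCREPL_RE.search(line)
        if match:
            counts[(match.group("rid"), match.group("attr"))] += 1
    return counts


sample = """\
TLS certificate verification: Error, self signed certificate
syncrepl_message_to_entry: rid=001 mods check (objectClass: value #1 invalid per syntax)
TLS certificate verification: Error, self signed certificate
syncrepl_message_to_entry: rid=001 mods check (objectClass: value #1 invalid per syntax)
"""
for (rid, attr), n in summarize_syncrepl_errors(sample).items():
    print(f"rid={rid} attr={attr} count={n}")
```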

This is on 4.1.1.

Any pointers on how to debug this further would be much appreciated. Thanks!

parak commented 5 months ago

I've also tried updating from 4.1.1 to 4.2.2 on a fresh deployment to see if it would help, but had even less luck there. I used exactly the same values as in 4.1.1 and only bumped the chart version. It seems that customLdifFiles is no longer being processed, so I only get the default bitnami users (user01, user02, and so on), even though I'm not defining the 'users' value at all.

Was there a migration changelog that I missed perhaps?

parak commented 4 months ago

After a bit more digging, it seems that for some reason my custom ACLs no longer work on 4.2.2 while the default ones do, so I need to look further into that area. I also suspect the replication issue above is caused by needing a real certificate, since I'm running multiple nodes. I'll try fixing those two things first before asking for help again :)