mongodb / mongodb-atlas-kubernetes

MongoDB Atlas Kubernetes Operator - Manage your MongoDB Atlas clusters from Kubernetes
http://www.mongodb.com/cloud/atlas
Apache License 2.0

AtlasDatabaseUser is not reconciled periodically - manual changes are not overridden until pod restart #1689

Closed legal90 closed 2 months ago

legal90 commented 3 months ago

What did you do to encounter the bug? Steps to reproduce the behavior:

  1. Create any AtlasDatabaseUser resource
  2. Wait for the operator controller to reconcile/create a new database user. Log example:

    {"level":"INFO","time":"2024-07-12T14:12:13.215Z","msg":"Status update","atlasdatabaseuser":"my-namespace/my-dbuser","lastCondition":{"type":"Ready","status":"True","lastTransitionTime":null}}

  3. Make manual changes to that user via the Atlas Web UI. For example, update the roles or just delete that user.

Now the declarative configuration in k8s, AtlasDatabaseUser, does not match the real state of the object in Atlas.

What did you expect? The controller should periodically reconcile the object which it controls via AtlasDatabaseUser resource - it should read the actual state from Atlas API, compare it with the desired state declared in AtlasDatabaseUser resource and reconcile it - update the

What happened instead? The operator does nothing to reconcile the DB user in Atlas: changes made manually in the UI are never overridden. I waited for more than 4 hours with no reaction. There are no new log messages in the controller's log, which suggests that the operator doesn't re-read the object state from the API.

The reconciliation only happens when you restart the controller pod or update the AtlasDatabaseUser resource in k8s.
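For illustration only, the expected behavior described above (read the actual state, compare it with the desired state, then act) can be sketched as follows. This is a minimal Python sketch, not the operator's actual Go code; `Role`, `diff_roles`, and `plan_action` are hypothetical names, and the roles are plain in-memory values rather than results of a real Atlas API call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical stand-in for one entry under spec.roles."""
    role_name: str
    database_name: str


def _sort_key(role):
    return (role.database_name, role.role_name)


def diff_roles(desired, actual):
    """Return (to_add, to_remove) needed to turn `actual` into `desired`."""
    desired_set, actual_set = set(desired), set(actual)
    to_add = sorted(desired_set - actual_set, key=_sort_key)
    to_remove = sorted(actual_set - desired_set, key=_sort_key)
    return to_add, to_remove


def plan_action(desired, actual):
    """Decide what a reconcile pass would do.

    `actual` is None when the user no longer exists in Atlas
    (for example, it was deleted in the UI).
    """
    if actual is None:
        return "create"
    to_add, to_remove = diff_roles(desired, actual)
    return "update" if (to_add or to_remove) else "noop"


if __name__ == "__main__":
    desired = [Role("atlasAdmin", "admin")]
    print(plan_action(desired, None))                               # user deleted in the UI
    print(plan_action(desired, [Role("readAnyDatabase", "admin")])) # roles edited in the UI
    print(plan_action(desired, desired))                            # no drift
```

With the manifest below, deleting the user in the UI would map to "create" and editing its roles would map to "update". Either way, a reconcile pass has to run for the drift to be noticed at all.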

Operator Information

Kubernetes Cluster Information

Additional context YAML definition:

apiVersion: atlas.mongodb.com/v1
kind: AtlasDatabaseUser
metadata:
  labels:
  name: my-dbuser
  namespace: my-namespace
spec:
  databaseName: admin
  passwordSecretRef:
    name: my-dbuser
  projectRef:
    name: my-atlas-project
    namespace: my-namespace
  roles:
    - databaseName: admin
      roleName: atlasAdmin
  username: dbUserTest

josvazg commented 3 months ago

This is expected up to a point. The default sync period in the Operator is 3 hours, so you should have seen the change after that time.

What happens here is that the Operator thinks it has done its work: the Atlas state and the Kubernetes definition match, so the next check will be in 3 hours.

Note there is no way for the operator to know that the UI or any other agent with access to Atlas made some changes. It can only learn about them the next time it tries to reconcile.

Last time I checked, there was nothing on our roadmap to change or make that sync period configurable.

Can you confirm you did not see changes for more than 3 hours since last reconcile? That would be a bug we would need to investigate.
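To make the timing in the comment above concrete, here is a small Python sketch of how long a manual change would stay unnoticed if resyncs ran on a fixed schedule. The 3-hour period is the value quoted in this thread, not one read from the operator's source; `detection_delay` is a hypothetical helper, and the model assumes resyncs fire exactly every period after the last reconcile.

```python
from datetime import datetime, timedelta

# Default sync period as quoted in this thread (an assumption, not read from code).
SYNC_PERIOD = timedelta(hours=3)


def detection_delay(last_reconcile, manual_change_at, sync_period=SYNC_PERIOD):
    """Time between a manual change and the first scheduled resync after it."""
    elapsed = manual_change_at - last_reconcile
    if elapsed < timedelta(0):
        raise ValueError("manual change must not precede the last reconcile")
    # Number of whole periods until the first resync strictly after the change.
    periods = elapsed // sync_period + 1
    next_resync = last_reconcile + periods * sync_period
    return next_resync - manual_change_at


if __name__ == "__main__":
    # Timestamp taken from the log line in the report.
    last = datetime(2024, 7, 12, 14, 12, 13)
    print(detection_delay(last, last + timedelta(minutes=30)))
```

Under this model the delay never exceeds one sync period, so a wait of more than 4 hours with no reaction, as reported, would not be explained by the schedule alone.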

igor-karpukhin commented 3 months ago

Hi @legal90. We confirmed the bug. It looks like the informers do not trigger our reconcilers. We're working on this issue.