mongodb / mongodb-atlas-kubernetes

MongoDB Atlas Kubernetes Operator - Manage your MongoDB Atlas clusters from Kubernetes
http://www.mongodb.com/cloud/atlas
Apache License 2.0

AtlasDatabaseUser connection secrets removed after upgrading #1954

Open sunib opened 3 days ago

sunib commented 3 days ago

What did you do to encounter the bug?

We upgraded mongodb-atlas-kubernetes from version 2.3.1 to version 2.5.0. The upgrade failed badly and caused downtime for our customers. Somehow the connection secrets were removed for most of our AtlasDatabaseUser objects, which broke the deployments that depend on them. Reverting to 2.3.1 resolved the issue.

What did you expect?

The first reconciliation with 2.5.0 removed random connection secrets. We have a setup where:

What happened instead?

After 2.5.0 started, we somehow had only one connection secret per namespace (instead of 5), and it showed weird behavior. It looked like the secrets were deleted. Enabling the debug logs showed that this was indeed the case (see the next paragraph).
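One way to confirm exactly which secrets disappeared is to compare a listing taken before the upgrade with one taken after it. The following is a minimal Python sketch, not part of the operator; the function name `missing_secrets` and the input shape (a mapping from namespace to secret names, for example collected with `kubectl get secrets`) are assumptions made for illustration.

```python
def missing_secrets(before, after):
    """Return {namespace: sorted names} present in `before` but gone in `after`.

    Both arguments map a namespace to an iterable of secret names.
    The input shape is an assumption for this sketch.
    """
    missing = {}
    for namespace, names in before.items():
        # Names that existed before the upgrade and no longer exist afterwards.
        gone = sorted(set(names) - set(after.get(namespace, ())))
        if gone:
            missing[namespace] = gone
    return missing
```

Running this against snapshots from a test cluster would make a silent deletion visible even when the operator logs nothing at the default log level.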

Screenshots image

Operator Information

Operator 2.3.1 works (we downgraded to it). Operator 2.5.0 does not.

Kubernetes Cluster Information

We run Kubernetes 1.30 in EKS.

Probable cause

If this is all true, then I do have a request: could you please add a log line at warning level when connection secrets are deleted? I carefully tested this upgrade on my test cluster, and I would have seen these lines.
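The requested behavior could look roughly like the sketch below. It is written in Python for brevity and is not the operator's code; `remove_stale_secrets`, its parameters, the logger name, and the log message are all hypothetical names chosen for illustration.

```python
import logging

# Hypothetical logger name; the operator's real logging setup is not shown here.
logger = logging.getLogger("connection-secrets")


def remove_stale_secrets(existing, expected, delete, log=logger):
    """Delete every secret in `existing` that is not in `expected`.

    Each deletion is logged at WARNING level, so it is visible without
    enabling debug logs. `delete` is a caller-supplied callable that
    removes one secret by name. Returns the sorted list of deleted names.
    """
    stale = sorted(set(existing) - set(expected))
    for name in stale:
        log.warning("deleting connection secret %s", name)
        delete(name)
    return stale
```

With a log line like this, a test upgrade that deletes secrets would surface the problem at the default log level instead of only in debug output.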

josvazg commented 3 days ago

Do you happen to have a minimal YAML that reproduces the issue?

If not, we can try one. Let me clarify if this is the setup that would mimic yours:

Another question: was 2.4.1 also failing for you?

josvazg commented 3 days ago

I am currently suspecting PR #1856

sunib commented 3 days ago

Thank you for the quick responses.

The very short version to replicate this is (I had to redact some stuff):

We have one AtlasProject in the same namespace as the operator:

apiVersion: atlas.mongodb.com/v1
kind: AtlasProject
metadata:
  name: mongodb-atlas-project
spec:
  name: {{ $teamName }}
  teams:
  - teamRef:
      name: {{ $teamName }}
    roles:
    - GROUP_DATA_ACCESS_READ_WRITE
    - GROUP_OWNER
  connectionSecretRef:
    name: atlas-secret
  maintenanceWindow: {}
  projectIpAccessList:
  {{- toYaml .Values.projectIpAccessList | nindent 2 }}
  withDefaultAlertsSettings: true

Then in n namespaces we apply this (for now simplified to two users):

apiVersion: atlas.mongodb.com/v1
kind: AtlasDeployment
metadata:
  annotations:
    mongodb.com/last-applied-configuration: 'redacted'
  creationTimestamp: "2024-08-23T18:52:32Z"
  finalizers:
  - mongodbatlas/finalizer
  generation: 2
  name: atlas
  namespace: tenant-namespace
  resourceVersion: "redacted"
  uid: redacted
spec:
  backupRef:
    name: ""
    namespace: ""
  projectRef:
    name: mongodb-atlas-project
    namespace: mongodb-atlas
  serverlessSpec:
    backupOptions:
      serverlessContinuousBackupEnabled: true
    name: unique-cluster-name-over-ns
    providerSettings:
      backingProviderName: AWS
      providerName: SERVERLESS
      regionName: EU_WEST_1
    tags:
    - key: application
      value: the-application-name
    terminationProtectionEnabled: true
---
apiVersion: atlas.mongodb.com/v1
kind: AtlasDatabaseUser
metadata:
  annotations:
    mongodb.com/atlas-resource-version-policy: allow
    mongodb.com/last-applied-configuration: 'redacted'
  creationTimestamp: "2024-08-23T18:54:20Z"
  finalizers:
  - mongodbatlas/finalizer
  generation: 1
  name: user1-unique-over-ns
  namespace: tenant-namespace
  resourceVersion: "129059651"
  uid: 5f7d8843-815a-495c-8e0f-9aac6f1bbf8d
spec:
  awsIamType: NONE
  databaseName: admin
  oidcAuthType: NONE
  passwordSecretRef:
    name: passsword1
  projectRef:
    name: mongodb-atlas-project
    namespace: mongodb-atlas
  roles:
  - databaseName: database1
    roleName: readWrite
  scopes:
  - name: unique-cluster-name-over-ns
    type: CLUSTER
  username: user1-unique-over-ns
  x509Type: NONE
---
apiVersion: atlas.mongodb.com/v1
kind: AtlasDatabaseUser
metadata:
  annotations:
    mongodb.com/atlas-resource-version-policy: allow
    mongodb.com/last-applied-configuration: 'redacted'
  creationTimestamp: "2024-08-23T18:54:21Z"
  finalizers:
  - mongodbatlas/finalizer
  generation: 1
  name: user2-unique-over-ns
  namespace: tenant-namespace
spec:
  awsIamType: NONE
  databaseName: admin
  oidcAuthType: NONE
  passwordSecretRef:
    name: password2
  projectRef:
    name: mongodb-atlas-project
    namespace: mongodb-atlas
  roles:
  - databaseName: database2
    roleName: readWrite
  scopes:
  - name: unique-cluster-name-over-ns
    type: CLUSTER
  username: user2-unique-over-ns
  x509Type: NONE

sunib commented 3 days ago

> I am currently suspecting PR #1856

I agree.

And 2.4.1 was also failing for us (same behavior).

josvazg commented 3 days ago

Wait, if 2.4.1 was also failing, then that PR cannot be the cause. The PR was merged on Oct 11, and v2.4.1 was released earlier, in August.

Anyway, thanks for the sample YAMLs. I will try to reproduce with them. Do not worry about including too many details; I just need to reproduce where the resources live and, most likely, how the Kubernetes resources reference each other (same namespace, across namespaces, etc.) so that I hit the same issue.

I suspect the issue might be that the project is not in the same namespace as the secrets, so the code might believe they are somehow unused. Still, that would not explain why 2.4.1 is also broken.
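To make that suspicion concrete, here is a small Python sketch of how a namespace-scoped lookup could misclassify secrets as unused. It only illustrates the hypothesis above and is not taken from the operator's code; the `User` type and both functions are hypothetical, and only the namespace and user names come from the sample YAMLs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    namespace: str          # namespace the AtlasDatabaseUser lives in
    username: str
    project_namespace: str  # namespace of the referenced AtlasProject


def users_in_use_scoped(users, project_namespace):
    # Hypothetical faulty check: only users living in the project's own
    # namespace are counted, so tenant-namespace users look unused.
    return {u.username for u in users if u.namespace == project_namespace}


def users_in_use_by_ref(users, project_namespace):
    # Counts every user that references the project, wherever it lives.
    return {u.username for u in users if u.project_namespace == project_namespace}


users = [
    User("tenant-namespace", "user1-unique-over-ns", "mongodb-atlas"),
    User("tenant-namespace", "user2-unique-over-ns", "mongodb-atlas"),
]
```

Under this assumption, `users_in_use_scoped(users, "mongodb-atlas")` returns an empty set, so a cleanup step built on it would consider both users' connection secrets stale, while `users_in_use_by_ref` finds both users.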

josvazg commented 3 days ago

I have been able to reproduce the issue. I will be working on it shortly.