mongodb / mongodb-atlas-kubernetes

MongoDB Atlas Kubernetes Operator - Manage your MongoDB Atlas clusters from Kubernetes
http://www.mongodb.com/cloud/atlas
Apache License 2.0

New AtlasDeployment does not generate connection secrets for existing project AtlasDatabaseUser in different namespace #899

Closed eli-kasa closed 1 year ago

eli-kasa commented 1 year ago

What did you do to encounter the bug?

What did you expect?

A Secret project-dep2-user1 (namespace: default) to be created.

What happened instead?

When the new AtlasDeployment was created, the existing AtlasDatabaseUser in the same namespace had a credentials + connection string Secret created, but AtlasDatabaseUser resources in other namespaces did not.

Operator Information

Kubernetes Cluster Information

Additional context

Of note, the operator doesn't seem to log any information about the Secret reconciliation that occurred for the AtlasDatabaseUser user2 in the same namespace.

igor-karpukhin commented 1 year ago

Hi @eli-kasa. One of the most common errors is that the Secret is not labeled:

kubectl label secret mongodb-atlas-operator-api-key atlas.mongodb.com/type=credentials -n mongodb-atlas-system

(see https://www.mongodb.com/docs/atlas/reference/atlas-operator/ak8so-quick-start/#create-a-secret-with-your-api-keys-and-organization-id)

Another reason could be that you installed the operator only in the mongodb-atlas-system namespace. Please check the WATCH_NAMESPACE env variable in the operator's Deployment resource.
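For reference, here is a sketch of what a correctly labeled API-key Secret looks like, following the quick-start linked above. The key names and placeholder values are illustrative; substitute your own organization ID and API keys:

```yaml
# Sketch of the operator's connection Secret with the required label.
# The atlas.mongodb.com/type=credentials label is what the operator
# uses to discover the Secret.
apiVersion: v1
kind: Secret
metadata:
  name: mongodb-atlas-operator-api-key
  namespace: mongodb-atlas-system
  labels:
    atlas.mongodb.com/type: credentials
stringData:
  orgId: <your-organization-id>
  publicApiKey: <your-public-key>
  privateApiKey: <your-private-key>
type: Opaque
```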

eli-kasa commented 1 year ago

@igor-karpukhin Thanks, but yes, the operator's credentials Secret is properly labeled, as are all of the referenced password Secrets.

The Operator deployment has these env:

env:
  - name: OPERATOR_POD_NAME
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.name
  - name: OPERATOR_NAMESPACE
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.namespace

WATCH_NAMESPACE is not defined. I'm also having trouble finding documentation on this parameter, either here on GitHub or on the official MongoDB documentation site... I can find a similar variable for the Enterprise Kubernetes Operator; does that documentation apply to the Atlas Operator as well, and if I set WATCH_NAMESPACE to * will that address this issue? Leaving it unset already appears to watch everything, at least on create events...
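For anyone comparing configurations, a sketch of how the env could look if WATCH_NAMESPACE were set explicitly. This follows the common controller-runtime convention (an empty or unset WATCH_NAMESPACE means cluster-wide watching); I have not confirmed the `*` syntax against the Atlas Operator's code:

```yaml
# Hypothetical addition to the operator Deployment's container env.
env:
  - name: WATCH_NAMESPACE
    value: ""          # empty/unset: watch all namespaces (cluster-wide)
  # or, to scope the operator to a single namespace:
  # - name: WATCH_NAMESPACE
  #   value: mongodb-atlas-system
```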

For hopefully more clarity: as stated in the issue, everything is created fine IF you create the AtlasDatabaseUser after the AtlasDeployment, regardless of namespace (since the operator is not scoped to one by WATCH_NAMESPACE, and has a cluster role + binding that appears to grant what's needed). In that order, the operator creates the corresponding {project}-{cluster}-{user} Secret with connection strings for the deployment(s) in the same project, in the associated namespaces. What doesn't happen: if an AtlasDatabaseUser already exists in a namespace other than where the operator is deployed, creating a new AtlasDeployment does not reconcile/create those Secrets for the users in the other namespaces (it only works if you create the users AFTER the deployment).
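To make the failing order of operations concrete, here is a hypothetical AtlasDatabaseUser in the default namespace, created before a second AtlasDeployment in the project. The resource and Secret names (user1, project, user1-password) are assumptions based on the report, not the actual manifests:

```yaml
# Pre-existing user in the default namespace; the operator runs in
# mongodb-atlas-system. When a new AtlasDeployment is later added to
# the referenced project, no project-<deployment>-user1 Secret appears
# here, even though the user's credentials Secret is labeled correctly.
apiVersion: atlas.mongodb.com/v1
kind: AtlasDatabaseUser
metadata:
  name: user1
  namespace: default
spec:
  username: user1
  projectRef:
    name: project
    namespace: mongodb-atlas-system
  passwordSecretRef:
    name: user1-password
  roles:
    - roleName: readWriteAnyDatabase
      databaseName: admin
```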

Also, I'm deploying with Helm. Here is the operator's Deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: '1'
    meta.helm.sh/release-name: atlas-operator
    meta.helm.sh/release-namespace: mongodb-atlas-system
  creationTimestamp: '2023-02-23T06:15:14Z'
  generation: 1
  labels:
    app.kubernetes.io/instance: atlas-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: mongodb-atlas-operator
    app.kubernetes.io/version: 1.6.1
    helm.sh/chart: mongodb-atlas-operator-1.6.1
  name: mongodb-atlas-operator
  namespace: mongodb-atlas-system
  resourceVersion: '8506866'
  uid: {a-guid}
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: atlas-operator
      app.kubernetes.io/name: mongodb-atlas-operator
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: atlas-operator
        app.kubernetes.io/name: mongodb-atlas-operator
    spec:
      containers:
        - args:
            - --atlas-domain=https://cloud.mongodb.com/
            - --health-probe-bind-address=:8081
            - --metrics-bind-address=:8080
            - --leader-elect
          command:
            - /manager
          env:
            - name: OPERATOR_POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: OPERATOR_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
          image: mongodb/mongodb-atlas-kubernetes-operator:1.6.1
          imagePullPolicy: Always
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 8081
              scheme: HTTP
            initialDelaySeconds: 15
            periodSeconds: 20
            successThreshold: 1
            timeoutSeconds: 1
          name: manager
          ports:
            - containerPort: 80
              name: http
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /readyz
              port: 8081
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              cpu: 500m
              memory: 256Mi
            requests:
              cpu: 100m
              memory: 50Mi
          securityContext:
            allowPrivilegeEscalation: false
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsNonRoot: true
        runAsUser: 2000
      serviceAccount: mongodb-atlas-operator
      serviceAccountName: mongodb-atlas-operator
      terminationGracePeriodSeconds: 10
status:
  availableReplicas: 1
  conditions:
    - lastTransitionTime: '2023-02-23T06:15:14Z'
      lastUpdateTime: '2023-02-23T06:15:35Z'
      message: ReplicaSet "mongodb-atlas-operator-694759b8fc" has successfully progressed.
      reason: NewReplicaSetAvailable
      status: 'True'
      type: Progressing
    - lastTransitionTime: '2023-03-15T17:02:40Z'
      lastUpdateTime: '2023-03-15T17:02:40Z'
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: 'True'
      type: Available
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

As a workaround, I've resorted to deleting and re-creating the AtlasDatabaseUser, which then does generate the Secrets (for both deployments) as expected in the default namespace...

helderjs commented 1 year ago

Hi @eli-kasa,

Thank you for your feedback and the extensive report. We were able to reproduce the bug and are working on a fix, which should land in the 1.8.0 release.