mongodb / mongodb-atlas-kubernetes

MongoDB Atlas Kubernetes Operator - Manage your MongoDB Atlas clusters from Kubernetes
http://www.mongodb.com/cloud/atlas
Apache License 2.0

Clear/flush state of removed experiments? #828

sunib closed this issue 1 year ago

sunib commented 1 year ago

What did you do to encounter the bug? I played around with the Atlas Operator and created and removed some projects. Everything is gone, both my k8s objects and my Atlas objects, but the operator somehow still logs messages about them. Where is this state kept?

What did you expect? No logging in my operator of things from the past.

What happened instead? I still get these log messages every minute or so. They spam the operator logs so that I can't see my 'real' problems.


This 'round' of logs comes from the operator pod. I already tried killing the pod, and I triple-checked that the resources are really gone (also not in another namespace).

{"level":"INFO","time":"2023-01-06T11:08:00.142Z","msg":"The resource first-mongo-deployment/test-db no longer exists, not updating the status","atlasdeployment":"first-mongo-deployment/test-db"}
{"level":"INFO","time":"2023-01-06T11:08:00.142Z","msg":"-> Starting AtlasDeployment reconciliation","atlasdeployment":"first-mongo-deployment/test-db","spec":{"projectRef":{"name":"atlas-deployment-my-project-2","namespace":""},"deploymentSpec":{"name":"test-db","providerSettings":{"backingProviderName":"AZURE","instanceSizeName":"M0","providerName":"TENANT","regionName":"EUROPE_WEST"}},"backupRef":{"name":"","namespace":""}},"status":{"conditions":[{"type":"Ready","status":"False","lastTransitionTime":"2022-12-02T09:55:57Z"},{"type":"ValidationSucceeded","status":"True","lastTransitionTime":"2022-11-18T09:35:01Z"},{"type":"DeploymentReady","status":"False","lastTransitionTime":"2022-12-02T09:55:57Z","reason":"InternalError","message":"DELETE https://cloud.mongodb.com/api/atlas/v1.0/groups/***/clusters/test-db: 401 (request \"USER_CANNOT_ACCESS_GROUP\") User cannot access this group."}],"observedGeneration":3,"stateName":"IDLE","mongoDBVersion":"5.0.14","connectionStrings":{"standard":"***"},"mongoURIUpdated":"2022-11-18T09:35:25Z"}}
{"level":"INFO","time":"2023-01-06T11:08:00.142Z","msg":"Reading Atlas API credentials from the AtlasProject Secret first-mongo-deployment/atlas-deployment-secret","atlasdeployment":"first-mongo-deployment/test-db"}
{"level":"INFO","time":"2023-01-06T11:08:00.142Z","msg":"Status update","atlasdeployment":"first-mongo-deployment/test-db","lastCondition":{"type":"DeploymentReady","status":"False","lastTransitionTime":null,"reason":"AtlasCredentialsNotProvided","message":"can't read Atlas API credentials from the Secret first-mongo-deployment/atlas-deployment-secret: Secret \"atlas-deployment-secret\" not found"}}
{"level":"INFO","time":"2023-01-06T11:08:00.160Z","msg":"The resource first-mongo-deployment/test-db no longer exists, not updating the status","atlasdeployment":"first-mongo-deployment/test-db"}
E0106 11:08:00.220121       1 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"test-db.1737b392db891e6a", GenerateName:"", Namespace:"first-mongo-deployment", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"AtlasDeployment", Namespace:"first-mongo-deployment", Name:"test-db", UID:"20c9f478-d36d-4355-9823-64b2ced9084a", APIVersion:"atlas.mongodb.com/v1", ResourceVersion:"40220544", FieldPath:""}, Reason:"AtlasCredentialsNotProvided", Message:"can't read Atlas API credentials from the Secret first-mongo-deployment/atlas-deployment-secret: Secret \"atlas-deployment-secret\" not found", Source:v1.EventSource{Component:"AtlasDeployment", Host:""}, FirstTimestamp:time.Date(2023, time.January, 6, 11, 7, 9, 921222250, time.Local), LastTimestamp:time.Date(2023, time.January, 6, 11, 8, 0, 142708119, time.Local), Count:6, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'namespaces "first-mongo-deployment" not found' (will not retry!)
igor-karpukhin commented 1 year ago

Hi @sunib,

Could you please describe the steps you performed? Did you remove the secret first, or did you remove the deployment and the secret at the same time? Thanks in advance.

sunib commented 1 year ago

Hi @igor-karpukhin: I started playing with Atlas weeks ago. These Kubernetes manifests have definitely been deleted, but I no longer know in which order. The log 'spamming' in the operator has been going on since then. I hoped it would stop after my recent upgrade to 1.5.1, but it did not, so restarting the pod does not help either.

The annoying part is that I can't just delete the whole cluster over something like this, nor the whole Atlas deployment, and I'm sure I cleaned up both. Where could this state be kept? The log lines must originate from somewhere.

igor-karpukhin commented 1 year ago

Hi @sunib. It looks like your AtlasDeployment resource can't be deleted because of its finalizer. To remove it, first edit your atlasdeployment resource and remove the finalizers section; then you can delete the resource itself with kubectl delete atlasdeployment <your_deployment_name>.

From the logs, it looks like you removed the namespace with the connection secret. I tried to reproduce it but didn't succeed.

The operator does not keep this state anywhere: every reconcile call starts from scratch. You are most likely seeing these logs because your atlasdeployment resource is still present in the cluster.

sunib commented 1 year ago

Thank you for your help @igor-karpukhin: it was indeed still there with a hanging finalizer!

C:\Users\SimonKoudijs>kubectl -n first-mongo-deployment get AtlasDeployment
NAME      AGE
test-db   54d

I had managed to remove the namespace while that resource was still there, and the resource was not showing up in the GUI I'm using (Lens).

In my case, the removal steps were:

kubectl -n first-mongo-deployment get AtlasDeployment
kubectl -n first-mongo-deployment patch AtlasDeployment/test-db --type json --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'

Then you can delete it without trouble.

Please note that you cannot run this as-is in the Windows cmd shell (because of the quoting around the JSON). It fails with the error: unable to parse "'[": yaml: found unexpected end of stream
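The error suggests that on Windows single quotes are not treated as quoting characters, so the patch is split at its spaces and kubectl only receives '[ as the patch. One way around this is to wrap the patch in double quotes and backslash-escape the inner double quotes. The Go sketch below (finalizerPatch and quoteForCmd are hypothetical helpers, not part of kubectl) builds the same JSON Patch with encoding/json and prints a cmd-friendly form of the command; it has not been tried on Windows here, so treat it as an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// patchOp is a single RFC 6902 JSON Patch operation.
type patchOp struct {
	Op   string `json:"op"`
	Path string `json:"path"`
}

// finalizerPatch returns the JSON Patch used in this thread to remove
// metadata.finalizers from the resource.
func finalizerPatch() (string, error) {
	b, err := json.Marshal([]patchOp{{Op: "remove", Path: "/metadata/finalizers"}})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// quoteForCmd wraps s in double quotes and backslash-escapes its inner double
// quotes, the form a Windows program such as kubectl parses as one argument.
// Assumption: s contains no backslashes and no cmd.exe special characters
// (& | < > ^ %); the patch above contains none of them.
func quoteForCmd(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `\"`) + `"`
}

func main() {
	p, err := finalizerPatch()
	if err != nil {
		panic(err)
	}
	fmt.Println("kubectl -n first-mongo-deployment patch AtlasDeployment/test-db --type json --patch=" + quoteForCmd(p))
}
```

The printed command passes the patch as --patch="[{\"op\":\"remove\",\"path\":\"/metadata/finalizers\"}]", which contains no spaces and no single quotes.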

igor-karpukhin commented 1 year ago

@sunib I'm glad I could help. Instead of patch, you can also run kubectl -n first-mongo-deployment edit AtlasDeployment/test-db, which opens the resource in your default text editor (see https://jamesdefabia.github.io/docs/user-guide/kubectl/kubectl_edit/).