Should not be able to change instanceSize/DiskGB fields if you are using autoscaling AdvancedDeployment

yzdann commented 2 years ago

What did you do to encounter the bug? Steps to reproduce the behavior:

Create a advancedDeployment with autoscaling.
Change the instanceSize to sth else (with considering minInstanceSize <= instanceSize <= maxInstanceSize).
Watch for changes on Atlas side and we will see that the instanceSize will change to what we configured.

What did you expect? With version 1.2.0 (and older) of the operator with regularDeployment you cannot change instanceSize/diskGB, if you enabled autoscaling for compute/disk which is a way to prevent overwrite changes of Atlas autoscaler but in advancedDeployment operator will allow you to change the instanceSize.

Regular Deployment logic https://github.com/mongodb/mongodb-atlas-kubernetes/blob/73c24914089f04bf7ab932f31ba2ed2b439280c4/pkg/controller/atlasdeployment/deployment.go#L118

https://github.com/mongodb/mongodb-atlas-kubernetes/blob/73c24914089f04bf7ab932f31ba2ed2b439280c4/pkg/controller/atlasdeployment/deployment.go#L123-L151

What happened instead? For example, we have an advancedDeployment with instanceSize M30, Atlas autoscaler because of some load increased the instanceSize to M40. After that we decide to change sth, for example, disable JS and we trigger reconciliation now we are not only changing JS but also changing back the instanceSize to M30 !

Screenshots If applicable, add screenshots to help explain your problem.

Operator Information

v1.2.0

yzdann commented 2 years ago

Related old issue for regularDeployment: https://github.com/mongodb/mongodb-atlas-kubernetes/issues/317

fabritsius commented 2 years ago

Thank you @yzdann for pointing this out 🙏

Personally I'm not a fan of such filtering in the operator as it removes the option to set instanceSize or diskGB when you actually want to change them. It would be better if the Atlas API server did such logic. Keep in mind that for updates the operator sends PATCH requests so you can always omit the fields you do not want to be changed.

This being said in my view your request makes sense in lots of scenarios. Now I'm thinking of adding such filtering under a special annotation and maybe even make filtering the default. In that case same logic will be applied to the regular deployments to make everything unified.

Have to discuss with the team our options and will get back to you shortly.

dan-mckean commented 2 years ago

Hey @yzdann

@fabritsius and I have discussed this at length.

We're going to make a change to make the following true:

If autoscaling is enabled during creation of a deployment, instanceSize and diskGB will be respected and used
If autoscaling is enabled for subsequent spec changes, instanceSize and diskGB will be disregarded by our operator and not sent to Atlas.

We're going to hold off on any kind of special annotation (to override the autoscaling) for now. Any scenario where you want to forcibly change the instanceSize and diskGB are likely as a result of Atlas not autoscaling as you wanted, or as quickly as you'd like. In either case, any product changes we make should be to autoscaling code in Atlas rather than the Operator - forcing a change is a short-term fix and non-ideal workaround.

A change could still be forced by disabling autoscaling - in which case instanceSize and diskGB would be honoured.

We're planning to get this change into a release some time next week.

dan-mckean commented 2 years ago

@yzdann apologies on the delay on this. Some stuff got juggled around in the team and this is just landing now 🙂

We're expecting to do a release with this included this week.

sunchill06 commented 1 year ago

Hi @dan-mckean @fabritsius @igor-karpukhin,

I was testing the latest Operator release in our playground env and have a doubt that I would like to clear:

As per this PR and the comments mentioned in it, it appears that If autoscaling is enabled for subsequent spec changes, instanceSize and diskGB will be disregarded by our operator and not sent to Atlas.

I created an AtlasDeployment using advancedDeploymentSpec as below

apiVersion: atlas.mongodb.com/v1
kind: AtlasDeployment
metadata:
  name: advanced-cluster
spec:
  projectRef:
    name: playground-project
  advancedDeploymentSpec:
    backupEnabled: true
    clusterType: SHARDED
    diskSizeGB: 40
    mongoDBMajorVersion: "4.4"
    name: advanced-cluster
    pitEnabled: false
    replicationSpecs:
    - numShards: 1
      regionConfigs:
      - autoScaling:
          compute:
            enabled: true
            maxInstanceSize: M40
            minInstanceSize: M30
            scaleDownEnabled: true
          diskGB:
            enabled: true
        electableSpecs:
          instanceSize: M30
          nodeCount: 3
        priority: 7
        providerName: GCP
        regionName: WESTERN_EUROPE

Then I edited the AtlasDeployment and updated the spec to change the instanceSize to M40 and I got this error (BASE_INSTANCE_SIZE_MUST_MATCH) in operator logs

{"level":"INFO","time":"2022-11-18T14:11:34.950Z","msg":"Status update","atlasdeployment":"platform/advanced-cluster","lastCondition":{"type":"DeploymentReady","status":"False","lastTransitionTime":null,"reason":"DeploymentNotUpdatedInAtlas","message":"PATCH https://cloud.mongodb.com/api/atlas/v1.5/groups/<redacted-group-id>/clusters/advanced-cluster: 400 (request \"BASE_INSTANCE_SIZE_MUST_MATCH\") Electable and read-only nodes must all have the same instance size."}}

I also expected this to fail but not with this error message, I expected that this patch request would not be sent to Atlas (as per this comment)

But when I edited the AtlasDeployment and updated the spec to include readOnlySpecs as well, the patch request was sent to Atlas and was successful as well and the Deployment's instanceSize got changed at Atlas.

Spec with readOnlySpecs

spec:
  advancedDeploymentSpec:
    backupEnabled: true
    clusterType: SHARDED
    diskSizeGB: 40
    mongoDBMajorVersion: "4.4"
    name: advanced-cluster
    pitEnabled: false
    replicationSpecs:
    - numShards: 1
      regionConfigs:
      - autoScaling:
          compute:
            enabled: true
            maxInstanceSize: M40
            minInstanceSize: M30
            scaleDownEnabled: true
          diskGB:
            enabled: true
        electableSpecs:
          instanceSize: M40
          nodeCount: 3
        priority: 7
        providerName: GCP
        readOnlySpecs:
          instanceSize: M40
          nodeCount: 3
        regionName: WESTERN_EUROPE

Operator logs

{"level":"INFO","time":"2022-11-18T14:18:48.858Z","msg":"Status update","atlasdeployment":"platform/advanced-cluster","lastCondition":{"type":"DeploymentReady","status":"False","lastTransitionTime":null,"reason":"DeploymentUpdating","message":"deployment is updating"}}

❓ 😕 Now this PR and the other PR i.e. https://github.com/mongodb/mongodb-atlas-kubernetes/issues/649 are kind of contradictory where one seems to allow changing instanceSize and other doesn't in case AutoScaling is enabled.

Can you please clarify what is happening here and if my understanding is not correct?

igor-karpukhin commented 1 year ago

Hi, @sunchill06. This PR has been merged 23 days ago: https://github.com/mongodb/mongodb-atlas-kubernetes/pull/715 and it allows changing instanceSize within the minInstanceSize - maxInstanceSize range. This change follows Atlas logic, which allows such behavior.

sunchill06 commented 1 year ago

Thanks for the update @igor-karpukhin Few follow-up questions if you don't mind answering:

Why do I need to supply readOnlySpecs if they were not part of the original spec during creation of the deployment.
What about this comment, shall we ignore this and consider that instanceSize can be changed via operator for advancedDeploymentSpec as well?
Shall we also assume that this problem should also not happen, though I still need to test this at our end.

Sorry about so many queries but its really important for us to clarify this and understand what behaviour to expect from the Operator now.

igor-karpukhin commented 1 year ago

Hi @sunchill06, sorry for the late reply. Let me answer your questions:

You don't have to specify readOnlySpecs if you don't need them. This field is still optional. But, I noticed if I try to update and instanceSize within the min/max range, without specifying readOnlySpecs, it will fail. This happens, because Atlas API also returns readOnlySpecs (you can try to create a cluster only specifying electableSpecs, and use the Atlas API to get the cluster, you will see all types of specs in the output, even if they have 0 nodes). But, I guess, we can handle that better from the operator side.
The instanceSize can be changed by users, not the operator. If you change it within the min/max range - it is allowed by the Atlas.
This issue (#606) shouldn't happen again, yes.

sunchill06 commented 1 year ago

Thanks for your revert @igor-karpukhin

So I believe you would fix the first issue (need to specify readOnlySpecs) with https://github.com/mongodb/mongodb-atlas-kubernetes/issues/777 ??

Also, when you say instanceSize can be changed by users? Do you mean operator won't reconcile the instanceSize due to any autoscaling event at Atlas, by when the AtlasDeployment CR spec is manually updated, the operator will reconcile that (considering its within the autoscaling min/max size range) ??

igor-karpukhin commented 1 year ago

Hi @sunchill06. Sorry for the late response, it looks like I missed the notification.

The problem with readOnlySpecs (and any specs) is resolved by this PR: https://github.com/mongodb/mongodb-atlas-kubernetes/pull/782.

Regarding the instance size. If you set it and enable autoscaling, it wont' be changed if CR is updated, but instanceSize has not been touched (https://github.com/mongodb/mongodb-atlas-kubernetes/blob/25d543bf7ef58a257f0d5e5902c8a681c7d7f73c/pkg/controller/atlasdeployment/advanced_deployment.go#L154)

mongodb / mongodb-atlas-kubernetes

Should not be able to change instanceSize/DiskGB fields if you are using autoscaling AdvancedDeployment #648