kubernetes-sigs / kustomize

Customization of kubernetes YAML configurations
Apache License 2.0
10.95k stars 2.24k forks source link

Strategic merge patches should retain existing null values in targets #4628

Closed cep21 closed 1 year ago

cep21 commented 2 years ago

Reproduction steps are below. In the case no-bug, you can see replicaCount: null applied in the output. In the case has-bug, you can see replicaCount: null is missing. The only difference is the patch value.

< kustomize version
{Version:kustomize/v4.5.2 GitCommit:9091919699baf1c5a5bf71b32ca73a993e98088b BuildDate:2022-02-09T23:19:28Z GoOs:linux GoArch:amd64}

Reproduction steps

Directory structure.

< ls -la
total 4.0K
drwxrwxr-x  5 ec2-user ec2-user   51 May  6 19:37 .
drwxrwxrwt 13 root     root     4.0K May  6 19:38 ..
drwxrwxr-x  2 ec2-user ec2-user   50 May  6 19:37 has-bug
drwxrwxr-x  2 ec2-user ec2-user   32 May  6 19:37 no-bug
drwxrwxr-x  2 ec2-user ec2-user   49 May  6 19:37 original
< cat original/test.yaml 
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: chart
spec:
  releaseName: helm-chart
  timeout: 15m
  values:
    chart:
      replicaCount: null
      autoscaling: true
< cat original/kustomization.yaml 
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- test.yaml
< cat no-bug/kustomization.yaml 
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../original
< cat has-bug/kustomization.yaml 
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../original
patches:
  - path: patch.yaml
    target:
      kind: HelmRelease
< cat has-bug/patch.yaml 
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: chart
spec:
  releaseName: helm-chart
  timeout: 15m
  values:
    deepgram-api:
      some: value

Kustomize output

original:

< kustomize build .
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: chart
spec:
  releaseName: helm-chart
  timeout: 15m
  values:
    chart:
      autoscaling: true
      replicaCount: null

has-bug:

< kustomize build .
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: chart
spec:
  releaseName: helm-chart
  timeout: 15m
  values:
    chart:
      autoscaling: true
    deepgram-api:
      some: value

no-bug:

< kustomize build .
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: chart
spec:
  releaseName: helm-chart
  timeout: 15m
  values:
    chart:
      autoscaling: true
      replicaCount: null

Expected output of has-bug above:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: chart
spec:
  releaseName: helm-chart
  timeout: 15m
  values:
    chart:
      autoscaling: true
      replicaCount: null
    deepgram-api:
      some: value
KnVerey commented 2 years ago

If I understand the example correctly, the key will the null value is dropped when the strategic merge patch is applied, but it is retained if the resource is output as-is. I think this isn't a bug--if you ask Kustomize to apply the strategic merge patch, it will attempt to merge in the key with the null value as part of that process, i.e. delete it.

cep21 commented 2 years ago

An explicit null is very different than not being present. For example, with helm charts you can explicitly set values to null that are part of the default values.yaml. It's unclear why it's not a bug to remove a value that was explicitly set in the parent.

cep21 commented 2 years ago

Here is the expected output:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: chart
spec:
  releaseName: helm-chart
  timeout: 15m
  values:
    chart:
      autoscaling: true
      replicaCount: null
    deepgram-api:
      some: value
KnVerey commented 2 years ago

When you apply a merge patch to an object, the null value has a special meaning, specifically, delete the key/map in question. It's equivalent to the $delete directive. Our merge2 doc also specifically mentions null triggering removal. That said, neither one specifically address the case where the null is coming from the target rather than the patch. It is possible that situation is not addressed because it is unexpected / would not occur with a patch operation against the API server. However, since this is a merge operation, it strikes me as plausible that the result should be the same. If you find merge documentation to the contrary, we can certainly reconsider.

What is the type of values.chart.replicaCount in your CR? I'm surprised that an apply would not also just remove that key, even if you get Kustomize to retain it. If that happens because it is a string field, you could use the same workaround on the Kustomize side: replicaCount: "null" has no special meaning for SMP and will be retained. Similarly, you could use a JSON patch to make the changes or restore the null, since nulls have no special meaning there.

cep21 commented 2 years ago

What is the type of values.chart.replicaCount in your CR?

It's a somewhat common pattern in helm charts. There's a replicaCount that has some default value, like 1. And if you want to use a HPA for your deployment, you need to explicitly unset the replicaCount default value (by setting to null).

cep21 commented 2 years ago

I found this in the docs?

if the field is present in both the src and dest, and the src value is null, the field is removed from the dest

Which mentions only the src part (Is this what. you were referring too?)

Does the element of least surprise have any effect here? It's more intuitive that everything under spec.values.chart is untouched since we are only merging in spec.values.deepgram-api

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

KnVerey commented 2 years ago

I spoke to @apelisse from SIG API Machinery about this, and we discussed three other sources of precedent for what to do here:

  1. JSON Merge Patch. While the SMP specification and merge2 docs do not address the case where the null is in the target rather than the patch, SMP is strongly inspired from JSON Merge Patch. Despite saying that "merge patch documents are suitable for describing modifications to JSON documents that primarily use objects for their structure and do not make use of explicit null values" (emphasis mine), it does address this case in an example, which shows the null in the target being retained.
  2. Client-side kubectl apply of a Custom Resource with a nullable: true field. This shows the equivalent behaviour of the kubectl implementation of SMP. The answer is that it leaves the null alone. To reproduce, create a CRD with a nullable: true field, then create a CR with that field initially set to null (make sure that field isn't in the last-applied annotation). Then, apply the CR with that field completely absent. (Note that there are significant gotchas here. For example, it seems to be impossible to update the nullable field to a null value using CSA; it is treated as a deletion.)
  3. Server-side kubectl apply of a Custom Resource with a nullable: true field. This does not actually use the SMP implementation anymore, but it is the future of apply and is still a relevant precedent from a user experience standpoint. It also leaves the null alone (and behaves more intuitively in general).

I consider these precedents sufficient evidence to warrant changing Kustomize's behaviour.

/retitle Strategic merge patches should retain existing null values in targets /triage accepted

seh commented 2 years ago

@KnVerey, with the proposed change in behavior, will a null value for a field in a patch still delete the corresponding field that's present in the target?

KnVerey commented 2 years ago

with the proposed change in behavior, will a null value for a field in a patch still delete the corresponding field that's present in the target?

Yes, it is unambiguous that that behaviour is correct.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

apelisse commented 1 year ago

Should we keep this open?

KnVerey commented 1 year ago

Yes, we ended up agreeing that it should be changed, and it hasn't been worked on yet.

/lifecycle frozen