kubernetes-sigs / kustomize

Customization of kubernetes YAML configurations
Apache License 2.0

kyaml is not respecting `$patch replace|retainKeys` directives #2037

Open win5do opened 4 years ago

win5do commented 4 years ago

tree:

.
├── base
│   ├── kafka.yaml
│   └── kustomization.yaml
└── overlays
    ├── kustomization.yaml
    ├── output.yaml
    └── patch.yaml

base content:

# kustomization.yaml
resources:
    - kafka.yaml

# kafka.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: kafka-broker01
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: broker
          imagePullPolicy: Always
          image: kafka:cloudera-2.1.0
          args: ["start", "broker"]
          volumeMounts:
            - name: kafka-broker01
              mountPath: "/kafka/kafka-logs"
            - name: jaas-config
              mountPath: "/opt/jaas-config"
          env:
            - name: BROKER_ID
              value: "0"
      volumes:
        - name: kafka-broker01
          emptyDir: {}
        - name: jaas-config
          configMap:
            name: jaas-config
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: kafka-broker02
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: broker
          imagePullPolicy: Always
          image: kafka:cloudera-2.1.0
          args: ["start", "broker"]
          volumeMounts:
            - name: kafka-broker02
              mountPath: "/kafka/kafka-logs"
            - name: jaas-config
              mountPath: "/opt/jaas-config"
          env:
            - name: BROKER_ID
              value: "1"
      volumes:
        - name: kafka-broker02
          emptyDir: {}
        - name: jaas-config
          configMap:
            name: jaas-config

overlay contents:

# kustomization.yaml
bases:
  - ../base
patchesStrategicMerge:
  - patch.yaml

# patch.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: kafka-broker01
spec:
  template:
    spec:
      volumes:
        - name: kafka-broker01
          persistentVolumeClaim:
            claimName: kafka-broker01
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: kafka-broker02
spec:
  template:
    spec:
      volumes:
        - name: kafka-broker02
          persistentVolumeClaim:
            claimName: kafka-broker02

cd overlays && kustomize build . > output.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-broker01
spec:
  replicas: 1
  template:
    spec:
      containers:
      - args:
        - start
        - broker
        env:
        - name: BROKER_ID
          value: "0"
        image: kafka:cloudera-2.1.0
        imagePullPolicy: Always
        name: broker
        volumeMounts:
        - mountPath: /kafka/kafka-logs
          name: kafka-broker01
        - mountPath: /opt/jaas-config
          name: jaas-config
      volumes:
      - emptyDir: {} # NOTE: unexpected
        name: kafka-broker01
        persistentVolumeClaim:
          claimName: kafka-broker01
      - configMap:
          name: jaas-config
        name: jaas-config
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-broker02
spec:
  replicas: 1
  template:
    spec:
      containers:
      - args:
        - start
        - broker
        env:
        - name: BROKER_ID
          value: "1"
        image: kafka:cloudera-2.1.0
        imagePullPolicy: Always
        name: broker
        volumeMounts:
        - mountPath: /kafka/kafka-logs
          name: kafka-broker02
        - mountPath: /opt/jaas-config
          name: jaas-config
      volumes:
      - emptyDir: {} # NOTE: unexpected
        name: kafka-broker02
        persistentVolumeClaim:
          claimName: kafka-broker02
      - configMap:
          name: jaas-config
        name: jaas-config

In the output, both the emptyDir and persistentVolumeClaim fields are present on the same volume.

How can I change a volume from emptyDir to a PVC using kustomize?

win5do commented 4 years ago

I know it can be achieved with two JSON patch replace ops, like this:

# patch1.yaml
- op: replace
  path: /spec/template/spec/volumes/0
  value:
    name: kafka-broker01
    persistentVolumeClaim:
      claimName: kafka-broker01

# patch2.yaml
- op: replace
  path: /spec/template/spec/volumes/0
  value:
    name: kafka-broker02
    persistentVolumeClaim:
      claimName: kafka-broker02
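
For reference, wiring those two patches into the overlay would look roughly like this (a sketch using the patchesJson6902 field, untested here; the targets just mirror the two deployments above):

# kustomization.yaml
bases:
  - ../base
patchesJson6902:
  - target:
      group: apps
      version: v1
      kind: Deployment
      name: kafka-broker01
    path: patch1.yaml
  - target:
      group: apps
      version: v1
      kind: Deployment
      name: kafka-broker02
    path: patch2.yaml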

Is there a more convenient way?

win5do commented 4 years ago

After searching for information and testing, I found two methods:

# patch.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: kafka-broker01
spec:
  template:
    spec:
      volumes:
        - name: kafka-broker01
          emptyDir: null # method 1
          persistentVolumeClaim:
            claimName: kafka-broker01
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: kafka-broker02
spec:
  template:
    spec:
      volumes:
        - name: kafka-broker02
          $patch: delete # method 2
        - name: kafka-broker02
          persistentVolumeClaim:
            claimName: kafka-broker02

sliekens commented 4 years ago

I did some more experimenting... $patch=replace also has an unexpected outcome...

# patch.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: kafka-broker02
spec:
  template:
    spec:
      volumes:
        - name: kafka-broker02
          $patch: replace
          persistentVolumeClaim:
            claimName: kafka-broker02

cd overlays && kustomize build . > output.yaml:

# output.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-broker02
spec:
  replicas: 1
  template:
    spec:
      ...
      volumes: [] # all volumes gone (both base and patch)

ghost commented 4 years ago

Experiencing similar behavior with 3.5.4

paultiplady commented 4 years ago

From the k8s docs, $patch: replace seems like it's supposed to be the way to do this:

https://github.com/kubernetes/community/blob/master/contributors/devel/sig-api-machinery/strategic-merge-patch.md#replace-directive

That directive is broken for me too; I'm getting weird behaviour where it's deleting some of the other objects in the volumes list (but not all of them).

Version:

{Version:3.5.4 GitCommit:3af514fa9f85430f0c1557c4a0291e62112ab026 BuildDate:2020-01-17T14:28:58+00:00 GoOs:darwin GoArch:amd64}

Here's a repo with a stripped-down repro scenario: https://github.com/paultiplady/kustomize-replace-directive-bug

I can confirm that manually removing the base data with key: null works around the problem.

ghost commented 4 years ago

Same here; I have also worked around the problem with key: null.

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

sliekens commented 4 years ago

/remove-lifecycle stale

Not on my watch.

HighwayofLife commented 4 years ago

This is very much an issue; I was able to reproduce it in Kustomize 3.8.1:

kustomize version
{Version:3.8.1 GitCommit:0b359d0ef0272e6545eda0e99aacd63aef99c4d0 BuildDate:2020-07-16T05:11:04+01:00 GoOs:darwin GoArch:amd64}

Raw Deployment:

---
# Source: rancher/templates/deployment.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: rancher
  labels:
    app: rancher
    chart: rancher-2.4.6
    heritage: Helm
    release: rancher
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rancher
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: rancher
        release: rancher
    spec:
      serviceAccountName: rancher
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - rancher
              topologyKey: kubernetes.io/hostname
      containers:
      - image: rancher/rancher:v2.4.6
        imagePullPolicy: IfNotPresent
        name: rancher
        ports:
        - containerPort: 80
          protocol: TCP
        args:
        # Private CA - don't clear ca certs
        - "--http-listen-port=80"
        - "--https-listen-port=443"
        - "--add-local=auto"
        env:
        - name: CATTLE_NAMESPACE
          value: rancher-system
        - name: CATTLE_PEER_SERVICE
          value: rancher
        - name: AUDIT_LEVEL
          value: "1"
        - name: AUDIT_LOG_MAXAGE
          value: "1"
        - name: AUDIT_LOG_MAXBACKUP
          value: "1"
        - name: AUDIT_LOG_MAXSIZE
          value: "100"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 80
          initialDelaySeconds: 60
          periodSeconds: 30
        readinessProbe:
          httpGet:
            path: /healthz
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 30
        resources:
          {}
        volumeMounts:
        # Pass CA cert into rancher for private CA
        - mountPath: /etc/rancher/ssl/cacerts.pem
          name: tls-ca-volume
          subPath: cacerts.pem
          readOnly: true
        - mountPath: /var/log/auditlog
          name: audit-log
      # Make audit logs available for Rancher log collector tools.
      - image: busybox
        name: rancher-audit-log
        command: ["tail"]
        args: ["-F", "/var/log/auditlog/rancher-api-audit.log"]
        volumeMounts:
        - mountPath: /var/log/auditlog
          name: audit-log
      volumes:
      - name: tls-ca-volume
        secret:
          defaultMode: 0400
          secretName: tls-ca
      - name: audit-log
        emptyDir: {}

Patch:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: rancher
  # namespace: rancher-system
spec:
  template:
    spec:
      containers:
        - name: rancher
          volumeMounts:
            - name: secrets-store-inline
              mountPath: "/mnt/secrets-store"
              readOnly: true
      volumes:
        - name: secrets-store-inline
          csi:
            driver: secrets-store.csi.k8s.io
            readOnly: true
            volumeAttributes:
              secretProviderClass: "azure-tls"
            nodePublishSecretRef:
              name: secrets-store-creds
        - name: tls-ca-volume
          csi:
            driver: secrets-store.csi.k8s.io
            readOnly: true
            volumeAttributes:
              secretProviderClass: "azure-root-ca"
            nodePublishSecretRef:
              name: secrets-store-creds

Unexpected output:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: rancher
    chart: rancher-2.4.6
    heritage: Helm
    release: rancher
  name: rancher
  namespace: rancher-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rancher
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: rancher
        release: rancher
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - rancher
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - --http-listen-port=80
        - --https-listen-port=443
        - --add-local=auto
        env:
        - name: CATTLE_NAMESPACE
          value: rancher-system
        - name: CATTLE_PEER_SERVICE
          value: rancher
        - name: AUDIT_LEVEL
          value: "1"
        - name: AUDIT_LOG_MAXAGE
          value: "1"
        - name: AUDIT_LOG_MAXBACKUP
          value: "1"
        - name: AUDIT_LOG_MAXSIZE
          value: "100"
        image: rancher/rancher:v2.4.6
        imagePullPolicy: IfNotPresent
        livenessProbe:
          httpGet:
            path: /healthz
            port: 80
          initialDelaySeconds: 60
          periodSeconds: 30
        name: rancher
        ports:
        - containerPort: 80
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /healthz
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 30
        volumeMounts:
        - mountPath: /etc/rancher/ssl/cacerts.pem
          name: tls-ca-volume
          readOnly: true
          subPath: cacerts.pem
        - mountPath: /var/log/auditlog
          name: audit-log
        - mountPath: /mnt/secrets-store
          name: secrets-store-inline
          readOnly: true
      - args:
        - -F
        - /var/log/auditlog/rancher-api-audit.log
        command:
        - tail
        image: busybox
        name: rancher-audit-log
        volumeMounts:
        - mountPath: /var/log/auditlog
          name: audit-log
      serviceAccountName: rancher
      volumes:
      - csi:
          driver: secrets-store.csi.k8s.io
          nodePublishSecretRef:
            name: secrets-store-creds
          readOnly: true
          volumeAttributes:
            secretProviderClass: azure-root-ca
        name: tls-ca-volume
        secret:
          defaultMode: 256
          secretName: tls-ca
      - name: audit-log
      - csi:
          driver: secrets-store.csi.k8s.io
          nodePublishSecretRef:
            name: secrets-store-creds
          readOnly: true
          volumeAttributes:
            secretProviderClass: azure-tls
        name: secrets-store-inline

Which produces the following error:

The Deployment "rancher" is invalid:
* spec.template.spec.volumes[1].csi: Forbidden: may not specify more than 1 volume type
* spec.template.spec.containers[0].volumeMounts[1].name: Not found: "tls-ca-volume"
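
For what it's worth, the key: null workaround discussed above would presumably apply here too, by explicitly nulling out the base's secret source on the tls-ca-volume entry of the patch. A sketch of just that entry, not verified against this chart:

      volumes:
        - name: tls-ca-volume
          secret: null
          csi:
            driver: secrets-store.csi.k8s.io
            readOnly: true
            volumeAttributes:
              secretProviderClass: "azure-root-ca"
            nodePublishSecretRef:
              name: secrets-store-creds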

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

neuromantik33 commented 3 years ago

/remove-lifecycle stale

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

sliekens commented 3 years ago

/remove-lifecycle stale

Try again

k8s-triage-robot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

sliekens commented 3 years ago

/remove-lifecycle stale

Try again

KnVerey commented 2 years ago

@StevenLiekens can you please clarify the solution you are looking for? It sounds like you've found multiple solutions, and the first one in fact has test coverage as of https://github.com/kubernetes-sigs/kustomize/pull/3727.

In other words, is this issue tracking the fact that to remove the emptyDir you need to do this:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: kafka-broker01
spec:
  template:
    spec:
      volumes:
        - name: kafka-broker01
          emptyDir: null # method 1
          persistentVolumeClaim:
            claimName: kafka-broker01

rather than this?

kind: Deployment
apiVersion: apps/v1
metadata:
  name: kafka-broker01
spec:
  template:
    spec:
      volumes:
        - name: kafka-broker01
          $patch: replace
          persistentVolumeClaim:
            claimName: kafka-broker01

Or is there something else you're looking for?

sliekens commented 2 years ago

@KnVerey yep, this is about not being able to replace the entire object graph without nulling out the emptyDir.

I did not realize you could null out emptyDir and set other properties in a single patch, but I'm still unsure if that's what you want.

KnVerey commented 2 years ago

I dug into this a bit more, and I now believe the underlying problem is that the kyaml implementation of strategic merge patch does not respect the retainKeys strategy communicated in the openapi's x-kubernetes-patch-strategy field (confirmed to be present in kustomize's embedded copy). That strategy only appears in two places: pod volumes (here) and the deployment update strategy. I confirmed the latter is affected too:

# kustomization
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- input.yaml

patchesStrategicMerge:
  - patch.yaml

# input.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: foo
  labels:
    app: foo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: foo
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate

# patch.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: foo
spec:
  strategy:
    type: Recreate

Result:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: foo
  name: foo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: foo
  strategy:
    rollingUpdate: # nonsensical, should have been cleared
      maxSurge: 1
      maxUnavailable: 1
    type: Recreate
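
For reference, the openapi markers in question look roughly like this (abridged from the published Kubernetes swagger; worth double-checking against kustomize's embedded copy):

# io.k8s.api.core.v1.PodSpec, volumes field
x-kubernetes-patch-merge-key: name
x-kubernetes-patch-strategy: merge,retainKeys

# io.k8s.api.apps.v1.DeploymentSpec, strategy field
x-kubernetes-patch-strategy: retainKeys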

Here are the PRs that implemented this in k/k, for reference: https://github.com/kubernetes/kubernetes/pull/50296 and https://github.com/kubernetes/kubernetes/pull/44597

/triage accepted /area openapi /area kyaml

cc @mengqiy @natasha41575

mengqiy commented 2 years ago

It seems sigs.k8s.io/kustomize/kyaml/yaml/merge2 is the only place using SMP. It currently supports 3 directives: replace, merge and delete. There are 3 more directives: deleteFromPrimitiveList, setElementOrder and retainKeys. Depending on the use cases, we may need to support all of them. Ref: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-api-machinery/strategic-merge-patch.md
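
For illustration, the retainKeys directive described there takes roughly this shape when expanded into a patch (adapted from the SMP documentation, so treat it as a sketch):

spec:
  strategy:
    $retainKeys:
      - type
    type: Recreate

The idea is that the server keeps only the listed keys under strategy, which is exactly the clearing behavior missing from the kustomize output above.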

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

sliekens commented 1 year ago

The secret to success is /remove-lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

vaibhav2107 commented 11 months ago

/remove-lifecycle rotten