grafana / tanka

Flexible, reusable and concise configuration for Kubernetes
https://tanka.dev
Apache License 2.0
2.36k stars 167 forks source link

Tanka not detecting complete diff #1057

Open strowi opened 3 months ago

strowi commented 3 months ago

Hi,

i deployed some app with Tanka. Then went ahead and changed labels + added container command (to sleep 30m). Now when i run tanka diff/apply again, tanka only detects the change in labels, but not the the added change in command?

+++ /tmp/MERGED-2315960001/apps.v1.Deployment.app.api    2024-06-06 16:36:57.899934242 +0200
@@ -7,7 +7,7 @@
     app.kubernetes.io/name: api
     app.kubernetes.io/part-of: api
     tanka.dev/environment: 1e7acd1d1bf5ee51acdae6b0c9bfeee87546de31a51c946b
-    team: infrastructure
+    team: developers
   name: api
   namespace: app
 spec:
@@ -36,7 +36,7 @@
       labels:
         app.kubernetes.io/name: api
         app.kubernetes.io/part-of: api
-        team: infrastructure
+        team: developers
     spec:
       affinity:
         podAntiAffinity:

My guess would be this is because the container doesn't have a command in the tanka-template and it doesn't get removed from the running deployment?

Is there a way to fix this?

regards, strowi

zerok commented 3 months ago

Hi 🙂

By default that diff is calculated by kubectl. What versions of Tanka and Kubectl are you using?

strowi commented 3 months ago

This was on

Afterwards i tested also Tanka 0.27.1 (btw. the install command in the release is missing the version) with kubectl up to 1.30 - same result.

Also tried all diff-strategies, no change in result.

Whats worse: Even a tk apply doesn't remove the container command.. ;/ (in CI we are deploying via tk show | krane.., that does actually do this ).

zerok commented 3 months ago

Just to make sure: Can you try to run tk export and check if the generated manifest contains the command you'd like to see and can you please provide some minimal example where you can reproduce this issue? I'm currently struggling to reproduce the problem locally 🙁

zerok commented 3 months ago

Thanks for the info about the release page 🙂 Should be fixed now!

strowi commented 3 months ago

Just to make sure: Can you try to run tk export and check if the generated manifest contains the command you'd like to see and can you please provide some minimal example where you can reproduce this issue? I'm currently struggling to reproduce the problem locally 🙁

Sorry if i wasn't clear enough. The problem is the other way around, the command is NOT present in Tanka and doesn't get removed when applied:

Lets say we have a deployment :

~> tk env add environments/test
cat >> environments/test/main.jsonnet <<EOF
{
  apiVersion: 'apps/v1',
  kind: 'Deployment',
  metadata: {
    name: 'nginx',
    labels: {
      app: 'nginx',
    },
  },
  spec: {
    selector: {
      matchLabels: {
        app: 'nginx',
      },
    },
    template: {
      metadata: {
        labels: {
          app: 'nginx',
        },
      },
      spec: {
        containers: [{
          name: 'nginx',
          image: 'nginx:latest',
        }],
      },
    },
  },
}
EOF

Deploy this with tk apply environment/test. Then for debugging/other purposes you patch it like this because the pod won't start successfully:

~> kubectl patch deployment nginx -p "{\"spec\":{\"template\":{\"spec\":{\"containers\":[{\"name\": \"nginx\",\"command\":[\"sh\",\"-c\",\"sleep 30m\"]}]}}}}"

-> The Pod restarts running "sleep 30m", after finding/fixing the problem, i would expect i could just run tk apply environment/test to have the patched sleep 30m command removed, but it isn't.

The other way around it does seem to work:

PS: Just for completeness sake - if i add/remove the command in tanka and apply, it works as expected.

zerok commented 3 months ago

Ahhh, thank you 🙂 Will try to reproduce it locally ASAP so that I can debug it 😄

zerok commented 3 months ago

You are absolutely correct. The issues seems to be that diff does not handle the command field if the local version is undefined and the live version has something in it. I now exported the manifest locally and it seems like this is a bug within kubectl itself.

I now tried to remove the kubectl.kubernetes.io/last-applied-configuration annotation on the live object and all of a sudden the diff showed the change. This feels somehow related to https://github.com/kubernetes/kubectl/issues/1601 😐

strowi commented 3 months ago

Thx for verifying ;) Indeed that issue sounds related, i also found this one https://github.com/kubernetes/kubectl/issues/1403 , but then i would expect --diff-strategy=server|validate to work, which doesn't.

Current behaviour sound pretty much like the "subset" describe in the Tanka docs.

zerok commented 3 months ago

If it's alright with you, I think we should wait for upstream to fix that issue and see if it also fixes this one here 🙂

strowi commented 3 months ago

Yes sure. Currently this is not such a big problem for us, since we use tanka for templating and a different tool to actually deploy. Hope this gets fixed sooner than other bugs i had with kubernetes (not with Tanka!) ;)