istio / istio

Connect, secure, control, and observe services.
https://istio.io
Apache License 2.0
35.94k stars, 7.76k forks

zTunnel not populating as a DaemonSet when migrating to 1.23.0 #52958

Closed: Dr-Octavius closed this issue 2 months ago

Dr-Octavius commented 2 months ago

Is this the right place to submit this?

Bug Description

Environment: doks-1.30.4-do.0 (DigitalOcean Kubernetes, which runs Kubernetes 1.30.4 under the hood), Istio version 1.23

Installation Configuration:

#---------------------------------------------------------------
# Istio zTunnel Installation using Helm (For Ambient Mode Beta)
#---------------------------------------------------------------
resource "helm_release" "istio_ztunnel" {
  name       = "istio-ztunnel"
  repository = "https://istio-release.storage.googleapis.com/charts"
  chart      = "ztunnel"
  version    = "1.23.0"
  namespace  = kubernetes_namespace.my_namespace.metadata[0].name

  set {
    name  = "istioNamespace"
    value = kubernetes_namespace.my_namespace.metadata[0].name
  }

  # Restricts ztunnel pods to nodes labelled nodepool=my-np
  set {
    name  = "nodeSelector.nodepool"
    value = "my-np"
  }

  set {
    name  = "multiCluster.clusterName"
    value = "my_cluster"
  }
}

Fixes Attempted:

#---------------------------------------------------------------
# Istio zTunnel Installation using Helm (For Ambient Mode Beta)
#---------------------------------------------------------------
resource "helm_release" "istio_ztunnel" {
  name       = "istio-ztunnel"
  repository = "https://istio-release.storage.googleapis.com/charts"
  chart      = "ztunnel"
  version    = "1.23.0"
  namespace  = kubernetes_namespace.my_namespace.metadata[0].name

  # Fix Attempted
  set {
    name  = "profile"
    value = "ambient"
  }

  set {
    name  = "istioNamespace"
    value = kubernetes_namespace.my_namespace.metadata[0].name
  }

  set {
    name  = "nodeSelector.nodepool"
    value = "my-np"
  }

  set {
    name  = "multiCluster.clusterName"
    value = "my_cluster"
  }
}

Result:

NAME                      READY   STATUS    RESTARTS   AGE
istio-cni-node-2s88f      1/1     Running   0          96m
istio-cni-node-99kz7      1/1     Running   0          96m
istio-cni-node-kgq2t      1/1     Running   0          96m
istio-cni-node-rwp8s      1/1     Running   0          96m
istio-cni-node-z5hqg      1/1     Running   0          96m
istiod-584577fd57-r7wgq   1/1     Running   0          95m
ztunnel-5ccn8             1/1     Running   0          13m

istio-cni schedules a pod on all 5 nodes in my cluster, but ztunnel schedules only a single pod. Setting profile=ambient does not solve the issue.

Version

$ kubectl version
Client Version: v1.31.0
Kustomize Version: v5.4.2
Server Version: v1.30.4

Additional Information

Checking events when other pods are deployed confirms this. Typical output of kubectl get events -n <name-space>:

8s          Warning   FailedCreatePodSandBox   pod/scribe-deployment-6745dcb48c-kv8lc    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "16de91d3e41e8eb5a4f302fbd94aab14e9fb1093da9cb6d96d5bf9e84b492042": plugin type="istio-cni" name="istio-cni" failed (add): unable to push CNI event (status code 500): partial add error: no ztunnel connection

Namespaces are labelled for ambient mode in Terraform as follows:

resource "kubernetes_namespace" "my_namespace" {
  metadata {
    name = "my-namespace"
    labels = {
      # Enrolls workloads in this namespace in ambient mode
      "istio.io/dataplane-mode" = "ambient"
    }
  }
}

howardjohn commented 2 months ago

> Checking logs when other nodes are deployed confirms this. Typical output of kubectl get events -n

These events come from other pods. They depend on ztunnel, so it makes sense that they are blocked. What we need to see is why Kubernetes isn't scheduling the ztunnel pods; the reason is likely in the DaemonSet events.

Or could it be the node selector?

    name  = "nodeSelector.nodepool"
    value = "my-np"

Do all your nodes have that label?
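A nodeSelector on a DaemonSet means Kubernetes creates a pod only on nodes whose labels contain every key/value pair in the selector, so the DaemonSet's desired pod count equals the number of matching nodes. The Terraform locals block below is a minimal sketch of that matching rule, assuming five hypothetical nodes of which only one carries nodepool=my-np. The node names and labels are illustrative only and are not taken from the cluster in this issue.

```hcl
# Hypothetical sketch: which nodes would a nodeSelector admit?
# Node names and labels are made up for illustration.
locals {
  nodes = {
    "node-a" = { nodepool = "my-np" }
    "node-b" = { nodepool = "other-np" }
    "node-c" = {}
    "node-d" = { nodepool = "other-np" }
    "node-e" = {}
  }

  # Same selector the helm_release passes as nodeSelector.nodepool
  ztunnel_node_selector = { nodepool = "my-np" }

  # A node matches only if it has every selector key with the same value;
  # try() yields null when the node lacks the label entirely.
  ztunnel_nodes = [
    for name, labels in local.nodes : name
    if alltrue([
      for key, value in local.ztunnel_node_selector :
      try(labels[key], null) == value
    ])
  ]
}

output "ztunnel_nodes" {
  value = local.ztunnel_nodes # ["node-a"]: one ztunnel pod, as in the report
}
```

On a real cluster, comparing the output of kubectl get nodes --show-labels (or kubectl get nodes -l nodepool=my-np) against the selector gives the same answer without any Terraform.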

Dr-Octavius commented 2 months ago

Hey @howardjohn,

Yep, it was definitely the node selector; I didn't catch this during my deploy. Will this remain the expected behavior going forward (i.e. the ability to use nodeSelector, or even taints and tolerations, to control where the DaemonSet deploys), or will it become a protected behavior?

Dr-Octavius commented 2 months ago

To clarify, ztunnel ran properly as a DaemonSet once I removed that setting. I realised it was restricting the ztunnel pods to nodes carrying that label in the node pool.
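Based on the fix described in the comment above, here is a sketch of the installation resource with the nodeSelector override removed, so the chart no longer limits ztunnel to nodes labelled nodepool=my-np. It assumes the same namespace resource and remaining chart values as the original report. If ztunnel is meant to stay on a single node pool, the alternative is to ensure that every node running ambient-mode workloads carries the selector label. Otherwise, pods on nodes without ztunnel hit the "no ztunnel connection" sandbox error seen earlier in this issue.

```hcl
# Sketch: the original release without the nodeSelector.nodepool override,
# so ztunnel is scheduled on every node rather than only on my-np nodes.
resource "helm_release" "istio_ztunnel" {
  name       = "istio-ztunnel"
  repository = "https://istio-release.storage.googleapis.com/charts"
  chart      = "ztunnel"
  version    = "1.23.0"
  namespace  = kubernetes_namespace.my_namespace.metadata[0].name

  set {
    name  = "istioNamespace"
    value = kubernetes_namespace.my_namespace.metadata[0].name
  }

  set {
    name  = "multiCluster.clusterName"
    value = "my_cluster"
  }
}
```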

howardjohn commented 2 months ago

There are no plans to remove features from the Helm charts.

Dr-Octavius commented 2 months ago

Hmmm oki!

Thanks @howardjohn 😊