kubernetes-sigs / cluster-api-provider-azure

Cluster API implementation for Microsoft Azure
https://capz.sigs.k8s.io/
Apache License 2.0

Cluster API Provider Azure with Cilium #2723

Closed · thiDucTran closed this 2 years ago

thiDucTran commented 2 years ago

I created https://github.com/cilium/cilium/issues/21678 because I think the issue probably lies more with Cilium.

But I'm posting here for a second look as well.


Hi, I am trying to create Kubernetes clusters in Azure (not AKS), using these links as a guide: https://cluster-api.sigs.k8s.io/user/quick-start.html, https://capz.sigs.k8s.io/topics/getting-started.html, and https://blog.scottlowe.org/2021/10/07/installing-cilium-via-clusterresourceset/

Steps taken:

My cilium.yaml looks like this:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  labels:
    cni: cilium
  name: thi-test-cilium
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: thi-test-cilium-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureCluster
    name: thi-test-cilium
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  name: thi-test-cilium
  namespace: default
spec:
  identityRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureClusterIdentity
    name: management-cluster-identity
  location: eastus
  networkSpec:
    vnet:
      name: vnet-cilium-stage
      cidrBlocks: 
        - 10.0.0.0/16
    subnets:
      - name: subnet-control-plane-cilium-stage
        role: control-plane
        cidrBlocks: 
          - 10.0.1.0/24
        # securityGroup:
        #   name: subnet-control-plane-cilium-stage-nsg
          # securityRules:
          #   - name: "allow_ssh"
          #     description: "allow SSH"
          #     direction: "Inbound"
          #     priority: 2200
          #     protocol: "*"
          #     destination: "*"
          #     destinationPorts: "22"
          #     source: "*"
          #     sourcePorts: "*"
          #   - name: "allow_apiserver"
          #     description: "Allow K8s API Server"
          #     direction: "Inbound"
          #     priority: 2201
          #     protocol: "*"
          #     destination: "*"
          #     destinationPorts: "6443"
          #     source: "*"
          #     sourcePorts: "*"
          #   - name: "allow_port_50000"
          #     description: "allow port 50000"
          #     direction: "Outbound"
          #     priority: 2202
          #     protocol: "Tcp"
          #     destination: "*"
          #     destinationPorts: "50000"
          #     source: "*"
          #     sourcePorts: "*"
      - name: subnet-worker-node-cilium-stage
        role: node
        cidrBlocks: 
          - 10.0.2.0/24
        # securityGroup:
        #   name: subnet-worker-node-cilium-stage-nsg
          # securityRules:
          #   - name: "allow_ssh"
          #     description: "allow SSH"
          #     direction: "Inbound"
          #     priority: 2200
          #     protocol: "*"
          #     destination: "*"
          #     destinationPorts: "22"
          #     source: "*"
          #     sourcePorts: "*"
          #   - name: "allow_apiserver"
          #     description: "Allow K8s API Server"
          #     direction: "Inbound"
          #     priority: 2201
          #     protocol: "*"
          #     destination: "*"
          #     destinationPorts: "6443"
          #     source: "*"
          #     sourcePorts: "*"
          #   - name: "allow_port_50000"
          #     description: "allow port 50000"
          #     direction: "Outbound"
          #     priority: 2202
          #     protocol: "Tcp"
          #     destination: "*"
          #     destinationPorts: "50000"
          #     source: "*"
          #     sourcePorts: "*"
  resourceGroup: om-rd-clusterapi-thi-stage
  subscriptionID: 123456
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: thi-test-cilium-control-plane
  namespace: default
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          cloud-config: /etc/kubernetes/azure.json
          cloud-provider: azure
        extraVolumes:
        - hostPath: /etc/kubernetes/azure.json
          mountPath: /etc/kubernetes/azure.json
          name: cloud-config
          readOnly: true
        timeoutForControlPlane: 20m
      controllerManager:
        extraArgs:
          allocate-node-cidrs: "false"
          cloud-config: /etc/kubernetes/azure.json
          cloud-provider: azure
          cluster-name: thi-test-cilium
        extraVolumes:
        - hostPath: /etc/kubernetes/azure.json
          mountPath: /etc/kubernetes/azure.json
          name: cloud-config
          readOnly: true
      etcd:
        local:
          dataDir: /var/lib/etcddisk/etcd
          extraArgs:
            quota-backend-bytes: "8589934592"
    diskSetup:
      filesystems:
      - device: /dev/disk/azure/scsi1/lun0
        extraOpts:
        - -E
        - lazy_itable_init=1,lazy_journal_init=1
        filesystem: ext4
        label: etcd_disk
      - device: ephemeral0.1
        filesystem: ext4
        label: ephemeral0
        replaceFS: ntfs
      partitions:
      - device: /dev/disk/azure/scsi1/lun0
        layout: true
        overwrite: false
        tableType: gpt
    files:
    - contentFrom:
        secret:
          key: control-plane-azure.json
          name: thi-test-cilium-control-plane-azure-json
      owner: root:root
      path: /etc/kubernetes/azure.json
      permissions: "0644"
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          azure-container-registry-config: /etc/kubernetes/azure.json
          cloud-config: /etc/kubernetes/azure.json
          cloud-provider: azure
        name: '{{ ds.meta_data["local_hostname"] }}'
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          azure-container-registry-config: /etc/kubernetes/azure.json
          cloud-config: /etc/kubernetes/azure.json
          cloud-provider: azure
        name: '{{ ds.meta_data["local_hostname"] }}'
    mounts:
    - - LABEL=etcd_disk
      - /var/lib/etcddisk
    postKubeadmCommands: []
    preKubeadmCommands: []
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AzureMachineTemplate
      name: thi-test-cilium-control-plane
  replicas: 1
  version: 1.25.2
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachineTemplate
metadata:
  name: thi-test-cilium-control-plane
  namespace: default
spec:
  template:
    spec:
      dataDisks:
      - diskSizeGB: 256
        lun: 0
        nameSuffix: etcddisk
      osDisk:
        diskSizeGB: 128
        osType: Linux
      sshPublicKey: ""
      vmSize: Standard_D8s_v3
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: thi-test-cilium-md-0
  namespace: default
spec:
  clusterName: thi-test-cilium
  replicas: 2
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: thi-test-cilium-md-0
      clusterName: thi-test-cilium
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: thi-test-cilium-md-0
      version: 1.25.2
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachineTemplate
metadata:
  name: thi-test-cilium-md-0
  namespace: default
spec:
  template:
    spec:
      osDisk:
        diskSizeGB: 128
        osType: Linux
      sshPublicKey: ""
      vmSize: Standard_D8s_v3
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: thi-test-cilium-md-0
  namespace: default
spec:
  template:
    spec:
      files:
      - contentFrom:
          secret:
            key: worker-node-azure.json
            name: thi-test-cilium-md-0-azure-json
        owner: root:root
        path: /etc/kubernetes/azure.json
        permissions: "0644"
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            azure-container-registry-config: /etc/kubernetes/azure.json
            cloud-config: /etc/kubernetes/azure.json
            cloud-provider: azure
          name: '{{ ds.meta_data["local_hostname"] }}'
      preKubeadmCommands: []
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureClusterIdentity
metadata:
  labels:
    clusterctl.cluster.x-k8s.io/move-hierarchy: "true"
  name: management-cluster-identity
  namespace: default
spec:
  allowedNamespaces: {}
  clientID: 1334f19b-cbf5-40ec-a481-f4e92d5b32ac
  clientSecret:
    name: management-cluster-identity-secret
    namespace: default
  tenantID: 090b858b-df2f-4b8b-a828-6f2518a31805
  type: ServicePrincipal
---
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: cilium-crs
  namespace: default
spec:
  clusterSelector:
    matchLabels:
      cni: cilium 
  resources:
  - kind: ConfigMap
    name: cilium-crs-cm
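
For reference, a ClusterResourceSet like the one above only works if the ConfigMap it points at already exists in the management cluster. A minimal sketch of that step, assuming the rendered Cilium manifest is saved as cilium-hubble-1.12.2.yaml (the file shown later in this issue) and everything lives in the default namespace:

# Assumed workflow, not taken verbatim from the issue: wrap the rendered
# Cilium manifest in the ConfigMap referenced by the ClusterResourceSet.
kubectl create configmap cilium-crs-cm \
  --namespace default \
  --from-file=cilium-hubble-1.12.2.yaml

# Then apply the cluster template shown above (named cilium.yaml here);
# the CRS controller installs the ConfigMap contents into any cluster whose
# labels match clusterSelector (cni: cilium) once its kubeconfig is available.
kubectl apply -f cilium.yaml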

Here is the describe output for the crash-looping cilium-operator pod:

ubuntuuser@ubuntuuser:~/Downloads/ciliumClusterApi$ kubectl --kubeconfig=./thi-test-cilium.kubeconfig -n kube-system describe po/cilium-operator-bc4d5b54-rlcvk
Name:                 cilium-operator-bc4d5b54-rlcvk
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      cilium-operator
Node:                 thi-test-cilium-md-0-zbvcg/10.0.2.4
Start Time:           Tue, 11 Oct 2022 23:19:19 -0400
Labels:               io.cilium/app=operator
                      name=cilium-operator
                      pod-template-hash=bc4d5b54
Annotations:          <none>
Status:               Running
IP:                   10.0.2.4
IPs:
  IP:           10.0.2.4
Controlled By:  ReplicaSet/cilium-operator-bc4d5b54
Containers:
  cilium-operator:
    Container ID:  containerd://03ad2f9df6c773546d603d05e59647a8de18a08867481d07e6fc9e057b976ad7
    Image:         quay.io/cilium/operator-generic:v1.12.2@sha256:00508f78dae5412161fa40ee30069c2802aef20f7bdd20e91423103ba8c0df6e
    Image ID:      quay.io/cilium/operator-generic@sha256:00508f78dae5412161fa40ee30069c2802aef20f7bdd20e91423103ba8c0df6e
    Port:          <none>
    Host Port:     <none>
    Command:
      cilium-operator-generic
    Args:
      --config-dir=/tmp/cilium/config-map
      --debug=$(CILIUM_DEBUG)
    State:       Waiting
      Reason:    CrashLoopBackOff
    Last State:  Terminated
      Reason:    Error
      Message:   sg="  --nodes-gc-interval='5m0s'" subsys=cilium-operator-generic
level=info msg="  --operator-api-serve-addr='127.0.0.1:9234'" subsys=cilium-operator-generic
level=info msg="  --operator-prometheus-serve-addr=':9963'" subsys=cilium-operator-generic
level=info msg="  --parallel-alloc-workers='50'" subsys=cilium-operator-generic
level=info msg="  --pprof='false'" subsys=cilium-operator-generic
level=info msg="  --pprof-port='6061'" subsys=cilium-operator-generic
level=info msg="  --remove-cilium-node-taints='true'" subsys=cilium-operator-generic
level=info msg="  --set-cilium-is-up-condition='true'" subsys=cilium-operator-generic
level=info msg="  --skip-crd-creation='false'" subsys=cilium-operator-generic
level=info msg="  --subnet-ids-filter=''" subsys=cilium-operator-generic
level=info msg="  --subnet-tags-filter=''" subsys=cilium-operator-generic
level=info msg="  --synchronize-k8s-nodes='true'" subsys=cilium-operator-generic
level=info msg="  --synchronize-k8s-services='true'" subsys=cilium-operator-generic
level=info msg="  --unmanaged-pod-watcher-interval='15'" subsys=cilium-operator-generic
level=info msg="  --version='false'" subsys=cilium-operator-generic
level=info msg="Cilium Operator 1.12.2 c7516b9 2022-09-14T15:25:06+02:00 go version go1.18.6 linux/amd64" subsys=cilium-operator-generic
level=info msg="Starting apiserver on address 127.0.0.1:9234" subsys=cilium-operator-api
level=info msg="Establishing connection to apiserver" host="https://10.96.0.1:443" subsys=k8s
level=info msg="Establishing connection to apiserver" host="https://10.96.0.1:443" subsys=k8s
level=error msg="Unable to contact k8s api-server" error="Get \"https://10.96.0.1:443/api/v1/namespaces/kube-system\": dial tcp 10.96.0.1:443: i/o timeout" ipAddr="https://10.96.0.1:443" subsys=k8s
level=fatal msg="Unable to connect to Kubernetes apiserver" error="unable to create k8s client: unable to create k8s client: Get \"https://10.96.0.1:443/api/v1/namespaces/kube-system\": dial tcp 10.96.0.1:443: i/o timeout" subsys=cilium-operator-generic

      Exit Code:    1
      Started:      Tue, 11 Oct 2022 23:33:41 -0400
      Finished:     Tue, 11 Oct 2022 23:34:46 -0400
    Ready:          False
    Restart Count:  7
    Liveness:       http-get http://127.0.0.1:9234/healthz delay=60s timeout=3s period=10s #success=1 #failure=3
    Environment:
      K8S_NODE_NAME:          (v1:spec.nodeName)
      CILIUM_K8S_NAMESPACE:  kube-system (v1:metadata.namespace)
      CILIUM_DEBUG:          <set to the key 'debug' of config map 'cilium-config'>  Optional: true
    Mounts:
      /tmp/cilium/config-map from cilium-config-path (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6x6jq (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cilium-config-path:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cilium-config
    Optional:  false
  kube-api-access-6x6jq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 op=Exists
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  19m                  default-scheduler  0/1 nodes are available: 1 node(s) didn't match pod anti-affinity rules. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Normal   Scheduled         18m                  default-scheduler  Successfully assigned kube-system/cilium-operator-bc4d5b54-rlcvk to thi-test-cilium-md-0-zbvcg
  Normal   Pulling           18m                  kubelet            Pulling image "quay.io/cilium/operator-generic:v1.12.2@sha256:00508f78dae5412161fa40ee30069c2802aef20f7bdd20e91423103ba8c0df6e"
  Normal   Pulled            18m                  kubelet            Successfully pulled image "quay.io/cilium/operator-generic:v1.12.2@sha256:00508f78dae5412161fa40ee30069c2802aef20f7bdd20e91423103ba8c0df6e" in 17.455694087s
  Warning  Unhealthy         16m (x3 over 16m)    kubelet            Liveness probe failed: Get "http://127.0.0.1:9234/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Killing           16m                  kubelet            Container cilium-operator failed liveness probe, will be restarted
  Normal   Created           12m (x5 over 18m)    kubelet            Created container cilium-operator
  Normal   Pulled            12m (x4 over 16m)    kubelet            Container image "quay.io/cilium/operator-generic:v1.12.2@sha256:00508f78dae5412161fa40ee30069c2802aef20f7bdd20e91423103ba8c0df6e" already present on machine
  Normal   Started           12m (x5 over 18m)    kubelet            Started container cilium-operator
  Warning  Unhealthy         11m                  kubelet            Liveness probe failed: Get "http://127.0.0.1:9234/healthz": dial tcp 127.0.0.1:9234: connect: connection refused
  Warning  BackOff           3m1s (x31 over 14m)  kubelet            Back-off restarting failed container

Here is cilium-hubble-1.12.2.yaml:

---
# Source: cilium/templates/cilium-agent/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: "cilium"
  namespace: kube-system
---
# Source: cilium/templates/cilium-operator/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: "cilium-operator"
  namespace: kube-system
---
# Source: cilium/templates/hubble-relay/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: "hubble-relay"
  namespace: kube-system
---
# Source: cilium/templates/hubble-ui/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: "hubble-ui"
  namespace: kube-system
---
# Source: cilium/templates/cilium-ca-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: cilium-ca
  namespace: kube-system
data:
  ca.crt: 123456
  ca.key: 123456
---
# Source: cilium/templates/hubble/tls-helm/ca-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: hubble-ca-secret
  namespace: kube-system
data:
  ca.crt: 123456
  ca.key: 123456
---
# Source: cilium/templates/hubble/tls-helm/relay-client-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: hubble-relay-client-certs
  namespace: kube-system
type: kubernetes.io/tls
data:
  ca.crt:  123456
  tls.crt: 123456
  tls.key: 123456
---
# Source: cilium/templates/hubble/tls-helm/server-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: hubble-server-certs
  namespace: kube-system
type: kubernetes.io/tls
data:
  ca.crt:  123456
  tls.crt: 123456
  tls.key: 123456
---
# Source: cilium/templates/cilium-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:

  # Identity allocation mode selects how identities are shared between cilium
  # nodes by setting how they are stored. The options are "crd" or "kvstore".
  # - "crd" stores identities in kubernetes as CRDs (custom resource definition).
  #   These can be queried with:
  #     kubectl get ciliumid
  # - "kvstore" stores identities in an etcd kvstore, that is
  #   configured below. Cilium versions before 1.6 supported only the kvstore
  #   backend. Upgrades from these older cilium versions should continue using
  #   the kvstore by commenting out the identity-allocation-mode below, or
  #   setting it to "kvstore".
  identity-allocation-mode: crd
  cilium-endpoint-gc-interval: "5m0s"
  nodes-gc-interval: "5m0s"
  # Disable the usage of CiliumEndpoint CRD
  disable-endpoint-crd: "false"

  # If you want to run cilium in debug mode change this value to true
  debug: "false"
  # The agent can be put into the following three policy enforcement modes
  # default, always and never.
  # https://docs.cilium.io/en/latest/policy/intro/#policy-enforcement-modes
  enable-policy: "default"

  # Enable IPv4 addressing. If enabled, all endpoints are allocated an IPv4
  # address.
  enable-ipv4: "true"

  # Enable IPv6 addressing. If enabled, all endpoints are allocated an IPv6
  # address.
  enable-ipv6: "false"
  # Users who wish to specify their own custom CNI configuration file must set
  # custom-cni-conf to "true", otherwise Cilium may overwrite the configuration.
  custom-cni-conf: "false"
  enable-bpf-clock-probe: "true"
  # If you want cilium monitor to aggregate tracing for packets, set this level
  # to "low", "medium", or "maximum". The higher the level, the less packets
  # that will be seen in monitor output.
  monitor-aggregation: medium

  # The monitor aggregation interval governs the typical time between monitor
  # notification events for each allowed connection.
  #
  # Only effective when monitor aggregation is set to "medium" or higher.
  monitor-aggregation-interval: 5s

  # The monitor aggregation flags determine which TCP flags which, upon the
  # first observation, cause monitor notifications to be generated.
  #
  # Only effective when monitor aggregation is set to "medium" or higher.
  monitor-aggregation-flags: all
  # Specifies the ratio (0.0-1.0) of total system memory to use for dynamic
  # sizing of the TCP CT, non-TCP CT, NAT and policy BPF maps.
  bpf-map-dynamic-size-ratio: "0.0025"
  # bpf-policy-map-max specifies the maximum number of entries in endpoint
  # policy map (per endpoint)
  bpf-policy-map-max: "16384"
  # bpf-lb-map-max specifies the maximum number of entries in bpf lb service,
  # backend and affinity maps.
  bpf-lb-map-max: "65536"
  # bpf-lb-bypass-fib-lookup instructs Cilium to enable the FIB lookup bypass
  # optimization for nodeport reverse NAT handling.
  bpf-lb-external-clusterip: "false"

  # Pre-allocation of map entries allows per-packet latency to be reduced, at
  # the expense of up-front memory allocation for the entries in the maps. The
  # default value below will minimize memory usage in the default installation;
  # users who are sensitive to latency may consider setting this to "true".
  #
  # This option was introduced in Cilium 1.4. Cilium 1.3 and earlier ignore
  # this option and behave as though it is set to "true".
  #
  # If this value is modified, then during the next Cilium startup the restore
  # of existing endpoints and tracking of ongoing connections may be disrupted.
  # As a result, reply packets may be dropped and the load-balancing decisions
  # for established connections may change.
  #
  # If this option is set to "false" during an upgrade from 1.3 or earlier to
  # 1.4 or later, then it may cause one-time disruptions during the upgrade.
  preallocate-bpf-maps: "false"

  # Regular expression matching compatible Istio sidecar istio-proxy
  # container image names
  sidecar-istio-proxy-image: "cilium/istio_proxy"

  # Name of the cluster. Only relevant when building a mesh of clusters.
  cluster-name: default
  # Unique ID of the cluster. Must be unique across all conneted clusters and
  # in the range of 1 and 255. Only relevant when building a mesh of clusters.
  cluster-id: "0"

  # Encapsulation mode for communication between nodes
  # Possible values:
  #   - disabled
  #   - vxlan (default)
  #   - geneve
  tunnel: "vxlan"
  # Enables L7 proxy for L7 policy enforcement and visibility
  enable-l7-proxy: "true"

  enable-ipv4-masquerade: "true"
  enable-ipv6-masquerade: "true"

  enable-xt-socket-fallback: "true"
  install-iptables-rules: "true"
  install-no-conntrack-iptables-rules: "false"

  auto-direct-node-routes: "false"
  enable-local-redirect-policy: "false"

  kube-proxy-replacement: "disabled"
  bpf-lb-sock: "false"
  enable-health-check-nodeport: "true"
  node-port-bind-protection: "true"
  enable-auto-protect-node-port-range: "true"
  enable-svc-source-range-check: "true"
  enable-l2-neigh-discovery: "true"
  arping-refresh-period: "30s"
  enable-endpoint-health-checking: "true"
  enable-health-checking: "true"
  enable-well-known-identities: "false"
  enable-remote-node-identity: "true"
  synchronize-k8s-nodes: "true"
  operator-api-serve-addr: "127.0.0.1:9234"
  # Enable Hubble gRPC service.
  enable-hubble: "true"
  # UNIX domain socket for Hubble server to listen to.
  hubble-socket-path: "/var/run/cilium/hubble.sock"
  # An additional address for Hubble server to listen to (e.g. ":4244").
  hubble-listen-address: ":4244"
  hubble-disable-tls: "false"
  hubble-tls-cert-file: /var/lib/cilium/tls/hubble/server.crt
  hubble-tls-key-file: /var/lib/cilium/tls/hubble/server.key
  hubble-tls-client-ca-files: /var/lib/cilium/tls/hubble/client-ca.crt
  ipam: "cluster-pool"
  cluster-pool-ipv4-cidr: "10.0.0.0/16"
  cluster-pool-ipv4-mask-size: "24"
  disable-cnp-status-updates: "true"
  enable-vtep: "false"
  vtep-endpoint: ""
  vtep-cidr: ""
  vtep-mask: ""
  vtep-mac: ""
  enable-bgp-control-plane: "false"
  procfs: "/host/proc"
  bpf-root: "/sys/fs/bpf"
  cgroup-root: "/run/cilium/cgroupv2"
  enable-k8s-terminating-endpoint: "true"
  remove-cilium-node-taints: "true"
  set-cilium-is-up-condition: "true"
  unmanaged-pod-watcher-interval: "15"
  tofqdns-dns-reject-response-code: "refused"
  tofqdns-enable-dns-compression: "true"
  tofqdns-endpoint-max-ip-per-hostname: "50"
  tofqdns-idle-connection-grace-period: "0s"
  tofqdns-max-deferred-connection-deletes: "10000"
  tofqdns-min-ttl: "3600"
  tofqdns-proxy-response-max-delay: "100ms"
  agent-not-ready-taint-key: "node.cilium.io/agent-not-ready"
---
# Source: cilium/templates/hubble-relay/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hubble-relay-config
  namespace: kube-system
data:
  config.yaml: |
    cluster-name: default
    peer-service: "hubble-peer.kube-system.svc.cluster.local:443"
    listen-address: :4245
    dial-timeout: 
    retry-timeout: 
    sort-buffer-len-max: 
    sort-buffer-drain-timeout: 
    tls-client-cert-file: /var/lib/hubble-relay/tls/client.crt
    tls-client-key-file: /var/lib/hubble-relay/tls/client.key
    tls-hubble-server-ca-files: /var/lib/hubble-relay/tls/hubble-server-ca.crt
    disable-server-tls: true
---
# Source: cilium/templates/hubble-ui/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hubble-ui-nginx
  namespace: kube-system
data:
  nginx.conf: "server {\n    listen       8081;\n    listen       [::]:8081;\n    server_name  localhost;\n    root /app;\n    index index.html;\n    client_max_body_size 1G;\n\n    location / {\n        proxy_set_header Host $host;\n        proxy_set_header X-Real-IP $remote_addr;\n\n        # CORS\n        add_header Access-Control-Allow-Methods \"GET, POST, PUT, HEAD, DELETE, OPTIONS\";\n        add_header Access-Control-Allow-Origin *;\n        add_header Access-Control-Max-Age 1728000;\n        add_header Access-Control-Expose-Headers content-length,grpc-status,grpc-message;\n        add_header Access-Control-Allow-Headers range,keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout;\n        if ($request_method = OPTIONS) {\n            return 204;\n        }\n        # /CORS\n\n        location /api {\n            proxy_http_version 1.1;\n            proxy_pass_request_headers on;\n            proxy_hide_header Access-Control-Allow-Origin;\n            proxy_pass http://127.0.0.1:8090;\n        }\n\n        location / {\n            try_files $uri $uri/ /index.html;\n        }\n    }\n}"
---
# Source: cilium/templates/cilium-agent/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cilium
rules:
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  - services
  - pods
  - endpoints
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - list
  - watch
  # This is used when validating policies in preflight. This will need to stay
  # until we figure out how to avoid "get" inside the preflight, and then
  # should be removed ideally.
  - get
- apiGroups:
  - cilium.io
  resources:
  - ciliumbgploadbalancerippools
  - ciliumbgppeeringpolicies
  - ciliumclusterwideenvoyconfigs
  - ciliumclusterwidenetworkpolicies
  - ciliumegressgatewaypolicies
  - ciliumegressnatpolicies
  - ciliumendpoints
  - ciliumendpointslices
  - ciliumenvoyconfigs
  - ciliumidentities
  - ciliumlocalredirectpolicies
  - ciliumnetworkpolicies
  - ciliumnodes
  verbs:
  - list
  - watch
- apiGroups:
  - cilium.io
  resources:
  - ciliumidentities
  - ciliumendpoints
  - ciliumnodes
  verbs:
  - create
- apiGroups:
  - cilium.io
  # To synchronize garbage collection of such resources
  resources:
  - ciliumidentities
  verbs:
  - update
- apiGroups:
  - cilium.io
  resources:
  - ciliumendpoints
  verbs:
  - delete
  - get
- apiGroups:
  - cilium.io
  resources:
  - ciliumnodes
  - ciliumnodes/status
  verbs:
  - get
  - update
- apiGroups:
  - cilium.io
  resources:
  - ciliumnetworkpolicies/status
  - ciliumclusterwidenetworkpolicies/status
  - ciliumendpoints/status
  - ciliumendpoints
  verbs:
  - patch
---
# Source: cilium/templates/cilium-operator/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cilium-operator
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - list
  - watch
  # to automatically delete [core|kube]dns pods so that are starting to being
  # managed by Cilium
  - delete
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  # To remove node taints
  - nodes
  # To set NetworkUnavailable false on startup
  - nodes/status
  verbs:
  - patch
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  # to perform LB IP allocation for BGP
  - services/status
  verbs:
  - update
- apiGroups:
  - ""
  resources:
  # to check apiserver connectivity
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  # to perform the translation of a CNP that contains `ToGroup` to its endpoints
  - services
  - endpoints
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - cilium.io
  resources:
  - ciliumnetworkpolicies
  - ciliumclusterwidenetworkpolicies
  verbs:
  # Create auto-generated CNPs and CCNPs from Policies that have 'toGroups'
  - create
  - update
  - deletecollection
  # To update the status of the CNPs and CCNPs
  - patch
  - get
  - list
  - watch
- apiGroups:
  - cilium.io
  resources:
  - ciliumnetworkpolicies/status
  - ciliumclusterwidenetworkpolicies/status
  verbs:
  # Update the auto-generated CNPs and CCNPs status.
  - patch
  - update
- apiGroups:
  - cilium.io
  resources:
  - ciliumendpoints
  - ciliumidentities
  verbs:
  # To perform garbage collection of such resources
  - delete
  - list
  - watch
- apiGroups:
  - cilium.io
  resources:
  - ciliumidentities
  verbs:
  # To synchronize garbage collection of such resources
  - update
- apiGroups:
  - cilium.io
  resources:
  - ciliumnodes
  verbs:
  - create
  - update
  - get
  - list
  - watch
    # To perform CiliumNode garbage collector
  - delete
- apiGroups:
  - cilium.io
  resources:
  - ciliumnodes/status
  verbs:
  - update
- apiGroups:
  - cilium.io
  resources:
  - ciliumendpointslices
  - ciliumenvoyconfigs
  verbs:
  - create
  - update
  - get
  - list
  - watch
  - delete
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - create
  - get
  - list
  - watch
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - update
  resourceNames:
  - ciliumbgploadbalancerippools.cilium.io
  - ciliumbgppeeringpolicies.cilium.io
  - ciliumclusterwideenvoyconfigs.cilium.io
  - ciliumclusterwidenetworkpolicies.cilium.io
  - ciliumegressgatewaypolicies.cilium.io
  - ciliumegressnatpolicies.cilium.io
  - ciliumendpoints.cilium.io
  - ciliumendpointslices.cilium.io
  - ciliumenvoyconfigs.cilium.io
  - ciliumexternalworkloads.cilium.io
  - ciliumidentities.cilium.io
  - ciliumlocalredirectpolicies.cilium.io
  - ciliumnetworkpolicies.cilium.io
  - ciliumnodes.cilium.io
# For cilium-operator running in HA mode.
#
# Cilium operator running in HA mode requires the use of ResourceLock for Leader Election
# between multiple running instances.
# The preferred way of doing this is to use LeasesResourceLock as edits to Leases are less
# common and fewer objects in the cluster watch "all Leases".
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - create
  - get
  - update
---
# Source: cilium/templates/hubble-ui/clusterrole.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: hubble-ui
rules:
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - componentstatuses
  - endpoints
  - namespaces
  - nodes
  - pods
  - services
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - cilium.io
  resources:
  - "*"
  verbs:
  - get
  - list
  - watch
---
# Source: cilium/templates/cilium-agent/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cilium
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cilium
subjects:
- kind: ServiceAccount
  name: "cilium"
  namespace: kube-system
---
# Source: cilium/templates/cilium-operator/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cilium-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cilium-operator
subjects:
- kind: ServiceAccount
  name: "cilium-operator"
  namespace: kube-system
---
# Source: cilium/templates/hubble-ui/clusterrolebinding.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: hubble-ui
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: hubble-ui
subjects:
- kind: ServiceAccount
  name: "hubble-ui"
  namespace: kube-system
---
# Source: cilium/templates/hubble-relay/service.yaml
kind: Service
apiVersion: v1
metadata:
  name: hubble-relay
  namespace: kube-system
  labels:
    k8s-app: hubble-relay
spec:
  type: "ClusterIP"
  selector:
    k8s-app: hubble-relay
  ports:
  - protocol: TCP
    port: 80
    targetPort: 4245
---
# Source: cilium/templates/hubble-ui/service.yaml
kind: Service
apiVersion: v1
metadata:
  name: hubble-ui
  namespace: kube-system
  labels:
    k8s-app: hubble-ui
spec:
  type: "ClusterIP"
  selector:
    k8s-app: hubble-ui
  ports:
    - name: http
      port: 80
      targetPort: 8081
---
# Source: cilium/templates/hubble/peer-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: hubble-peer
  namespace: kube-system
  labels:
    k8s-app: cilium
spec:
  selector:
    k8s-app: cilium
  ports:
  - name: peer-service
    port: 443
    protocol: TCP
    targetPort: 4244
  internalTrafficPolicy: Local
---
# Source: cilium/templates/cilium-agent/daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cilium
  namespace: kube-system
  labels:
    k8s-app: cilium
spec:
  selector:
    matchLabels:
      k8s-app: cilium
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 2
    type: RollingUpdate
  template:
    metadata:
      annotations:
        # Set app AppArmor's profile to "unconfined". The value of this annotation
        # can be modified as long users know which profiles they have available
        # in AppArmor.
        container.apparmor.security.beta.kubernetes.io/cilium-agent: "unconfined"
        container.apparmor.security.beta.kubernetes.io/clean-cilium-state: "unconfined"
        container.apparmor.security.beta.kubernetes.io/mount-cgroup: "unconfined"
        container.apparmor.security.beta.kubernetes.io/apply-sysctl-overwrites: "unconfined"
      labels:
        k8s-app: cilium
    spec:
      containers:
      - name: cilium-agent
        image: "quay.io/cilium/cilium:v1.12.2@sha256:986f8b04cfdb35cf714701e58e35da0ee63da2b8a048ab596ccb49de58d5ba36"
        imagePullPolicy: IfNotPresent
        command:
        - cilium-agent
        args:
        - --config-dir=/tmp/cilium/config-map
        startupProbe:
          httpGet:
            host: "127.0.0.1"
            path: /healthz
            port: 9879
            scheme: HTTP
            httpHeaders:
            - name: "brief"
              value: "true"
          failureThreshold: 105
          periodSeconds: 2
          successThreshold: 1
        livenessProbe:
          httpGet:
            host: "127.0.0.1"
            path: /healthz
            port: 9879
            scheme: HTTP
            httpHeaders:
            - name: "brief"
              value: "true"
          periodSeconds: 30
          successThreshold: 1
          failureThreshold: 10
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            host: "127.0.0.1"
            path: /healthz
            port: 9879
            scheme: HTTP
            httpHeaders:
            - name: "brief"
              value: "true"
          periodSeconds: 30
          successThreshold: 1
          failureThreshold: 3
          timeoutSeconds: 5
        env:
        - name: K8S_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: CILIUM_K8S_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: CILIUM_CLUSTERMESH_CONFIG
          value: /var/lib/cilium/clustermesh/
        - name: CILIUM_CNI_CHAINING_MODE
          valueFrom:
            configMapKeyRef:
              name: cilium-config
              key: cni-chaining-mode
              optional: true
        - name: CILIUM_CUSTOM_CNI_CONF
          valueFrom:
            configMapKeyRef:
              name: cilium-config
              key: custom-cni-conf
              optional: true
        lifecycle:
          postStart:
            exec:
              command:
              - "/cni-install.sh"
              - "--enable-debug=false"
              - "--cni-exclusive=true"
              - "--log-file=/var/run/cilium/cilium-cni.log"
          preStop:
            exec:
              command:
              - /cni-uninstall.sh
        securityContext:
          seLinuxOptions:
            level: 's0'
            # Running with spc_t since we have removed the privileged mode.
            # Users can change it to a different type as long as they have the
            # type available on the system.
            type: 'spc_t'
          capabilities:
            add:
              # Use to set socket permission
              - CHOWN
              # Used to terminate envoy child process
              - KILL
              # Used since cilium modifies routing tables, etc...
              - NET_ADMIN
              # Used since cilium creates raw sockets, etc...
              - NET_RAW
              # Used since cilium monitor uses mmap
              - IPC_LOCK
              # Used in iptables. Consider removing once we are iptables-free
              - SYS_MODULE
              # We need it for now but might not need it for >= 5.11 specially
              # for the 'SYS_RESOURCE'.
              # In >= 5.8 there's already BPF and PERMON capabilities
              - SYS_ADMIN
              # Could be an alternative for the SYS_ADMIN for the RLIMIT_NPROC
              - SYS_RESOURCE
              # Both PERFMON and BPF requires kernel 5.8, container runtime
              # cri-o >= v1.22.0 or containerd >= v1.5.0.
              # If available, SYS_ADMIN can be removed.
              #- PERFMON
              #- BPF
              - DAC_OVERRIDE
              - FOWNER
              - SETGID
              - SETUID
            drop:
              - ALL
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        # Unprivileged containers need to mount /proc/sys/net from the host
        # to have write access
        - mountPath: /host/proc/sys/net
          name: host-proc-sys-net
        # Unprivileged containers need to mount /proc/sys/kernel from the host
        # to have write access
        - mountPath: /host/proc/sys/kernel
          name: host-proc-sys-kernel
        - name: bpf-maps
          mountPath: /sys/fs/bpf
          # Unprivileged containers can't set mount propagation to bidirectional
          # in this case we will mount the bpf fs from an init container that
          # is privileged and set the mount propagation from host to container
          # in Cilium.
          mountPropagation: HostToContainer
        - name: cilium-run
          mountPath: /var/run/cilium
        - name: cni-path
          mountPath: /host/opt/cni/bin
        - name: etc-cni-netd
          mountPath: /host/etc/cni/net.d
        - name: clustermesh-secrets
          mountPath: /var/lib/cilium/clustermesh
          readOnly: true
        - name: cilium-config-path
          mountPath: /tmp/cilium/config-map
          readOnly: true
          # Needed to be able to load kernel modules
        - name: lib-modules
          mountPath: /lib/modules
          readOnly: true
        - name: xtables-lock
          mountPath: /run/xtables.lock
        - name: hubble-tls
          mountPath: /var/lib/cilium/tls/hubble
          readOnly: true
      initContainers:
      # Required to mount cgroup2 filesystem on the underlying Kubernetes node.
      # We use nsenter command with host's cgroup and mount namespaces enabled.
      - name: mount-cgroup
        image: "quay.io/cilium/cilium:v1.12.2@sha256:986f8b04cfdb35cf714701e58e35da0ee63da2b8a048ab596ccb49de58d5ba36"
        imagePullPolicy: IfNotPresent
        env:
        - name: CGROUP_ROOT
          value: /run/cilium/cgroupv2
        - name: BIN_PATH
          value: /opt/cni/bin
        command:
        - sh
        - -ec
        # The statically linked Go program binary is invoked to avoid any
        # dependency on utilities like sh and mount that can be missing on certain
        # distros installed on the underlying host. Copy the binary to the
        # same directory where we install cilium cni plugin so that exec permissions
        # are available.
        - |
          cp /usr/bin/cilium-mount /hostbin/cilium-mount;
          nsenter --cgroup=/hostproc/1/ns/cgroup --mount=/hostproc/1/ns/mnt "${BIN_PATH}/cilium-mount" $CGROUP_ROOT;
          rm /hostbin/cilium-mount
        volumeMounts:
        - name: hostproc
          mountPath: /hostproc
        - name: cni-path
          mountPath: /hostbin
        terminationMessagePolicy: FallbackToLogsOnError
        securityContext:
          seLinuxOptions:
            level: 's0'
            # Running with spc_t since we have removed the privileged mode.
            # Users can change it to a different type as long as they have the
            # type available on the system.
            type: 'spc_t'
          capabilities:
            drop:
              - ALL
            add:
              # Only used for 'mount' cgroup
              - SYS_ADMIN
              # Used for nsenter
              - SYS_CHROOT
              - SYS_PTRACE
      - name: apply-sysctl-overwrites
        image: "quay.io/cilium/cilium:v1.12.2@sha256:986f8b04cfdb35cf714701e58e35da0ee63da2b8a048ab596ccb49de58d5ba36"
        imagePullPolicy: IfNotPresent
        env:
        - name: BIN_PATH
          value: /opt/cni/bin
        command:
        - sh
        - -ec
        # The statically linked Go program binary is invoked to avoid any
        # dependency on utilities like sh that can be missing on certain
        # distros installed on the underlying host. Copy the binary to the
        # same directory where we install cilium cni plugin so that exec permissions
        # are available.
        - |
          cp /usr/bin/cilium-sysctlfix /hostbin/cilium-sysctlfix;
          nsenter --mount=/hostproc/1/ns/mnt "${BIN_PATH}/cilium-sysctlfix";
          rm /hostbin/cilium-sysctlfix
        volumeMounts:
        - name: hostproc
          mountPath: /hostproc
        - name: cni-path
          mountPath: /hostbin
        terminationMessagePolicy: FallbackToLogsOnError
        securityContext:
          seLinuxOptions:
            level: 's0'
            # Running with spc_t since we have removed the privileged mode.
            # Users can change it to a different type as long as they have the
            # type available on the system.
            type: 'spc_t'
          capabilities:
            drop:
              - ALL
            add:
              # Required in order to access host's /etc/sysctl.d dir
              - SYS_ADMIN
              # Used for nsenter
              - SYS_CHROOT
              - SYS_PTRACE
      # Mount the bpf fs if it is not mounted. We will perform this task
      # from a privileged container because the mount propagation bidirectional
      # only works from privileged containers.
      - name: mount-bpf-fs
        image: "quay.io/cilium/cilium:v1.12.2@sha256:986f8b04cfdb35cf714701e58e35da0ee63da2b8a048ab596ccb49de58d5ba36"
        imagePullPolicy: IfNotPresent
        args:
        - 'mount | grep "/sys/fs/bpf type bpf" || mount -t bpf bpf /sys/fs/bpf'
        command:
        - /bin/bash
        - -c
        - --
        terminationMessagePolicy: FallbackToLogsOnError
        securityContext:
          privileged: true
        volumeMounts:
        - name: bpf-maps
          mountPath: /sys/fs/bpf
          mountPropagation: Bidirectional
      - name: clean-cilium-state
        image: "quay.io/cilium/cilium:v1.12.2@sha256:986f8b04cfdb35cf714701e58e35da0ee63da2b8a048ab596ccb49de58d5ba36"
        imagePullPolicy: IfNotPresent
        command:
        - /init-container.sh
        env:
        - name: CILIUM_ALL_STATE
          valueFrom:
            configMapKeyRef:
              name: cilium-config
              key: clean-cilium-state
              optional: true
        - name: CILIUM_BPF_STATE
          valueFrom:
            configMapKeyRef:
              name: cilium-config
              key: clean-cilium-bpf-state
              optional: true
        terminationMessagePolicy: FallbackToLogsOnError
        securityContext:
          seLinuxOptions:
            level: 's0'
            # Running with spc_t since we have removed the privileged mode.
            # Users can change it to a different type as long as they have the
            # type available on the system.
            type: 'spc_t'
          capabilities:
            # Most of the capabilities here are the same ones used in the
            # cilium-agent's container because this container can be used to
            # uninstall all Cilium resources, and therefore it is likely that
            # will need the same capabilities.
            add:
              # Used since cilium modifies routing tables, etc...
              - NET_ADMIN
              # Used in iptables. Consider removing once we are iptables-free
              - SYS_MODULE
              # We need it for now but might not need it for >= 5.11 specially
              # for the 'SYS_RESOURCE'.
              # In >= 5.8 there's already BPF and PERMON capabilities
              - SYS_ADMIN
              # Could be an alternative for the SYS_ADMIN for the RLIMIT_NPROC
              - SYS_RESOURCE
              # Both PERFMON and BPF requires kernel 5.8, container runtime
              # cri-o >= v1.22.0 or containerd >= v1.5.0.
              # If available, SYS_ADMIN can be removed.
              #- PERFMON
              #- BPF
            drop:
              - ALL
        volumeMounts:
        - name: bpf-maps
          mountPath: /sys/fs/bpf
          # Required to mount cgroup filesystem from the host to cilium agent pod
        - name: cilium-cgroup
          mountPath: /run/cilium/cgroupv2
          mountPropagation: HostToContainer
        - name: cilium-run
          mountPath: /var/run/cilium
        resources:
          requests:
            cpu: 100m
            memory: 100Mi # wait-for-kube-proxy
      restartPolicy: Always
      priorityClassName: system-node-critical
      serviceAccount: "cilium"
      serviceAccountName: "cilium"
      terminationGracePeriodSeconds: 1
      hostNetwork: true
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                k8s-app: cilium
            topologyKey: kubernetes.io/hostname
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
        - operator: Exists
      volumes:
        # To keep state between restarts / upgrades
      - name: cilium-run
        hostPath:
          path: /var/run/cilium
          type: DirectoryOrCreate
        # To keep state between restarts / upgrades for bpf maps
      - name: bpf-maps
        hostPath:
          path: /sys/fs/bpf
          type: DirectoryOrCreate
      # To mount cgroup2 filesystem on the host
      - name: hostproc
        hostPath:
          path: /proc
          type: Directory
      # To keep state between restarts / upgrades for cgroup2 filesystem
      - name: cilium-cgroup
        hostPath:
          path: /run/cilium/cgroupv2
          type: DirectoryOrCreate
      # To install cilium cni plugin in the host
      - name: cni-path
        hostPath:
          path:  /opt/cni/bin
          type: DirectoryOrCreate
        # To install cilium cni configuration in the host
      - name: etc-cni-netd
        hostPath:
          path: /etc/cni/net.d
          type: DirectoryOrCreate
        # To be able to load kernel modules
      - name: lib-modules
        hostPath:
          path: /lib/modules
        # To access iptables concurrently with other processes (e.g. kube-proxy)
      - name: xtables-lock
        hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
        # To read the clustermesh configuration
      - name: clustermesh-secrets
        secret:
          secretName: cilium-clustermesh
          # note: the leading zero means this number is in octal representation: do not remove it
          defaultMode: 0400
          optional: true
        # To read the configuration from the config map
      - name: cilium-config-path
        configMap:
          name: cilium-config
      - name: host-proc-sys-net
        hostPath:
          path: /proc/sys/net
          type: Directory
      - name: host-proc-sys-kernel
        hostPath:
          path: /proc/sys/kernel
          type: Directory
      - name: hubble-tls
        projected:
          # note: the leading zero means this number is in octal representation: do not remove it
          defaultMode: 0400
          sources:
          - secret:
              name: hubble-server-certs
              optional: true
              items:
              - key: ca.crt
                path: client-ca.crt
              - key: tls.crt
                path: server.crt
              - key: tls.key
                path: server.key
---
# Source: cilium/templates/cilium-operator/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cilium-operator
  namespace: kube-system
  labels:
    io.cilium/app: operator
    name: cilium-operator
spec:
  # See docs on ServerCapabilities.LeasesResourceLock in file pkg/k8s/version/version.go
  # for more details.
  replicas: 2
  selector:
    matchLabels:
      io.cilium/app: operator
      name: cilium-operator
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
      labels:
        io.cilium/app: operator
        name: cilium-operator
    spec:
      containers:
      - name: cilium-operator
        image: quay.io/cilium/operator-generic:v1.12.2@sha256:00508f78dae5412161fa40ee30069c2802aef20f7bdd20e91423103ba8c0df6e
        imagePullPolicy: IfNotPresent
        command:
        - cilium-operator-generic
        args:
        - --config-dir=/tmp/cilium/config-map
        - --debug=$(CILIUM_DEBUG)
        env:
        - name: K8S_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: CILIUM_K8S_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: CILIUM_DEBUG
          valueFrom:
            configMapKeyRef:
              key: debug
              name: cilium-config
              optional: true
        livenessProbe:
          httpGet:
            host: "127.0.0.1"
            path: /healthz
            port: 9234
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          timeoutSeconds: 3
        volumeMounts:
        - name: cilium-config-path
          mountPath: /tmp/cilium/config-map
          readOnly: true
        terminationMessagePolicy: FallbackToLogsOnError
      hostNetwork: true
      restartPolicy: Always
      priorityClassName: system-cluster-critical
      serviceAccount: "cilium-operator"
      serviceAccountName: "cilium-operator"
      # In HA mode, cilium-operator pods must not be scheduled on the same
      # node as they will clash with each other.
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                io.cilium/app: operator
            topologyKey: kubernetes.io/hostname
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
        - operator: Exists
      volumes:
        # To read the configuration from the config map
      - name: cilium-config-path
        configMap:
          name: cilium-config
---
# Source: cilium/templates/hubble-relay/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hubble-relay
  namespace: kube-system
  labels:
    k8s-app: hubble-relay
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: hubble-relay
  strategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
      labels:
        k8s-app: hubble-relay
    spec:
      containers:
        - name: hubble-relay
          image: "quay.io/cilium/hubble-relay:v1.12.2@sha256:6f3496c28f23542f2645d614c0a9e79e3b0ae2732080da794db41c33e4379e5c"
          imagePullPolicy: IfNotPresent
          command:
            - hubble-relay
          args:
            - serve
          ports:
            - name: grpc
              containerPort: 4245
          readinessProbe:
            tcpSocket:
              port: grpc
          livenessProbe:
            tcpSocket:
              port: grpc
          volumeMounts:
          - name: config
            mountPath: /etc/hubble-relay
            readOnly: true
          - name: tls
            mountPath: /var/lib/hubble-relay/tls
            readOnly: true
          terminationMessagePolicy: FallbackToLogsOnError
      restartPolicy: Always
      priorityClassName: 
      serviceAccount: "hubble-relay"
      serviceAccountName: "hubble-relay"
      automountServiceAccountToken: false
      terminationGracePeriodSeconds: 1
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                k8s-app: cilium
            topologyKey: kubernetes.io/hostname
      nodeSelector:
        kubernetes.io/os: linux
      volumes:
      - name: config
        configMap:
          name: hubble-relay-config
          items:
          - key: config.yaml
            path: config.yaml
      - name: tls
        projected:
          # note: the leading zero means this number is in octal representation: do not remove it
          defaultMode: 0400
          sources:
          - secret:
              name: hubble-relay-client-certs
              items:
                - key: ca.crt
                  path: hubble-server-ca.crt
                - key: tls.crt
                  path: client.crt
                - key: tls.key
                  path: client.key
---
# Source: cilium/templates/hubble-ui/deployment.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: hubble-ui
  namespace: kube-system
  labels:
    k8s-app: hubble-ui
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: hubble-ui
  template:
    metadata:
      annotations:
      labels:
        k8s-app: hubble-ui
    spec:
      securityContext:
        fsGroup: 1001
        runAsGroup: 1001
        runAsUser: 1001
      priorityClassName: 
      serviceAccount: "hubble-ui"
      serviceAccountName: "hubble-ui"
      containers:
      - name: frontend
        image: "quay.io/cilium/hubble-ui:v0.9.2@sha256:d3596efc94a41c6b772b9afe6fe47c17417658956e04c3e2a28d293f2670663e"
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8081
        volumeMounts:
          - name: hubble-ui-nginx-conf
            mountPath: /etc/nginx/conf.d/default.conf
            subPath: nginx.conf
          - name: tmp-dir
            mountPath: /tmp
        terminationMessagePolicy: FallbackToLogsOnError
      - name: backend
        image: "quay.io/cilium/hubble-ui-backend:v0.9.2@sha256:a3ac4d5b87889c9f7cc6323e86d3126b0d382933bd64f44382a92778b0cde5d7"
        imagePullPolicy: IfNotPresent
        env:
        - name: EVENTS_SERVER_PORT
          value: "8090"
        - name: FLOWS_API_ADDR
          value: "hubble-relay:80"
        ports:
        - name: grpc
          containerPort: 8090
        volumeMounts:
        terminationMessagePolicy: FallbackToLogsOnError
      nodeSelector:
        kubernetes.io/os: linux
      volumes:
      - configMap:
          defaultMode: 420
          name: hubble-ui-nginx
        name: hubble-ui-nginx-conf
      - emptyDir: {}
        name: tmp-dir

I think this is either a networking issue or a Cilium config-related issue? I am just not sure why the cilium pods are not restarting on the control-plane VM.

thiDucTran commented 2 years ago

I even tried creating the cluster with no CNI (stopping at the https://cluster-api.sigs.k8s.io/user/quick-start.html#deploy-a-cni-solution step).

After that I used both the Helm and the cilium CLI methods to install Cilium and hit the same error. The error is mentioned in https://docs.cilium.io/en/stable/gettingstarted/kind/, but I don't understand how my case relates to kind.

level=info msg="  --operator-api-serve-addr='127.0.0.1:9234'" subsys=cilium-operator-generic
level=info msg="  --operator-prometheus-serve-addr=':9963'" subsys=cilium-operator-generic
level=info msg="  --parallel-alloc-workers='50'" subsys=cilium-operator-generic
level=info msg="  --pprof='false'" subsys=cilium-operator-generic
level=info msg="  --pprof-port='6061'" subsys=cilium-operator-generic
level=info msg="  --remove-cilium-node-taints='true'" subsys=cilium-operator-generic
level=info msg="  --set-cilium-is-up-condition='true'" subsys=cilium-operator-generic
level=info msg="  --skip-crd-creation='false'" subsys=cilium-operator-generic
level=info msg="  --subnet-ids-filter=''" subsys=cilium-operator-generic
level=info msg="  --subnet-tags-filter=''" subsys=cilium-operator-generic
level=info msg="  --synchronize-k8s-nodes='true'" subsys=cilium-operator-generic
level=info msg="  --synchronize-k8s-services='true'" subsys=cilium-operator-generic
level=info msg="  --unmanaged-pod-watcher-interval='15'" subsys=cilium-operator-generic
level=info msg="  --version='false'" subsys=cilium-operator-generic
level=info msg="Cilium Operator 1.12.2 c7516b9 2022-09-14T15:25:06+02:00 go version go1.18.6 linux/amd64" subsys=cilium-operator-generic
level=info msg="Starting apiserver on address 127.0.0.1:9234" subsys=cilium-operator-api
level=info msg="Establishing connection to apiserver" host="https://10.96.0.1:443" subsys=k8s
level=info msg="Establishing connection to apiserver" host="https://10.96.0.1:443" subsys=k8s
level=error msg="Unable to contact k8s api-server" error="Get \"https://10.96.0.1:443/api/v1/namespaces/kube-system\": dial tcp 10.96.0.1:443: i/o timeout" ipAddr="https://10.96.0.1:443" subsys=k8s
level=fatal msg="Unable to connect to Kubernetes apiserver" error="unable to create k8s client: unable to create k8s client: Get \"https://10.96.0.1:443/api/v1/namespaces/kube-system\": dial tcp 10.96.0.1:443: i/o timeout" subsys=cilium-operator-generic
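
The "dial tcp 10.96.0.1:443: i/o timeout" line means the operator, which runs with host networking, cannot reach the default kubernetes Service ClusterIP from the node. For anyone hitting the same symptom, a minimal set of checks (commands assumed, not taken from this report):

# Confirm what the in-cluster API server address is, and which real
# endpoint it should forward to.
kubectl --kubeconfig=./thi-test-cilium.kubeconfig get svc kubernetes -o wide
kubectl --kubeconfig=./thi-test-cilium.kubeconfig get endpoints kubernetes

# From a worker node (e.g. over SSH), test raw TCP reachability of both the
# Service VIP and the API server endpoint; a timeout against the endpoint
# itself points at routing or subnet overlap rather than at Cilium.
nc -vz -w 5 10.96.0.1 443
nc -vz -w 5 <apiserver-endpoint-ip> 6443   # placeholder, taken from the endpoints output
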
thiDucTran commented 2 years ago

This seems to be an IP conflict on our end when subnetting. Sorry, all.

jackfrancis commented 2 years ago

No apologies necessary. Perhaps this info will help someone else searching through the historical issue queue. Thanks for being thorough!

CecileRobertMichon commented 2 years ago

@thiDucTran thanks for keeping us posted! I'm planning on adding some instructions to the docs on setting up Cilium, and maybe an e2e test too.

thiDucTran commented 2 years ago

Hi, for full clarity: I changed the CIDRs as below, and Cilium + Hubble installed fine via CRS (passing the connectivity test, etc.), which is why I deduced it to be an IP conflict with another associate of mine who is also playing with CAPI using all defaults (not specifying CIDR blocks, etc.).

In conjunction with the CRS, I saw that the cilium CLI also uses Helm under the hood when installing Cilium, so I copied what the cilium CLI was using to create my manifest files for CRS usage:

helm template --namespace kube-system cilium cilium/cilium --version 1.12.2 --set hubble.relay.enabled=true --set hubble.ui.enabled=true --set azure.resourceGroup=om-rd-clusterapi-thi-stage,cluster.id=0,cluster.name=thi-test-cilium,encryption.nodeEncryption=false,kubeProxyReplacement=disabled,operator.replicas=1,serviceAccounts.cilium.name=cilium,serviceAccounts.operator.name=cilium-operator,tunnel=vxlan

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  name: thi-test-cilium
  namespace: default
spec:
  identityRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureClusterIdentity
    name: management-cluster-identity
  location: eastus
  networkSpec:
    vnet:
      name: vnet-cilium-stage
      cidrBlocks:
        - 10.0.0.0/16
    subnets:
      - name: subnet-control-plane-cilium-stage
        role: control-plane
        cidrBlocks:
          - 10.0.100.0/24
      - name: subnet-worker-node-cilium-stage
        role: node
        cidrBlocks:
          - 10.0.101.0/24
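
For completeness, a minimal sketch of how the resulting install can be re-checked from the management cluster, assuming the cluster name and kubeconfig path used earlier in this thread:

# Fetch the workload cluster kubeconfig via clusterctl (names assumed).
clusterctl get kubeconfig thi-test-cilium --namespace default > thi-test-cilium.kubeconfig

# Point the cilium CLI at the workload cluster and re-run the same checks
# referred to above: overall status plus the connectivity test.
KUBECONFIG=./thi-test-cilium.kubeconfig cilium status --wait
KUBECONFIG=./thi-test-cilium.kubeconfig cilium connectivity test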