istio / ztunnel

The `ztunnel` component of ambient mesh
Apache License 2.0

failed to connect to server "/var/run/ztunnel/ztunnel.sock" #1208

Closed piotrwasilewski420 closed 2 months ago

piotrwasilewski420 commented 2 months ago

I want to run the Istio service mesh in ambient mode. I installed ztunnel from the Helm chart with `helm install ztunnel istio/ztunnel -n istio-system --wait`. It deployed to the cluster, but the ztunnel pods themselves are not healthy because they fail to connect to a server. The only information I get from the logs is:

`inpod::workloadmanager failed to connect to server "/var/run/ztunnel/ztunnel.sock": Os { code: 2, kind: NotFound, message: "No such file or directory" }`

The most interesting part is that everything works fine when I install Istio with `istioctl install --set profile=ambient`. We already have a well-established pipeline, though, so I want to install all components separately, which is why I install everything via Helm. I run Kubernetes 1.29.6 on my EKS cluster. (The ztunnel pod logs were attached as an image.) Why is this socket missing when I install with Helm but present when I use istioctl?

klebiedzinski commented 2 months ago

I just ran into the same issue, bump!

hzxuzhonghu commented 2 months ago

/label area/ambient

ilrudie commented 2 months ago

Is the istio cni plugin running and healthy? It is a prerequisite for a working ztunnel and based on the logs it seems to be missing or malfunctioning. Istioctl with the ambient profile ensures the cni is installed and running but with helm this is its own step which must be completed first.

ilrudie commented 2 months ago

https://istio.io/latest/docs/ambient/install/helm-installation/ is the doc to follow when installing Istio ambient using helm

bleggett commented 2 months ago

Yep - to echo @ilrudie please follow our Helm installation guide and report back if you have issues. The Ambient Helm installation currently requires installing several components individually, of which ztunnel is just one.

We are trying to improve this with a simple wrapper chart that installs everything at once, so it's less confusing.
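
For readers landing here, the "several components" mentioned above can be sketched as a small shell function. This is a sketch based on my reading of the Helm installation guide linked earlier, not an official script: the chart names and the `--set profile=ambient` flags follow that guide, while the `HELM` and `NAMESPACE` variables and the `install_ambient` function name are illustrative assumptions. The only ordering this thread itself states is that istio-cni must be in place before ztunnel; the relative order of the other charts may differ between releases.

```shell
# Assumed helper variables; override them in the environment if needed.
HELM=${HELM:-helm}
NAMESPACE=${NAMESPACE:-istio-system}

# Install the ambient components one chart at a time, stopping at the
# first failure. ztunnel goes last because it depends on istio-cni.
install_ambient() {
  "$HELM" install istio-base istio/base -n "$NAMESPACE" --create-namespace --wait &&
  "$HELM" install istiod istio/istiod -n "$NAMESPACE" --set profile=ambient --wait &&
  "$HELM" install istio-cni istio/cni -n "$NAMESPACE" --set profile=ambient --wait &&
  "$HELM" install ztunnel istio/ztunnel -n "$NAMESPACE" --wait
}
```

Running `HELM=echo install_ambient` prints the four commands without touching the cluster, which is a cheap way to review them before wiring the function into a pipeline.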

piotrwasilewski420 commented 2 months ago

@ilrudie @bleggett Yes, istio-cni is already up and running, along with istiod and the base Helm chart that installs the other CRDs. That is why this is so confusing: the CNI looks healthy.

bleggett commented 2 months ago

@piotrwasilewski420

If the socket isn't there, that likely means you did not install istio-cni with ambient enabled.

  1. Did you install istio-cni with the ambient profile as the docs indicate?

helm install istio-cni istio/cni -n istio-system --set profile=ambient --wait

  2. Can you check your istio-cni pod logs and confirm you see the following lines during startup?

ZtunnelUDSAddress: /var/run/ztunnel/ztunnel.sock
AmbientEnabled: true
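
The log check in step 2 can be scripted. The following is a minimal sketch: `check_cni_ambient_logs` is a hypothetical helper name, and it only looks for the two startup lines quoted above in whatever log text it receives on standard input.

```shell
# Read istio-cni logs from stdin and report whether the two startup lines
# quoted above are present. Returns 0 only if both are found.
check_cni_ambient_logs() {
  logs=$(cat)
  missing=0
  for expected in \
    'ZtunnelUDSAddress: /var/run/ztunnel/ztunnel.sock' \
    'AmbientEnabled: true'
  do
    case $logs in
      *"$expected"*) echo "found:   $expected" ;;
      *)             echo "missing: $expected"; missing=1 ;;
    esac
  done
  return "$missing"
}
```

One way to feed it, assuming the CNI DaemonSet is named `istio-cni-node` (check the name in your cluster), is `kubectl logs -n istio-system daemonset/istio-cni-node | check_cni_ambient_logs`.
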

piotrwasilewski420 commented 2 months ago

These are the logs from cni pod:

LabelKey: cni.istio.io/uninitialized
LabelValue: true
DeletePods: false
LabelPods: false
SidecarAnnotation: sidecar.istio.io/status
InitContainerName: istio-validation
InitTerminationMsg: 
InitExitCode: 126
LabelSelectors: 
FieldSelectors: 

2024-07-18T00:03:08.440037Z    info    Start a UDS server for CNI plugin logs
2024-07-18T00:03:08.441422Z    info    ControlZ available at 127.0.0.1:9876
2024-07-18T00:03:08.469246Z    info    ambient    HostIP=10.41.59.151
2024-07-18T00:03:08.494308Z    warn    ambient    unable to list IPSet: failed to list ipset ztunnel-pods-ips: no such file or directory
2024-07-18T00:03:08.494378Z    info    ambient    Ambient enrolled IPs before reconciling: map[]
2024-07-18T00:03:08.494776Z    info    ambient    Writing ambient config: {"ztunnelReady":false,"redirectMode":"iptables"}
2024-07-18T00:03:08.494875Z    error    ambient    Failed to write config file: open /etc/ambient-config/config.json.tmp.2326387490: no such file or directory
2024-07-18T00:03:08.494924Z    info    cluster "" kube client started
2024-07-18T00:03:08.511884Z    info    ambient    Namespace agentk is disabled from ambient mesh
2024-07-18T00:03:08.512240Z    info    ambient    Namespace app-user is disabled from ambient mesh
2024-07-18T00:03:08.512275Z    info    ambient    Namespace aqua is disabled from ambient mesh
2024-07-18T00:03:08.512283Z    info    ambient    Namespace blackbox is disabled from ambient mesh
2024-07-18T00:03:08.512293Z    info    ambient    Namespace blackbox-expoter is disabled from ambient mesh
2024-07-18T00:03:08.512300Z    info    ambient    Namespace certmanager is disabled from ambient mesh
2024-07-18T00:03:08.512309Z    info    ambient    Namespace default is disabled from ambient mesh
2024-07-18T00:03:08.512317Z    info    ambient    Namespace drawio is enabled in ambient mesh
2024-07-18T00:03:08.512326Z    info    ambient    Namespace falcon-system is disabled from ambient mesh
2024-07-18T00:03:08.512333Z    info    ambient    Namespace fluentbit is disabled from ambient mesh
2024-07-18T00:03:08.512342Z    info    ambient    Namespace gitlab-ci-pipelines-exporter is disabled from ambient mesh
2024-07-18T00:03:08.512349Z    info    ambient    Namespace gitlab-runner is disabled from ambient mesh
2024-07-18T00:03:08.512357Z    info    ambient    Namespace helm-version-change-notifier is disabled from ambient mesh
2024-07-18T00:03:08.512369Z    info    ambient    Namespace istio-system is disabled from ambient mesh
2024-07-18T00:03:08.512378Z    info    ambient    Namespace kroki is disabled from ambient mesh
2024-07-18T00:03:08.512385Z    info    ambient    Namespace kube-node-lease is disabled from ambient mesh
2024-07-18T00:03:08.512393Z    info    ambient    Namespace kube-public is disabled from ambient mesh
2024-07-18T00:03:08.512401Z    info    ambient    Namespace kube-system is disabled from ambient mesh
2024-07-18T00:03:08.512409Z    info    ambient    Namespace mail is disabled from ambient mesh
2024-07-18T00:03:08.512416Z    info    ambient    Namespace mail-relay is disabled from ambient mesh
2024-07-18T00:03:08.512425Z    info    ambient    Namespace mynamespace is disabled from ambient mesh
2024-07-18T00:03:08.512431Z    info    ambient    Namespace neuvector is disabled from ambient mesh
2024-07-18T00:03:08.512440Z    info    ambient    Namespace nginx-ingress is disabled from ambient mesh
2024-07-18T00:03:08.512488Z    info    ambient    Namespace plantuml is disabled from ambient mesh
2024-07-18T00:03:08.512500Z    info    ambient    Namespace test is disabled from ambient mesh
2024-07-18T00:03:08.512507Z    info    ambient    Namespace test-kp is disabled from ambient mesh
2024-07-18T00:03:08.512515Z    info    ambient    Namespace test-prom is disabled from ambient mesh
2024-07-18T00:03:08.512522Z    info    ambient    Namespace velero is disabled from ambient mesh
2024-07-18T00:03:08.597204Z    info    repair    Start CNI race condition repair.
2024-07-18T00:03:08.597238Z    info    controllers    starting    controller=ambient
2024-07-18T00:03:08.597470Z    error    ambient    Failed to list ipset entries: failed to list ipset ztunnel-pods-ips: no such file or directory
2024-07-18T00:03:08.599185Z    info    cluster "" kube client started
2024-07-18T00:03:08.612525Z    info    controllers    starting    controller=repair pods
2024-07-18T00:03:08.765257Z    info    install    Copied istio-cni to /host/opt/cni/bin.
2024-07-18T00:03:08.765307Z    info    install    Directory /host/secondary-bin-dir is not writable, skipping.
2024-07-18T00:03:08.766922Z    info    install    kubeconfig either does not exist or is out of date, writing a new one
2024-07-18T00:03:08.768031Z    info    install    wrote kubeconfig file /host/etc/cni/net.d/ZZZ-istio-cni-kubeconfig with: 
apiVersion: v1
clusters:
- cluster:
    certificate-authority: REDACTED
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURCVENDQWUyZ0F3SUJBZ0lJY0QvZGltcmF6Y013RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TkRBeU1qa3hOREE0TlRKYUZ3MHpOREF5TWpZeE5ERXpOVEphTUJVeApFekFSQmdOVkJBTVRDbXQxWW1WeWJtVjBaWE13Z2dFaU1BMEdDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLCkFvSUJBUURCMHcxZzVBY3pCRERlNWRPZE9jK3lRdXByaW1pUkluR0lYK2FodXN4Q1VLRnU5TTc1dDNXYk9oUFgKVjE0RkdXcnBoM1Fyd0o4UU9nSk9ETGZ0V2x3NFpvZHlGYUtGMkxHRTl4b2JtNVFyQlRsMnIrZTNMZVo3QjBUdQp4b2xvdGg5eXU0NW9iTTM0Q2dGTTA4U3drdmtCeFY5U2xkaWpWNlBZVkt1VzY4MzJaSHhodUtVaWRzYXo5UzVZClJZSzdwallQVGEyOFZWMHFNa3dHdGh2SjBXTEhwNFJqV0NTN0crM09Ic3U0NTNsbFBOUjZubDFtOVhwcDFiSjcKb1Q3QWJURTNKbXNTTDRJWCt1QW43blZIbkZtYlJxNzRwZHBjWkN0QVVFK3VlOHEzNGx4bndlUXVXanVrSHBXRgpKVG0zdUczeEtRY3RZN2s0QUxqdWNQeUpVenlaQWdNQkFBR2pXVEJYTUE0R0ExVWREd0VCL3dRRUF3SUNwREFQCkJnTlZIUk1CQWY4RUJUQURBUUgvTUIwR0ExVWREZ1FXQkJSVHg4akczdjdJeXgvVmhWQ3VlTUhoeU5yMlZ6QVYKQmdOVkhSRUVEakFNZ2dwcmRXSmxjbTVsZEdWek1BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQW5Bc21sd09kLwp4dkd1ZlViaW9mUGJMYTBYN1NXaGFmK2xkTjJ4YkM0b0ltS2FHNmxSWkx6eGY3MUV5VURIWDg4a2pjKzhiZG9XCkZMNU5qK0VVV1BEdHdtajhQM3Joekh4dWw0Qms4bmhTakF5ZWJsemN5VldvaXFKOTZmU3ZrMkQ2MkgxOTZEekwKMEkxTEE5d2hxelBUUlQ5UkNzSDZEVVlXOE1KQ1ZWeFhSQmRkR0NYUk9FVERiUGNVNm95K3dwQS8wbXZkYnM1OQpubGRnUlVYWlhCNy9tZmxPSGtYQW9EakRMalo3VHc4SnRvYXNwQ1I2eFlUcG1Yc1ZOdnVVRW11c3hzS0o3NzhHClRrNmRpY2pob1pIR0Y4U0FQeXBMZzVEdEpGRm1uZ1MyRmVldVptNUEyUG1kc3I1REl4aDAvbURZR2RZM0xPVHoKUHpzMFBuOWlYNjlCCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
    server: https://172.20.0.1:443
  name: local
contexts:
- context:
    cluster: local
    user: istio-cni
  name: istio-cni-context
current-context: istio-cni-context
kind: Config
preferences: {}
users:
- name: istio-cni
  user:
    token: REDACTED

2024-07-18T00:03:08.768541Z    info    install    missing (or invalid) configuration detected, (re)writing CNI config file at 
2024-07-18T00:03:08.768563Z    info    install    Using CNI config template from CNI_NETWORK_CONFIG environment variable.
2024-07-18T00:03:08.768580Z    info    install    CNI config: {
  "cniVersion": "0.3.1",
  "name": "istio-cni",
  "type": "istio-cni",
  "log_level": "debug",
  "log_uds_address": "/var/run/istio-cni/log.sock",
  "ambient_enabled": true,
  "cni_event_address": "__CNI_EVENT_ADDRESS__",
  "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/ZZZ-istio-cni-kubeconfig",
      "cni_bin_dir": "/opt/cni/bin",
      "exclude_namespaces": [ "kube-system" ]
  }
}
2024-07-18T00:03:08.769295Z    info    install    CNI config file /host/etc/cni/net.d/10-aws.conflist exists. Proceeding.
2024-07-18T00:03:08.770974Z    info    install    Created CNI config /host/etc/cni/net.d/10-aws.conflist
2024-07-18T00:03:08.770993Z    info    install    Installation succeed, start watching for re-installation.
2024-07-18T00:03:08.771054Z    info    file watcher skipping watch on non-existent path: /host/secondary-bin-dir/istio-cni
2024-07-18T00:03:17.262600Z    error    ambient    Failed to list ipset entries: failed to list ipset ztunnel-pods-ips: no such file or directory
2024-07-18T00:03:17.262689Z    error    ambient    Failed to list ipset entries: failed to list ipset ztunnel-pods-ips: no such file or directory
2024-07-18T00:03:17.262700Z    info    ambient    Adding pod 'drawio-98599bcc5-4c7gc/drawio' (8efa7787-216c-45ce-bc26-ab4b1f2efc1d) to ipset
2024-07-18T00:03:17.262787Z    error    ambient    Failed to add pod drawio-98599bcc5-4c7gc to ipset list: failed to add IP 10.41.58.172 to ipset ztunnel-pods-ips: no such file or directory
2024-07-18T00:04:17.859721Z    error    ambient    Failed to list ipset entries: failed to list ipset ztunnel-pods-ips: no such file or directory
2024-07-18T00:04:18.400206Z    error    ambient    Failed to list ipset entries: failed to list ipset ztunnel-pods-ips: no such file or directory
2024-07-18T00:04:18.400300Z    error    ambient    Failed to list ipset entries: failed to list ipset ztunnel-pods-ips: no such file or directory
2024-07-18T00:04:18.400316Z    info    ambient    Adding pod 'drawio-98599bcc5-4c7gc/drawio' (8efa7787-216c-45ce-bc26-ab4b1f2efc1d) to ipset
2024-07-18T00:04:18.400401Z    error    ambient    Failed to add pod drawio-98599bcc5-4c7gc to ipset list: failed to add IP 10.41.58.172 to ipset ztunnel-pods-ips: no such file or directory
2024-07-18T00:04:18.413992Z    error    ambient    Failed to list ipset entries: failed to list ipset ztunnel-pods-ips: no such file or directory
2024-07-18T00:04:18.414091Z    error    ambient    Failed to list ipset entries: failed to list ipset ztunnel-pods-ips: no such file or directory
2024-07-18T00:04:18.414102Z    info    ambient    Adding pod 'drawio-98599bcc5-4c7gc/drawio' (8efa7787-216c-45ce-bc26-ab4b1f2efc1d) to ipset
2024-07-18T00:04:18.414183Z    error    ambient    Failed to add pod drawio-98599bcc5-4c7gc to ipset list: failed to add IP 10.41.58.172 to ipset ztunnel-pods-ips: no such file or directory
2024-07-18T00:04:18.425935Z    error    ambient    Failed to list ipset entries: failed to list ipset ztunnel-pods-ips: no such file or directory
2024-07-18T00:04:18.425971Z    info    ambient    Pod 'drawio-98599bcc5-4c7gc/drawio' (8efa7787-216c-45ce-bc26-ab4b1f2efc1d) is not in ipset

On top of that, I enabled ambient mode by adding `profile: ambient` at the top of my values file (shown in an attached image).

bleggett commented 2 months ago

What versions of images and charts are you using here?

Those istio-cni logs don't look like they're from a current release.

piotrwasilewski420 commented 2 months ago

I use 1.20.8-distroless for install-cni

bleggett commented 2 months ago

> I use 1.20.8-distroless for install-cni

You need to use the same version for both ztunnel and CNI, and I would strongly recommend you use the latest stable release, 1.22.3, for both.
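
A version mismatch like this one can be caught before deploying by comparing image tags. The sketch below is illustrative only: `image_version` and `same_istio_version` are hypothetical helpers, and they assume tags of the form `<version>` or `<version>-<variant>` (such as `1.20.8-distroless`) on an image reference that includes a tag.

```shell
# Extract the bare version from an image reference, e.g.
# "docker.io/istio/install-cni:1.20.8-distroless" becomes "1.20.8".
image_version() {
  tag=${1##*:}           # keep everything after the last colon (the tag)
  echo "${tag%%-*}"      # drop a variant suffix such as "-distroless"
}

# Succeed only if both image references carry the same version.
same_istio_version() {
  [ "$(image_version "$1")" = "$(image_version "$2")" ]
}

# Example usage against a cluster (DaemonSet names are assumptions):
#   cni=$(kubectl get ds istio-cni-node -n istio-system \
#         -o jsonpath='{.spec.template.spec.containers[0].image}')
#   zt=$(kubectl get ds ztunnel -n istio-system \
#         -o jsonpath='{.spec.template.spec.containers[0].image}')
#   same_istio_version "$cni" "$zt" || echo "version mismatch: $cni vs $zt"
```

Passing the same `--version` value to every `helm install` avoids the mismatch in the first place, provided the values file does not override the image tag.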

piotrwasilewski420 commented 2 months ago

It worked, thanks a lot. Now they are all up and running.