techno-tim / k3s-ansible

The easiest way to bootstrap a self-hosted High Availability Kubernetes cluster. A fully automated HA k3s etcd install with kube-vip, MetalLB, and more. Build. Destroy. Repeat.
https://technotim.live/posts/k3s-etcd-ansible/
Apache License 2.0

Cilium installation error #446

Closed · enmanuelmoreira closed this 8 months ago

enmanuelmoreira commented 8 months ago

When I try to install k3s + Cilium, I get the following error:

TASK [Install Cilium] **************************************************
fatal: [192.168.30.10]: FAILED! => {"changed": false, "cmd": ["cilium", "install", "--version", "v1.15.0", "--helm-set", "operator.replicas=1", "--helm-set", "devices=eth0", "--helm-set", "ipam.operator.clusterPoolIPv4PodCIDRList=10.43.0.0/16", "--helm-set", "ipv4NativeRoutingCIDR=10.43.0.0/16", "--helm-set", "k8sServiceHost=127.0.0.1", "--helm-set", "k8sServicePort=6444", "--helm-set", "routingMode=native", "--helm-set", "autoDirectNodeRoutes=true", "--helm-set", "kubeProxyReplacement=true", "--helm-set", "bpf.masquerade=true", "--helm-set", "bgpControlPlane.enabled=False", "--helm-set", "hubble.enabled=true", "--helm-set", "hubble.relay.enabled=true", "--helm-set", "hubble.ui.enabled=true", "--helm-set", "bpf.loadBalancer.algorithm=maglev", "--helm-set", "bpf.loadBalancer.mode=hybrid"], "delta": "0:00:00.073650", "end": "2024-02-07 00:19:30.112157", "msg": "non-zero return code", "rc": 1, "start": "2024-02-07 00:19:30.038507", "stderr": "\nError: Unable to install Cilium: Kubernetes cluster unreachable: Get \"http://localhost:8080/version\": dial tcp [::1]:8080: connect: connection refused", "stderr_lines": ["", "Error: Unable to install Cilium: Kubernetes cluster unreachable: Get \"http://localhost:8080/version\": dial tcp [::1]:8080: connect: connection refused"], "stdout": "ℹ️  Using Cilium version 1.15.0\n⏭️ Skipping auto kube-proxy detection", "stdout_lines": ["ℹ️  Using Cilium version 1.15.0", "⏭️ Skipping auto kube-proxy detection"]}
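
The "connection refused" on localhost:8080 suggests the cilium CLI is not finding a kubeconfig on the node and is falling back to the default endpoint used when no kubeconfig is present. A minimal sketch of what the install step would need, with the k3s-generated kubeconfig made explicit (the KUBECONFIG handling and task layout here are assumptions, not the repo's actual role):

- name: Install Cilium (sketch)
  ansible.builtin.command:
    cmd: cilium install --version {{ cilium_tag }}
  environment:
    # assumption: default kubeconfig path written by k3s on the first server
    KUBECONFIG: /etc/rancher/k3s/k3s.yaml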

Inventory file:

---
k3s_version: v1.28.6+k3s2
# this is the user that has ssh access to these machines
ansible_user: k3s
systemd_dir: /etc/systemd/system

# Set your timezone
system_timezone: "UTC"

# interface which will be used for flannel
# flannel_iface: "eth0"

# uncomment cilium_iface to use cilium cni instead of flannel or calico
# ensure v4.19.57, v5.1.16, v5.2.0 or more recent kernel
cilium_iface: "eth0"
cilium_mode: "native"        # native when nodes on same subnet or using bgp, else set routed
cilium_tag: "v1.15.0"        # cilium version tag
cilium_hubble: true          # enable hubble observability relay and ui

# if using calico or cilium, you may specify the cluster pod cidr pool
cluster_cidr: "10.43.0.0/16"

# enable cilium bgp control plane for lb services and pod cidrs. disables metallb.
cilium_bgp: false

# bgp parameters for cilium cni. only active when cilium_iface is defined and cilium_bgp is true.
cilium_bgp_my_asn: "64513"
cilium_bgp_peer_asn: "64512"
cilium_bgp_peer_address: "192.168.30.111"
cilium_bgp_lb_cidr: "192.168.30.0/24"   # cidr for cilium loadbalancer ipam

# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "192.168.30.45"

# k3s_token is required so that masters can talk to each other securely
# this token should be alpha numeric only
k3s_token: "omit"

# The IP on which the node is reachable in the cluster.
# A sensible default is provided here; you can still override
# it for each of your hosts, though.
k3s_node_ip: "{{ ansible_facts[(cilium_iface | default(calico_iface | default(flannel_iface)))]['ipv4']['address'] }}"

# Disable the taint manually by setting: k3s_master_taint = false
k3s_master_taint: "{{ true if groups['node'] | default([]) | length >= 1 else false }}"

# these arguments are recommended for servers as well as agents:
extra_args: >-
  {{ '--flannel-iface=' + flannel_iface if calico_iface is not defined and cilium_iface is not defined else '' }}
  --node-ip={{ k3s_node_ip }}

# change these to your liking, the only required are: --disable servicelb, --tls-san {{ apiserver_endpoint }}
# the contents of the if block is also required if using calico or cilium
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  {% if calico_iface is defined or cilium_iface is defined %}
  --flannel-backend=none
  --disable-network-policy
  --cluster-cidr={{ cluster_cidr | default('10.43.0.0/16') }}
  {% endif %}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik
  --write-kubeconfig-mode 644
  --kube-controller-manager-arg bind-address=0.0.0.0
  --kube-proxy-arg metrics-bind-address=0.0.0.0
  --kube-scheduler-arg bind-address=0.0.0.0
  --etcd-expose-metrics true
  --kubelet-arg containerd=/run/k3s/containerd/containerd.sock

extra_agent_args: >-
  {{ extra_args }}

# image tag for kube-vip
kube_vip_tag_version: "v0.6.4"
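
Both cilium_iface and the k3s_node_ip fact lookup above assume the gathered facts contain an interface literally named eth0. A hypothetical pre-flight assert (not part of this playbook) to confirm that before the install runs:

- name: Verify the configured CNI interface exists
  ansible.builtin.assert:
    that:
      - cilium_iface in ansible_facts.interfaces
    fail_msg: >-
      {{ cilium_iface }} not found on {{ inventory_hostname }};
      available interfaces: {{ ansible_facts.interfaces | join(', ') }}

On VM images the primary NIC often shows up with a predictable name such as ens18 rather than eth0, which would make both the fact lookup and the devices=eth0 Helm value miss.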

Cilium status:

    /¯¯\
 /¯¯\__/¯¯\    Cilium:             1 errors
 \__/¯¯\__/    Operator:           1 errors
 /¯¯\__/¯¯\    Envoy DaemonSet:    1 errors
 \__/¯¯\__/    Hubble Relay:       1 warnings
    \__/       ClusterMesh:        1 warnings

Cluster Pods:          0/0 managed by Cilium
Helm chart version:
Errors:                cilium-operator          cilium-operator          Get "http://localhost:8080/apis/apps/v1/namespaces/kube-system/deployments/cilium-operator": dial tcp [::1]:8080: connect: connection refused
                       cilium                   cilium                   Get "http://localhost:8080/apis/apps/v1/namespaces/kube-system/daemonsets/cilium": dial tcp [::1]:8080: connect: connection refused
                       cilium-envoy             cilium-envoy             Get "http://localhost:8080/apis/apps/v1/namespaces/kube-system/daemonsets/cilium-envoy": dial tcp [::1]:8080: connect: connection refused
Warnings:              hubble-relay             hubble-relay             hubble relay is not deployed
                       clustermesh-apiserver    clustermesh-apiserver    clustermesh is not deployed
                       hubble-ui                hubble-ui                hubble ui is not deployed
status check failed

Cilium connectivity test:

🟥 Unable to determine status of Cilium DaemonSet. Run "cilium status" for more details
connectivity test failed: unable to determine status of Cilium DaemonSet: Get "http://localhost:8080/apis/apps/v1/namespaces/kube-system/daemonsets/cilium": dial tcp [::1]:8080: connect: connection refused
timothystewart6 commented 8 months ago

We actually test Cilium in CI, so this might be something with your configuration?

enmanuelmoreira commented 8 months ago

My nodes (Ubuntu 22.04) are VMs on a Proxmox 8.1 server. I don't know where to start looking...

timothystewart6 commented 8 months ago

Because this is running fine in CI (https://github.com/techno-tim/k3s-ansible/actions/runs/7769871911/job/21210576578?pr=442), I am going to move this to Discussions.