Open mhkarimi1383 opened 7 months ago
Triage: what's the host OS/kernel version?
@jakubdyszkiewicz
OS: Debian GNU/Linux 12 (bookworm)
Kernel: 6.1.0-18-amd64
Container Runtime: containerd://1.7.13
K8s: v1.28.6
I can supply my Cilium CNI variables/configs if needed
Could you provide the Cilium CNI config? Also, did you set it up on a cloud-provided cluster (AWS, Azure, GKE, ...) or yourself (k3d, self-managed k8s, kind)?
Sure,
cilium_version: "v1.15.3"
cilium_debug: false
cilium_enable_ipv4: true
cilium_enable_ipv6: false
cilium_enable_bandwidth_manager: true
# Overlay Network Mode
cilium_tunnel_mode: "vxlan"
cilium_enable_prometheus: true
cilium_kube_proxy_replacement: strict
cilium_cluster_id: 1
cilium_native_routing_cidr: "10.0.0.0/8"
# cilium_encryption_enabled: false
cilium_enable_hubble: true
cilium_enable_hubble_metrics: true
cilium_hubble_metrics:
  - dns
  - drop
  - tcp
  - flow
  - icmp
  - http
cilium_hubble_install: true
cilium_hubble_tls_generate: true
cilium_operator_replicas: 2
cilium_config_extra_vars:
  bpf-lb-sock-hostns-only: "true"
cilium_cluster_name: cluster-01
cilium_enable_ipv4_masquerade: true
cilium_enable_bpf_masquerade: true
cilium_enable_well_known_identities: true
I have my own private network, and my cluster is installed with KubeSpray (kubeadm) on my own network/hardware.
I've managed to set everything up (https://gist.github.com/lukidzi/c078c1b2be8d1cb0cfab4f28de51846c), but I couldn't start a cluster with your Cilium setup because CoreDNS on the worker node didn't start.
I didn't have the same issue, but I noticed other things:
time="2024-04-11T15:46:59Z" level=info msg="CNI config file /host/etc/cni/net.d/05-cilium.conflist exists. Proceeding." func="cni-server.getCNIConfigFilepath()" file="install.go:310"
time="2024-04-11T15:46:59Z" level=info msg="Created CNI config /host/etc/cni/net.d/05-cilium.conflist" func="cni-server.writeCNIConfig()" file="install.go:267"
time="2024-04-11T15:46:59Z" level=info msg="Restarting Merbridge CNI installer..." func="cni-server.(*Installer).Run()" file="install.go:113"
time="2024-04-11T15:47:00Z" level=info msg="Copied /app/merbridge-cni to /host/opt/cni/bin." func="cni-server.copyBinaries()" file="install.go:363"
time="2024-04-11T15:47:00Z" level=info msg="write kubeconfig file /host/etc/cni/net.d/ZZZ-merbridge-cni-kubeconfig with: \n# Kubeconfig file for Merbridge CNI plugin.\napiVersion: v1\nkind: Config\nclusters:\n- name: local\n cluster:\n server: https://[10.233.0.1]:443\n insecure-skip-tls-verify: true\nusers:\n- name: merbridge-cni\n user:\n token: \"<redacted>\"\ncontexts:\n- name: merbridge-cni-context\n context:\n cluster: local\n user: merbridge-cni\ncurrent-context: merbridge-cni-context\n" func="cni-server.createKubeconfigFile()" file="install.go:453"
time="2024-04-11T15:47:00Z" level=info msg="CNI config file /host/etc/cni/net.d/05-cilium.conflist exists. Proceeding." func="cni-server.getCNIConfigFilepath()" file="install.go:310"
time="2024-04-11T15:47:00Z" level=info msg="Created CNI config /host/etc/cni/net.d/05-cilium.conflist" func="cni-server.writeCNIConfig()" file="install.go:267"
time="2024-04-11T15:47:00Z" level=info msg="Restarting Merbridge CNI installer..." func="cni-server.(*Installer).Run()" file="install.go:113"
time="2024-04-11T15:47:01Z" level=info msg="Copied /app/merbridge-cni to /host/opt/cni/bin." func="cni-server.copyBinaries()" file="install.go:363"
time="2024-04-11T15:47:01Z" level=info msg="write kubeconfig file /host/etc/cni/net.d/ZZZ-merbridge-cni-kubeconfig with: \n# Kubeconfig file for Merbridge CNI plugin.\napiVersion: v1\nkind: Config\nclusters:\n- name: local\n cluster:\n server: https://[10.233.0.1]:443\n insecure-skip-tls-verify: true\nusers:\n- name: merbridge-cni\n user:\n token: \"<redacted>\"\ncontexts:\n- name: merbridge-cni-context\n context:\n cluster: local\n user: merbridge-cni\ncurrent-context: merbridge-cni-context\n" func="cni-server.createKubeconfigFile()" file="install.go:453"
time="2024-04-11T15:47:01Z" level=info msg="CNI config file /host/etc/cni/net.d/05-cilium.conflist exists. Proceeding." func="cni-server.getCNIConfigFilepath()" file="install.go:310"
time="2024-04-11T15:47:01Z" level=info msg="Created CNI config /host/etc/cni/net.d/05-cilium.conflist" func="cni-server.writeCNIConfig()" file="install.go:267"
time="2024-04-11T15:47:01Z" level=info msg="Invalid configuration. merbridge CNI config removed from CNI config file: /host/etc/cni/net.d/05-cilium.conflist" func="cni-server.sleepCheckInstall()" file="install.go:332"
Apart from these logs, traffic seems to work.
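In the log, the installer writes its entry into 05-cilium.conflist and then reports "Invalid configuration. merbridge CNI config removed". One way to watch that loop from the node is to list the plugin chain in the conflist. This is a minimal Python sketch. It assumes the standard CNI conflist layout, where a top-level "plugins" array holds entries that each carry a "type". It also assumes the Merbridge entry's type is "merbridge-cni", which is the binary name in the log and is not confirmed from Merbridge itself. The helper names are hypothetical.

```python
import json

# ASSUMPTION: the Merbridge entry uses the binary name as its CNI "type";
# the log shows /app/merbridge-cni being copied to /host/opt/cni/bin.
MERBRIDGE_TYPE = "merbridge-cni"


def plugin_types(conflist_text):
    """Return the "type" of each plugin in a CNI .conflist, in chain order."""
    conf = json.loads(conflist_text)
    return [plugin.get("type") for plugin in conf.get("plugins", [])]


def merbridge_chained(conflist_text):
    """True if the (assumed) Merbridge plugin type appears in the plugin chain."""
    return MERBRIDGE_TYPE in plugin_types(conflist_text)


def check_file(path):
    """Read a conflist from disk and return (plugin chain, merbridge present)."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return plugin_types(text), merbridge_chained(text)
```

For example, call `check_file("/etc/cni/net.d/05-cilium.conflist")` on the node a few times. That path assumes /host/etc/cni/net.d in the container is a hostPath mount of /etc/cni/net.d. If the second value keeps switching between True and False, that matches the write/remove loop in the log.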
I've also created a GKE 1.28.7 cluster with V2 network policy (which uses Cilium), and it works. I'm not a Cilium config expert, so maybe something is wrong with this configuration :/
We definitely need to review some of these error messages and verify how it works now.
@lukidzi Hi, thanks a lot for your work! But I think some of my Cilium settings are not present in your configuration.
I also noticed that the service IP is in a bigger range than the K8s IPAM range. Can you also set:
cilium_ipam_mode: kubernetes
I think something could also be wrong with my other variables, such as:
cilium_config_extra_vars:
  bpf-lb-sock-hostns-only: "true"
I mean that when using Cilium's native IPAM mode, it will use cilium_native_routing_cidr. Also, can you try disabling the GCloud integration in Kubespray (if enabled)?
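To make the range comparison concrete, here is a minimal Python sketch using the standard ipaddress module. The 10.0.0.0/8 native routing CIDR comes from the variables above, and 10.233.0.1 comes from the Merbridge kubeconfig log. The two /18 ranges are an assumption, based on what I believe are Kubespray's defaults, so replace them with the values from the actual inventory. The sketch only shows which ranges fall inside cilium_native_routing_cidr. It says nothing about how Cilium actually routes them.

```python
import ipaddress

# From the Cilium variables posted above.
NATIVE_ROUTING_CIDR = ipaddress.ip_network("10.0.0.0/8")

# API server service IP, as written into the Merbridge kubeconfig in the log.
API_SERVICE_IP = ipaddress.ip_address("10.233.0.1")

# ASSUMPTION: Kubespray's default ranges; replace them with kube_service_addresses
# and kube_pods_subnet from the real inventory.
CLUSTER_RANGES = {
    "kube_service_addresses": ipaddress.ip_network("10.233.0.0/18"),
    "kube_pods_subnet": ipaddress.ip_network("10.233.64.0/18"),
}


def coverage(outer, ranges):
    """Map each range name to whether that range lies entirely inside `outer`."""
    return {name: net.subnet_of(outer) for name, net in ranges.items()}


print(coverage(NATIVE_ROUTING_CIDR, CLUSTER_RANGES))
# {'kube_service_addresses': True, 'kube_pods_subnet': True}
print(API_SERVICE_IP in CLUSTER_RANGES["kube_service_addresses"])
# True
print(NATIVE_ROUTING_CIDR.num_addresses // CLUSTER_RANGES["kube_pods_subnet"].num_addresses)
# 1024
```

With these assumed values, the /8 is 1024 times the size of the pod /18 and covers both the service range and the pod range.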
I am not a Cilium specialist, but with your configuration, my CoreDNS couldn't start. There was a loop in the config, and after fixing it I couldn't connect to the Kubernetes control plane. To start everything, I used the official guide from the repository. I can't try all the cases because I don't have access to a bare-metal machine. I noticed that some things in eBPF need reviewing:
> I mean that when using Cilium's native IPAM mode, it will use cilium_native_routing_cidr. Also, can you try disabling the GCloud integration in Kubespray (if enabled)?
I don't think it's enabled by default. I've used
Tomorrow I will share my entire inventory (with sensitive data masked), along with the steps to run the cluster.
@lukidzi
Hi, sorry for the delay. Here is my entire Kubespray inventory.
Running the run.sh script will bring up the cluster. Before running it, make sure the PRIVATEKEY_PATH variable in prep.sh is correct and that the server information in inventory/inventory.ini is correct.
@lukidzi Any updates?
Thank you for all the information, @mhkarimi1383. It seems this issue is more related to your configuration. We have accepted it because we believe that the documentation and some updates are required in these areas. We welcome and appreciate contributions from the community, so if you're interested in helping out, we'd be grateful for your support.
@lukidzi Hi, thanks! I'd be glad to contribute, but can you give me some steps to debug this and find out where the problem in my configuration is?
This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed. If you think this issue is still relevant, please comment on it or attend the next triage meeting.
What happened?
Helm Values:
Debug Log of CNI Pods:
I think volumes and mount paths are wrong in some environments (e.g. KubeSpray)
Cilium Volumes:
Cilium Volume Mounts:
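One way to compare the Cilium volumes and mounts with the Merbridge ones is to flatten each pod's hostPath volumes into "container, mount path, host path" rows. This minimal Python sketch works on the JSON from `kubectl get pod <pod> -o json` and uses only standard Pod fields: spec.volumes[].hostPath.path and the volumeMounts of each container. The function name is hypothetical. In the Merbridge log, the CNI directories show up inside the container as /host/etc/cni/net.d and /host/opt/cni/bin, so the rows show which host directories actually back those paths.

```python
import json


def host_path_mounts(pod):
    """Return sorted (container, mountPath, hostPath) rows for hostPath volumes."""
    spec = pod["spec"]
    # Volume name -> host directory, for hostPath volumes only.
    host_paths = {
        v["name"]: v["hostPath"]["path"]
        for v in spec.get("volumes", [])
        if "hostPath" in v
    }
    rows = []
    # Init containers are included too, since hostPath mounts can live there.
    for c in spec.get("initContainers", []) + spec.get("containers", []):
        for m in c.get("volumeMounts", []):
            if m["name"] in host_paths:
                rows.append((c["name"], m["mountPath"], host_paths[m["name"]]))
    return sorted(rows)


def load_pod(path):
    """Load a pod saved with `kubectl get pod <pod> -o json > file`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Running `host_path_mounts(load_pod(...))` on a Cilium agent pod and on a Merbridge pod from the same node, then comparing the host paths, should show whether both pods point at the same CNI config and binary directories.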