Closed: byRoadrunner closed this issue 8 months ago
I am getting the same error while simply trying to use Cilium with this option:

```hcl
cni_plugin = "cilium"
```
When I install for the first time with default settings, I get the same error.
Receiving the same error (see logs) when trying to create a new cluster, but without backup_kustomization. As far as I could track it down in the source, this might be related to the presence or absence¹ of any values for the remote-exec handler.

¹ The provisioner depends on multiple values, but also on available resources, e.g. the load balancer or volumes. I could also see that some of the csi-nodes seem to be stuck while being created, which might be related to the provisioner hanging up:
```hcl
depends_on = [
  hcloud_load_balancer.cluster,
  null_resource.control_planes,
  random_password.rancher_bootstrap,
  hcloud_volume.longhorn_volume
]
```
```
module.kube-hetzner.null_resource.kustomization: Still creating... [6m0s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [6m10s elapsed]
module.kube-hetzner.null_resource.kustomization (remote-exec): error: timed out waiting for the condition on deployments/system-upgrade-controller
╷
│ Error: remote-exec provisioner error
│
│   with module.kube-hetzner.null_resource.kustomization,
│   on .terraform/modules/kube-hetzner/init.tf line 288, in resource "null_resource" "kustomization":
│  288: provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1207280219.sh": Process exited with status 1
```
Although this error occurs, it seems the resources are prepared and the cluster is reachable.
```
$ k get nodes
NAME                                             STATUS   ROLES                       AGE   VERSION
training-shared-cluster-agent-large-bxf          Ready    <none>                      28m   v1.28.6+k3s2
training-shared-cluster-agent-large-cnb          Ready    <none>                      28m   v1.28.6+k3s2
training-shared-cluster-agent-large-ddd          Ready    <none>                      28m   v1.28.6+k3s2
training-shared-cluster-agent-large-iui          Ready    <none>                      28m   v1.28.6+k3s2
training-shared-cluster-agent-large-qtp          Ready    <none>                      28m   v1.28.6+k3s2
training-shared-cluster-agent-large-ric          Ready    <none>                      28m   v1.28.6+k3s2
training-shared-cluster-agent-large-rpr          Ready    <none>                      28m   v1.28.6+k3s2
training-shared-cluster-control-plane-fsn1-fdf   Ready    control-plane,etcd,master   28m   v1.28.6+k3s2
training-shared-cluster-control-plane-fsn1-gvp   Ready    control-plane,etcd,master   27m   v1.28.6+k3s2
training-shared-cluster-control-plane-fsn1-uli   Ready    control-plane,etcd,master   27m   v1.28.6+k3s2
```
Other than that, I noticed that there is an instance of cilium on each node, and some of them show multiple restarts and 0/1 ready pods:
```
$ k get pods -n kube-system
NAMESPACE     NAME                              READY   STATUS    RESTARTS       AGE
kube-system   cilium-9jp8p                      1/1     Running   0              27m
kube-system   cilium-chbwp                      0/1     Running   9 (5m9s ago)   27m
kube-system   cilium-czp8w                      1/1     Running   0              27m
kube-system   cilium-jv7cz                      1/1     Running   0              27m
kube-system   cilium-mcmft                      0/1     Running   8 (33s ago)    27m
kube-system   cilium-ns945                      1/1     Running   0              27m
kube-system   cilium-operator-f5dcdcc8d-prm4z   1/1     Running   0              27m
kube-system   cilium-operator-f5dcdcc8d-wpf6n   1/1     Running   0              27m
kube-system   cilium-qgtql                      1/1     Running   0              27m
kube-system   cilium-svjx2                      0/1     Running   9 (5m32s ago)  27m
kube-system   cilium-t9r7x                      1/1     Running   0              27m
kube-system   cilium-zsmkr                      1/1     Running   0              27m
```
@M4t7e @Silvest89 Any ideas on this issue?
@byRoadrunner What makes you think that the current implementation with Cilium has kube-proxy? My cluster is kube-proxy-free without the need to use `--disable-kube-proxy`.
@mysticaltech Haven't had to bootstrap a cluster from scratch in a while :P If it happens even with default settings, it will need to be looked into
@Silvest89 When I last checked it against this validation part (https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/#validate-the-setup), I still got an iptables result. This led me to the conclusion that the cluster is still running with kube-proxy coexisting (https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/#kube-proxy-hybrid-modes). If I made a mistake, feel free to correct me, but I want the completely standalone mode, if that's possible.
@byRoadrunner There is no kube-proxy pod, so it means your system is kube-proxy-free. Where did you execute the iptables command? On your own computer? :P
@Silvest89 Definitely not on my own computer 😉 It's been some time since I tested, so I can't remember the exact details; I will try to test again this evening. But did you check the iptables rules on your system?
@byRoadrunner It's not so long ago that @M4t7e did a rewrite of the Cilium part. So yeah, it is kube-proxy-free when using Cilium as the CNI. No comment on the other CNIs.
@Silvest89 Thanks for the clarifications, will have a look.
@kube-hetzner/core FYI, if you have any ideas.
> It's not so long ago that @M4t7e did a rewrite of the Cilium part. So yeah, it is kube-proxy-free when using Cilium as the CNI.
I definitely used the latest available version for this testing, which was, and still is, v2.11.8.
It was not about the kube-proxy replacement not working. It was about the iptables rules still being present, which should not be the case when using full kube-proxy replacement mode:

```
iptables-save | grep KUBE-SVC
```
This still returned rules if I had a NodePort Service running on the cluster.
Anyway, I will test this evening whether I made a mistake or whether I can still replicate this behaviour. As soon as I know more, I will post a follow-up 👍
@Silvest89 Just to clarify: a standard installation with just the CNI changed to Cilium should be kube-proxy-free?
> Just to clarify: a standard installation with just the CNI changed to Cilium should be kube-proxy-free?
Yes.
Now I'm getting the same error as before (and as the others), but with a completely default installation (only the CNI set to Cilium):

```
module.kube-hetzner.null_resource.kustomization (remote-exec): error: timed out waiting for the condition on deployments/system-upgrade-controller
```
EDIT: Ignore this, it was my fault. I forgot to increase the server_type from the defaults (which is needed for Cilium).
So I just tested, and when running `iptables-save | grep KUBE-SVC` on the node where the services are running, I get the following:

```
k3s-agent-large-qtr:~ # iptables-save | grep KUBE-SVC
:KUBE-SVC-E3IBCFULSWKQCT47 - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-JD5MR3NA4I4DYORP - [0:0]
:KUBE-SVC-L65ENXXZWWSAPRCR - [0:0]
:KUBE-SVC-LODJXQNF3DWSNB7B - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-RY6ZSH2GAUYGLHMF - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
:KUBE-SVC-UIRTXPNS5NKAPNTY - [0:0]
:KUBE-SVC-UZ2GNDIHHRV7XITW - [0:0]
:KUBE-SVC-Z4ANX4WAEWEBLCTM - [0:0]
:KUBE-SVC-ZUD4L6KQKCHD52W4 - [0:0]
-A KUBE-EXT-L65ENXXZWWSAPRCR -j KUBE-SVC-L65ENXXZWWSAPRCR
-A KUBE-EXT-LODJXQNF3DWSNB7B -j KUBE-SVC-LODJXQNF3DWSNB7B
-A KUBE-EXT-UIRTXPNS5NKAPNTY -j KUBE-SVC-UIRTXPNS5NKAPNTY
-A KUBE-SERVICES -d 10.43.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -d 10.43.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-SVC-JD5MR3NA4I4DYORP
-A KUBE-SERVICES -d 10.43.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES -d 10.43.188.115/32 -p tcp -m comment --comment "kube-system/hcloud-csi-controller-metrics:metrics cluster IP" -m tcp --dport 9189 -j KUBE-SVC-RY6ZSH2GAUYGLHMF
-A KUBE-SERVICES -d 10.43.33.175/32 -p tcp -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics cluster IP" -m tcp --dport 9189 -j KUBE-SVC-UZ2GNDIHHRV7XITW
-A KUBE-SERVICES -d 10.43.186.42/32 -p tcp -m comment --comment "traefik/traefik:web cluster IP" -m tcp --dport 80 -j KUBE-SVC-UIRTXPNS5NKAPNTY
-A KUBE-SERVICES -d 10.43.186.42/32 -p tcp -m comment --comment "traefik/traefik:websecure cluster IP" -m tcp --dport 443 -j KUBE-SVC-LODJXQNF3DWSNB7B
-A KUBE-SERVICES -d 10.43.132.82/32 -p tcp -m comment --comment "cert-manager/cert-manager-webhook:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-ZUD4L6KQKCHD52W4
-A KUBE-SERVICES -d 10.43.91.27/32 -p tcp -m comment --comment "cert-manager/cert-manager:tcp-prometheus-servicemonitor cluster IP" -m tcp --dport 9402 -j KUBE-SVC-E3IBCFULSWKQCT47
-A KUBE-SERVICES -d 10.43.33.192/32 -p tcp -m comment --comment "kube-system/metrics-server:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-Z4ANX4WAEWEBLCTM
-A KUBE-SERVICES -d 10.43.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES -d 10.43.252.212/32 -p tcp -m comment --comment "default/my-nginx cluster IP" -m tcp --dport 80 -j KUBE-SVC-L65ENXXZWWSAPRCR
-A KUBE-SVC-E3IBCFULSWKQCT47 ! -s 10.42.0.0/16 -d 10.43.91.27/32 -p tcp -m comment --comment "cert-manager/cert-manager:tcp-prometheus-servicemonitor cluster IP" -m tcp --dport 9402 -j KUBE-MARK-MASQ
-A KUBE-SVC-E3IBCFULSWKQCT47 -m comment --comment "cert-manager/cert-manager:tcp-prometheus-servicemonitor -> 10.42.0.46:9402" -j KUBE-SEP-I2YMRF6X5XNTHNZY
-A KUBE-SVC-ERIFXISQEP7F7OF4 ! -s 10.42.0.0/16 -d 10.43.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp -> 10.42.5.127:53" -j KUBE-SEP-YJQITTE5EFTFYMQA
-A KUBE-SVC-JD5MR3NA4I4DYORP ! -s 10.42.0.0/16 -d 10.43.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-MARK-MASQ
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics -> 10.42.5.127:9153" -j KUBE-SEP-UQZEKWHFV7M5EZRG
-A KUBE-SVC-L65ENXXZWWSAPRCR ! -s 10.42.0.0/16 -d 10.43.252.212/32 -p tcp -m comment --comment "default/my-nginx cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SVC-L65ENXXZWWSAPRCR -m comment --comment "default/my-nginx -> 10.42.0.29:80" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-7GYOAUIAMQUOKVDR
-A KUBE-SVC-L65ENXXZWWSAPRCR -m comment --comment "default/my-nginx -> 10.42.4.79:80" -j KUBE-SEP-IIS32MVGMNZKV3T6
-A KUBE-SVC-LODJXQNF3DWSNB7B ! -s 10.42.0.0/16 -d 10.43.186.42/32 -p tcp -m comment --comment "traefik/traefik:websecure cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-LODJXQNF3DWSNB7B -m comment --comment "traefik/traefik:websecure -> 10.42.0.145:8443" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-CB7E2YRDT4Z4QBTW
-A KUBE-SVC-LODJXQNF3DWSNB7B -m comment --comment "traefik/traefik:websecure -> 10.42.3.82:8443" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-YIDA72PITHMP23YL
-A KUBE-SVC-LODJXQNF3DWSNB7B -m comment --comment "traefik/traefik:websecure -> 10.42.5.85:8443" -j KUBE-SEP-FBV2XYRHLSEUTYAL
-A KUBE-SVC-NPX46M4PTMTKRN6Y ! -s 10.42.0.0/16 -d 10.43.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https -> 10.253.0.101:6443" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-DYW6NS6B3Z6DCBYV
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https -> 10.254.0.101:6443" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-C4RBGSB4EWYWEBUX
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https -> 10.255.0.101:6443" -j KUBE-SEP-4VCRLYVIMEEKDPZE
-A KUBE-SVC-RY6ZSH2GAUYGLHMF ! -s 10.42.0.0/16 -d 10.43.188.115/32 -p tcp -m comment --comment "kube-system/hcloud-csi-controller-metrics:metrics cluster IP" -m tcp --dport 9189 -j KUBE-MARK-MASQ
-A KUBE-SVC-RY6ZSH2GAUYGLHMF -m comment --comment "kube-system/hcloud-csi-controller-metrics:metrics -> 10.42.5.54:9189" -j KUBE-SEP-66GUL3YV6P7YZ4PQ
-A KUBE-SVC-TCOU7JCQXEZGVUNU ! -s 10.42.0.0/16 -d 10.43.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns -> 10.42.5.127:53" -j KUBE-SEP-EOVD533DIZYTLRLX
-A KUBE-SVC-UIRTXPNS5NKAPNTY ! -s 10.42.0.0/16 -d 10.43.186.42/32 -p tcp -m comment --comment "traefik/traefik:web cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SVC-UIRTXPNS5NKAPNTY -m comment --comment "traefik/traefik:web -> 10.42.0.145:8000" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-7GIJ33LAD7V5FOMW
-A KUBE-SVC-UIRTXPNS5NKAPNTY -m comment --comment "traefik/traefik:web -> 10.42.3.82:8000" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-EF5ZOLPEVQ6KPKV5
-A KUBE-SVC-UIRTXPNS5NKAPNTY -m comment --comment "traefik/traefik:web -> 10.42.5.85:8000" -j KUBE-SEP-KEEKTLA2JCCDM4TY
-A KUBE-SVC-UZ2GNDIHHRV7XITW ! -s 10.42.0.0/16 -d 10.43.33.175/32 -p tcp -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics cluster IP" -m tcp --dport 9189 -j KUBE-MARK-MASQ
-A KUBE-SVC-UZ2GNDIHHRV7XITW -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics -> 10.42.0.227:9189" -m statistic --mode random --probability 0.12500000000 -j KUBE-SEP-OTIHTLUJMKCO6SJF
-A KUBE-SVC-UZ2GNDIHHRV7XITW -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics -> 10.42.1.23:9189" -m statistic --mode random --probability 0.14285714272 -j KUBE-SEP-JPR2HJKAN3EI7DKT
-A KUBE-SVC-UZ2GNDIHHRV7XITW -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics -> 10.42.2.97:9189" -m statistic --mode random --probability 0.16666666651 -j KUBE-SEP-LGKW3HMYZ2EVIUFP
-A KUBE-SVC-UZ2GNDIHHRV7XITW -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics -> 10.42.3.55:9189" -m statistic --mode random --probability 0.20000000019 -j KUBE-SEP-OWAFQ4MLCJKHS6VS
-A KUBE-SVC-UZ2GNDIHHRV7XITW -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics -> 10.42.4.67:9189" -m statistic --mode random --probability 0.25000000000 -j KUBE-SEP-WXXIE2DBZOKGLXOH
-A KUBE-SVC-UZ2GNDIHHRV7XITW -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics -> 10.42.5.96:9189" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-H6Q4WP6KZPHOWVUC
-A KUBE-SVC-UZ2GNDIHHRV7XITW -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics -> 10.42.6.191:9189" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-YL5OSEYGQQS3VPSO
-A KUBE-SVC-UZ2GNDIHHRV7XITW -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics -> 10.42.7.160:9189" -j KUBE-SEP-N5CVT4WDXUVTBOM4
-A KUBE-SVC-Z4ANX4WAEWEBLCTM ! -s 10.42.0.0/16 -d 10.43.33.192/32 -p tcp -m comment --comment "kube-system/metrics-server:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-Z4ANX4WAEWEBLCTM -m comment --comment "kube-system/metrics-server:https -> 10.42.5.140:10250" -j KUBE-SEP-ELRRLVYKTAQW7W2Q
-A KUBE-SVC-ZUD4L6KQKCHD52W4 ! -s 10.42.0.0/16 -d 10.43.132.82/32 -p tcp -m comment --comment "cert-manager/cert-manager-webhook:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-ZUD4L6KQKCHD52W4 -m comment --comment "cert-manager/cert-manager-webhook:https -> 10.42.3.106:10250" -j KUBE-SEP-RTVYB7WCGOMJZGQE
```
But according to the Cilium documentation (https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/#validate-the-setup), this should return empty output.
@byRoadrunner How do you do the kube-proxy replacement? Via the `cilium_values` var? It should be done there! See the cilium-values.yaml for the available options; you can find the link in kube.tf.example. Then look at the examples section in the readme about how to get info about the Cilium install; it should say kube-proxy-free mode or something similar when you run that.
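For illustration, a minimal sketch of what passing custom values through the `cilium_values` var could look like in kube.tf. This is an assumption-laden sketch, not the module's canonical config: `kubeProxyReplacement` is a standard Cilium Helm chart key, and the module already ships its own defaults when `cilium_values` is left unset (see the locals.tf link in the next comment).

```hcl
# Illustrative only: pass custom Cilium Helm values through the module var.
module "kube-hetzner" {
  # ... provider, token, and node pool settings omitted ...

  cni_plugin = "cilium"

  # YAML heredoc handed to the Cilium Helm chart; kubeProxyReplacement is
  # assumed here -- check cilium-values.yaml for the supported options.
  cilium_values = <<-EOT
    kubeProxyReplacement: true
  EOT
}
```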
@mysticaltech kube-proxy replacement is already the default in the Helm values that are deployed (https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/blob/e911232e73951ec873711a388b58770e10b8a80f/locals.tf#L383). You are right that it says `KubeProxyReplacement: True` when running `cilium status`, but that's not the problem I'm having. In my previous comments I mentioned the hybrid mode and the validation steps provided by Cilium: if there are still `KUBE-SVC` rules in iptables, something is not right, and then we would be running in hybrid mode, which I don't want.
Hey @byRoadrunner, you're right about your assumption that the current replacement is a hybrid solution. kube-proxy is still running in the background and manages a few functionalities.
I'm currently working on a new PR to update Cilium to the 1.15 release and I can include the full kube-proxy replacement as well. I already have a working setup, but I want to test a few more things. I'll probably file the PR tomorrow.
Great to hear from you @M4t7e 🙏
@M4t7e works like a charm, no more KUBE-SVC rules, thanks!
> Just to clarify: a standard installation with just the CNI changed to Cilium should be kube-proxy-free?
>
> Yes.
@Silvest89 @M4t7e Is this part of the README no longer true then?

> Cilium supports full kube-proxy replacement. Cilium runs by default in hybrid kube-proxy replacement mode. To achieve a completely kube-proxy-free cluster, set `disable_kube_proxy = true`.
Is this related? https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/issues/1267
I am looking to upgrade an existing cluster from the current default Flannel to Cilium, and it's a bit confusing what the config should be.
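(For concreteness, the README instruction quoted above corresponds to roughly this in kube.tf; a sketch only, assuming an otherwise default configuration:)

```hcl
# Sketch: the two settings the README ties together for a completely
# kube-proxy-free cluster (everything else left at module defaults).
cni_plugin         = "cilium"
disable_kube_proxy = true
```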
@maggie44 I added a comment to the issue: https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/issues/1267#issuecomment-2342645961
Cilium was not properly configured to take over the full kube-proxy functionality. If you don't craft your own `cilium_values`, it should work by default. Otherwise, you must ensure the configuration is properly done on your own.
> I am looking to upgrade an existing cluster from the current default Flannel to Cilium, and it's a bit confusing what the config should be.
If you want to replace kube-proxy with Cilium by setting `disable_kube_proxy = true`, then Cilium needs this configuration to take over the full kube-proxy functionality:

```yaml
k8sServiceHost: "127.0.0.1"
k8sServicePort: "6444"
```

This is already the default configuration used when you do not specify custom `cilium_values`: https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/blob/da24fd260b060038630a72b121f06288bc6b8e56/locals.tf#L453-L455
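Putting the two together, a sketch of the custom-values case (illustrative only, mirroring the defaults linked above): if you set `disable_kube_proxy = true` and also craft your own `cilium_values`, your values must carry these keys themselves, since they replace the module defaults.

```hcl
# Sketch only: kube-proxy disabled plus custom cilium_values. The two
# k8sService* keys mirror the module defaults in locals.tf (linked above)
# and must be kept when overriding cilium_values.
disable_kube_proxy = true

cilium_values = <<-EOT
  kubeProxyReplacement: true
  k8sServiceHost: "127.0.0.1"
  k8sServicePort: "6444"
EOT
```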
If further information is required, I will be happy to provide it.
Discussed in https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/discussions/1199