gilly42 opened 4 days ago
Hi! Could you please share your k0sctl config file? Also, how did you set up the external load balancer for the externalAddress?
Sure, I tested different systems and environments. To debug I just took some fresh VMs from Hetzner (which I throw away afterwards). On Hetzner I used Debian 12 systems; on the other systems I had Ubuntu 22.04 (the ruleset files from above are from the Ubuntu system, but they are the same on the Debian systems).
k0sctl.yml

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
  - ssh:
      address: 188.245.164.154
      user: root
      port: 22
      keyPath: ~/.ssh/ctl
    role: controller
  - ssh:
      address: 116.202.26.102
      user: root
      port: 22
      keyPath: ~/.ssh/ctl
    role: controller
  - ssh:
      address: 91.107.193.216
      user: root
      port: 22
      keyPath: ~/.ssh/ctl
    role: controller
  - ssh:
      address: 188.245.165.94
      user: root
      port: 22
      keyPath: ~/.ssh/ctl
    role: worker
  - ssh:
      address: 88.198.150.114
      user: root
      port: 22
      keyPath: ~/.ssh/ctl
    role: worker
  k0s:
    config:
      spec:
        api:
          externalAddress: 188.245.165.100
          sans:
          - 188.245.165.100
HAProxy installation (on the Debian 12 systems; I also tested version 3 on the Ubuntu systems, same behavior):

apt-get update
apt-get install haproxy=2.6.*
haproxy config:

global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # See: https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend kubeAPI
    bind :6443
    mode tcp
    default_backend kubeAPI_backend

frontend konnectivity
    bind :8132
    mode tcp
    default_backend konnectivity_backend

frontend controllerJoinAPI
    bind :9443
    mode tcp
    default_backend controllerJoinAPI_backend

backend kubeAPI_backend
    mode tcp
    server k0s-controller1 188.245.164.154:6443 check check-ssl verify none
    server k0s-controller2 116.202.26.102:6443 check check-ssl verify none
    server k0s-controller3 91.107.193.216:6443 check check-ssl verify none

backend konnectivity_backend
    mode tcp
    server k0s-controller1 188.245.164.154:8132 check check-ssl verify none
    server k0s-controller2 116.202.26.102:8132 check check-ssl verify none
    server k0s-controller3 91.107.193.216:8132 check check-ssl verify none

backend controllerJoinAPI_backend
    mode tcp
    server k0s-controller1 188.245.164.154:9443 check check-ssl verify none
    server k0s-controller2 116.202.26.102:9443 check check-ssl verify none
    server k0s-controller3 91.107.193.216:9443 check check-ssl verify none

listen stats
    bind *:9000
    mode http
    stats enable
    stats uri /
And I also tried Calico as the CNI provider with more or less the same result. I think the changes to the iptables rules are the problem; do you have any idea?
I'm playing with the idea of just adding the HA IP to the SANs section, i.e. without the externalAddress. This seems to work, and etcd sees the other control planes as members. Is there any advantage to setting the externalAddress?
dial tcp 10.96.0.1:443: connect: connection refused

This pretty much implies that kube-proxy has not been able to properly create the kubernetes.default svc rules. So a couple of things to check:
- the kube-proxy logs under /var/log/containers/....
- k0s kc get svc,ep kubernetes -o yaml, to see what the endpoints look like; are they correct? With externalAddress set, the endpoints should point to that externalAddress, and kube-proxy should connect to that address.
- check k0s kc -n kube-system get cm kube-proxy and that the kubeconfig.conf key points to the correct address.

The major benefit of using externalAddress is that k0s configures all the needed system components (kubelets, kube-proxy, ...) to connect to that address, so you get failover capability. If it is not set, and you are using k0sctl, all those components connect to the API via only one of the hosts, so if that host goes down, all those components lose API access.
One thing worth noting is that k0s also has a feature called NLLB that creates a local LB (of sorts) on all the workers and configures kubelet, kube-proxy etc. to connect to the API via it --> failover capability without having to set up HAProxy. Of course this does NOT solve failover for external, i.e. user, access.
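For reference, NLLB is enabled in the k0s config spec; a minimal fragment based on the k0s documentation (worth verifying the field names against your k0s version):

```yaml
spec:
  network:
    nodeLocalLoadBalancing:
      enabled: true
      type: EnvoyProxy
```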
The endpoint and the kubeconfig.conf both have the HA IP on port 6443; this looks right to me. On the first control plane I get this:
k0s kc get svc,ep kubernetes -o yaml
root@k0s-cp:~# k0s kc get svc,ep kubernetes -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: "2024-09-27T13:10:53Z"
    labels:
      component: apiserver
      provider: kubernetes
    name: kubernetes
    namespace: default
    resourceVersion: "195"
    uid: 27a2d483-4df7-4f99-9561-f6d8903430cb
  spec:
    clusterIP: 10.96.0.1
    clusterIPs:
    - 10.96.0.1
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: 6443
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: v1
  kind: Endpoints
  metadata:
    creationTimestamp: "2024-09-27T13:10:53Z"
    labels:
      endpointslice.kubernetes.io/skip-mirror: "true"
    name: kubernetes
    namespace: default
    resourceVersion: "12255"
    uid: 09b6c97f-860f-418d-9d56-43f4960269f2
  subsets:
  - addresses:
    - ip: 188.245.165.100
    ports:
    - name: https
      port: 6443
      protocol: TCP
kind: List
metadata:
  resourceVersion: ""
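To make the output concrete: the Service maps 10.96.0.1:443 to the Endpoints address 188.245.165.100:6443, and kube-proxy programs iptables/nft rules to perform that translation; if those rules are missing, connections to 10.96.0.1:443 are refused. A toy sketch of the mapping (illustrative only, not kube-proxy's actual code), using the values from this output:

```python
# Toy illustration of the translation the kube-proxy iptables rules
# perform: traffic to the Service clusterIP:port is DNAT'ed to an
# endpoint address and targetPort. Values are taken from the svc/ep
# output above; the function names are made up for this sketch.
service = {"clusterIP": "10.96.0.1", "port": 443, "targetPort": 6443}
endpoints = [{"ip": "188.245.165.100", "port": 6443}]

def resolve(dst_ip: str, dst_port: int) -> tuple[str, int]:
    """Return the backend a connection to (dst_ip, dst_port) is sent to."""
    if dst_ip == service["clusterIP"] and dst_port == service["port"]:
        ep = endpoints[0]  # single endpoint here; kube-proxy load-balances
        return (ep["ip"], ep["port"])
    return (dst_ip, dst_port)  # everything else passes through unchanged

print(resolve("10.96.0.1", 443))  # -> ('188.245.165.100', 6443)
```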
k0s kc -n kube-system get cm kube-proxy -o yaml
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: 10.244.0.0/16
    configSyncPeriod: 0s
    featureGates:
    mode: "iptables"
    conntrack:
      maxPerCore: 0
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    detectLocalMode: ""
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables: {"syncPeriod":"0s","minSyncPeriod":"0s"}
    ipvs: {"syncPeriod":"0s","minSyncPeriod":"0s","tcpTimeout":"0s","tcpFinTimeout":"0s","udpTimeout":"0s"}
    kind: KubeProxyConfiguration
    metricsBindAddress: 0.0.0.0:10249
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    showHiddenMetricsForVersion: ""
    udpIdleTimeout: 0s
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://188.245.165.100:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  annotations:
    k0s.k0sproject.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"config.conf":"apiVersion: kubeproxy.config.k8s.io/v1alpha1\nbindAddress: 0.0.0.0\nclientConnection:\n acceptContentTypes: \"\"\n burst: 0\n contentType: \"\"\n kubeconfig: /var/lib/kube-proxy/kubeconfig.conf\n qps: 0\nclusterCIDR: 10.244.0.0/16\nconfigSyncPeriod: 0s\nfeatureGates:\nmode: \"iptables\"\nconntrack:\n maxPerCore: 0\n min: null\n tcpCloseWaitTimeout: null\n tcpEstablishedTimeout: null\ndetectLocalMode: \"\"\nenableProfiling: false\nhealthzBindAddress: \"\"\nhostnameOverride: \"\"\niptables: {\"syncPeriod\":\"0s\",\"minSyncPeriod\":\"0s\"}\nipvs: {\"syncPeriod\":\"0s\",\"minSyncPeriod\":\"0s\",\"tcpTimeout\":\"0s\",\"tcpFinTimeout\":\"0s\",\"udpTimeout\":\"0s\"}\nkind: KubeProxyConfiguration\nmetricsBindAddress: 0.0.0.0:10249\nnodePortAddresses: null\noomScoreAdj: null\nportRange: \"\"\nshowHiddenMetricsForVersion: \"\"\nudpIdleTimeout: 0s\nwinkernel:\n enableDSR: false\n networkName: \"\"\n sourceVip: \"\"","kubeconfig.conf":"apiVersion: v1\nkind: Config\nclusters:\n- cluster:\n certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt\n server: https://188.245.165.100:6443\n name: default\ncontexts:\n- context:\n cluster: default\n namespace: default\n user: default\n name: default\ncurrent-context: default\nusers:\n- name: default\n user:\n tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token"},"kind":"ConfigMap","metadata":{"labels":{"app":"kube-proxy"},"name":"kube-proxy","namespace":"kube-system"}}
    k0s.k0sproject.io/stack-checksum: 37c3ad792a0c4d89ee234cb32f5cf3f1
  creationTimestamp: "2024-09-27T13:11:05Z"
  labels:
    app: kube-proxy
    k0s.k0sproject.io/stack: kubeproxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "10986"
  uid: ab2e2a08-397f-4886-8567-a5b1dad8ecb4
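A tiny sketch of the sanity check suggested above: extract the API server address from the kubeconfig.conf in the kube-proxy ConfigMap and compare it with the configured externalAddress. The sample text is trimmed from the output shown here, and the regex is only an illustration, not an official k0s tool:

```python
import re

# Illustrative check: pull the API server address out of the kube-proxy
# kubeconfig.conf and compare it with the externalAddress. The sample
# below is trimmed from the ConfigMap output above.
kubeconfig_conf = """\
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server: https://188.245.165.100:6443
  name: default
"""

external_address = "188.245.165.100"

match = re.search(r"server:\s*https://([\d.]+):(\d+)", kubeconfig_conf)
host, port = match.group(1), match.group(2)

# kube-proxy should talk to the API through the load-balancer address.
print(host == external_address, port == "6443")
```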
If I add the externalAddress, the kube-router pods don't restart; only the konnectivity-agent, coredns and kube-proxy pods restart, so the kube-proxy logs are still the ones from before the externalAddress was set.
The kube-proxy log on a fresh cluster with externalAddress looks like this:
2024-09-27T13:47:35.473478688Z stderr F I0927 13:47:35.473340 1 server.go:511] "Using lenient decoding as strict decoding failed" err="strict decoding error: unknown field \"udpIdleTimeout\""
2024-09-27T13:47:35.485984293Z stderr F I0927 13:47:35.485860 1 server.go:1062] "Successfully retrieved node IP(s)" IPs=["188.245.165.94"]
2024-09-27T13:47:35.499843183Z stderr F I0927 13:47:35.499701 1 server.go:659] "kube-proxy running in dual-stack mode" primary ipFamily="IPv4"
2024-09-27T13:47:35.499860856Z stderr F I0927 13:47:35.499742 1 server_linux.go:165] "Using iptables Proxier"
2024-09-27T13:47:35.501936595Z stderr F I0927 13:47:35.501818 1 server_linux.go:511] "Detect-local-mode set to ClusterCIDR, but no cluster CIDR for family" ipFamily="IPv6"
2024-09-27T13:47:35.501945972Z stderr F I0927 13:47:35.501833 1 server_linux.go:528] "Defaulting to no-op detect-local"
2024-09-27T13:47:35.50194971Z stderr F I0927 13:47:35.501848 1 proxier.go:243] "Setting route_localnet=1 to allow node-ports on localhost; to change this either disable iptables.localhostNodePorts (--iptables-localhost-nodeports) or set nodePortAddresses (--nodeport-addresses) to filter loopback addresses"
2024-09-27T13:47:35.502087148Z stderr F I0927 13:47:35.501972 1 server.go:872] "Version info" version="v1.30.4"
2024-09-27T13:47:35.502106103Z stderr F I0927 13:47:35.501986 1 server.go:874] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
2024-09-27T13:47:35.502806886Z stderr F I0927 13:47:35.502724 1 config.go:192] "Starting service config controller"
2024-09-27T13:47:35.502820672Z stderr F I0927 13:47:35.502740 1 shared_informer.go:313] Waiting for caches to sync for service config
2024-09-27T13:47:35.50282518Z stderr F I0927 13:47:35.502758 1 config.go:101] "Starting endpoint slice config controller"
2024-09-27T13:47:35.502829508Z stderr F I0927 13:47:35.502761 1 shared_informer.go:313] Waiting for caches to sync for endpoint slice config
2024-09-27T13:47:35.5031876Z stderr F I0927 13:47:35.503112 1 config.go:319] "Starting node config controller"
2024-09-27T13:47:35.50319832Z stderr F I0927 13:47:35.503122 1 shared_informer.go:313] Waiting for caches to sync for node config
2024-09-27T13:47:35.603810207Z stderr F I0927 13:47:35.603684 1 shared_informer.go:320] Caches are synced for node config
2024-09-27T13:47:35.603920253Z stderr F I0927 13:47:35.603742 1 shared_informer.go:320] Caches are synced for service config
2024-09-27T13:47:35.604064303Z stderr F I0927 13:47:35.603757 1 shared_informer.go:320] Caches are synced for endpoint slice config
It looks the same if I create a new cluster without externalAddress.
Hello,
I have the following problem; maybe I'm doing something wrong.
When I set up an HA proxy (debian12|2.6.12-1+deb12u1) and a k0s cluster (debian12|v1.30.4+k0s via k0sctl) as described in Control Plane High Availability and run it without adding the 'externalAddress', everything works great: I can access the control plane(s) via the HA proxy, all good.
As soon as I add an 'externalAddress' to the config, the newly created kube-routers go into a CrashLoopBackOff. According to the logs, they can no longer reach 10.96.0.1:443:
...Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: connect: connection refused
As far as I understand the output of nft list ruleset, adding the 'externalAddress' removes some rules that would otherwise direct the traffic to 10.96.0.1:443.
I assume this is the reason why the kube-router no longer works. My question: is this a bug, or am I doing something wrong? I don't see any further information in the manual.
ruleset_with_external.txt ruleset_without_external.txt