Closed: ByteAlex closed this issue 3 years ago.
Crazy, I haven't seen that yet. You can still a) use another network plugin (for me, flannel works fine) or b) completely ignore the Hetzner network routing and use e.g. IPIP/VXLAN encapsulation (I haven't tried that on hcloud yet).
Not sure if related: when I used cilium in hcloud environments, it showed a huge number of restarts (a couple of hundred per week). That's why I moved away from cilium; I could not figure out whether it was a cilium or an hcloud issue.
Just clarifying: in your issue #112 you were referring to the ccm-networks documentation, which fully depends on and makes use of the Hetzner routing, but you still use another tunneling CNI?
Someone correct me if I am wrong, but if you use another tunneling CNI you can just go with the normal CCM without networks support; then you don't have to mess around with Cilium at all if you're successfully using Flannel.
I think the issue happens to be on hcloud's end, since someone at Cilium already looked into it; they said the routing magic in hcloud might be wrong.
You can basically do these different flavors:
I'm doing the third variant. You can do that with different plugins, e.g. cilium (native-routing-cidr), flannel (backend type "alloc"), cilium (no IPIP, no vxlan).
Okay, got that. Anyway, I'm trying to figure out how to configure cilium/hcloud to create the correct routing tables in the cloud console. 👀
The cloud controller gets the values from k8s (or, better said, from the controlling CNI plugin) and just adds these routes.
I'm currently on my mobile phone, so I can't point you to the exact location, but if I remember correctly you need to set a specific cilium configuration option, something like "blacklisted-routes" (I'll try to find it, give me a few minutes).
Edit: found the specific configuration lines: https://github.com/hetznercloud/hcloud-cloud-controller-manager/issues/44#issuecomment-652246804
The problem is that we won't recommend or support a specific CNI. We just implement the spec given by k8s.
@LKaemmerling I guess you mean blacklist-conflicting-routes: "false"
?
edit: Nevermind, you just edited your comment.
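For reference, these settings live in the cilium-config ConfigMap. A minimal sketch of the keys discussed in this thread (the values are what worked for commenters here, not official guidance; `blacklist-conflicting-routes` was removed in later Cilium releases, and the 10.0.0.0/8 range is an assumption matching the setups below):

```yaml
# Sketch only: cilium-config keys relevant for Hetzner native routing.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  tunnel: disabled                        # no vxlan/geneve; rely on Hetzner network routes
  native-routing-cidr: "10.0.0.0/8"       # the Hetzner private network range (assumption)
  blacklist-conflicting-routes: "false"   # keep cilium from rejecting CCM-managed routes
```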
I guess that many people will have similar questions to these... to have a more persistent solution than just tickets and comments:
@LKaemmerling, completely off topic: maybe it's worth thinking about linking this somewhere here.
@LKaemmerling This sounds about right. I see routes being created as the cilium nodes allocate subnets, yet the routing seems to be wrong.
See this:
Spec:
  Addresses:
    Ip:    138.201.94.89
    Type:  ExternalIP
    Ip:    10.0.0.2
    Type:  InternalIP
    Ip:    10.224.2.153
    Type:  CiliumInternalIP
  Azure:
  Encryption:
  Eni:
  Health:
    ipv4:  10.224.2.92
  Ipam:
    Pod CIDRs:
      10.224.2.0/24

10.224.2.0/24 should be routed to 10.0.0.2, but is routed to 10.0.0.4 in the HCloud console.
Clarification: the IP allocation from cilium works fine and the cilium configuration seems fine; the only problem seems to be that the routing is wrong.
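One way to compare what Kubernetes allocated against what the CCM wrote into the Hetzner route table (a sketch; "my-network" is a placeholder for your hcloud network name):

```shell
# Pod CIDR per node, as seen by Kubernetes:
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'

# Routes the hcloud CCM created; each pod CIDR should point at the
# matching node's internal IP:
hcloud network describe my-network
```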
@ByteAlex could you try to use this cilium config? https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/master/e2etests/templates/cilium.yml
We use this config within our e2e tests to test the functionality of the whole networks feature. I just created a new setup with this config and it works fine:
root@srv-local-2580907693548082956:~# k describe node srv-local-2580907693548082956
Name:               srv-local-2580907693548082956
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=cpx21
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=fsn1
                    failure-domain.beta.kubernetes.io/zone=fsn1-dc14
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=srv-local-2580907693548082956
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=cpx21
                    topology.kubernetes.io/region=fsn1
                    topology.kubernetes.io/zone=fsn1-dc14
Annotations:        io.cilium.network.ipv4-cilium-host: 10.244.0.123
                    io.cilium.network.ipv4-health-ip: 10.244.0.47
                    io.cilium.network.ipv4-pod-cidr: 10.244.0.0/24
CreationTimestamp:  Thu, 12 Nov 2020 07:16:38 +0100
Taints:             <none>
Unschedulable:      false
Addresses:
  Hostname:    srv-local-2580907693548082956
  ExternalIP:  168.119.154.106
  InternalIP:  10.0.0.2
System Info:
  Kernel Version:             5.4.0-52-generic
  OS Image:                   Ubuntu 20.04.1 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.8
  Kubelet Version:            v1.19.4
  Kube-Proxy Version:         v1.19.4
PodCIDR:      10.244.0.0/24
PodCIDRs:     10.244.0.0/24
ProviderID:   hcloud://8493791
You can find the cluster configuration here: https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/master/e2etests/templates/cloudinit.txt.tpl#L13
I just compared the cloudinit.txt to my setup and it seems to be about the same. After the end of this script I just apply the ccm-network.yml for the hcloud CCM, then the cilium CNI.
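Sketched as commands, assuming that deployment order (the CCM manifest name follows the commenter's wording; the cilium URL is the raw e2e template from this repo):

```shell
# 1. hcloud CCM with networks support
kubectl apply -f ccm-network.yml
# 2. then the CNI; note: this must be the *raw* file, not the GitHub HTML page
kubectl apply -f https://raw.githubusercontent.com/hetznercloud/hcloud-cloud-controller-manager/master/e2etests/templates/cilium.yml
```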
When I try to apply the template file you sent I get an error:
root@test-cluster-master-01:~# k apply -f https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/master/e2etests/templates/cilium.yml
error: error parsing https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/master/e2etests/templates/cilium.yml: error converting YAML to JSON: yaml: line 147: mapping values are not allowed in this context
Anyway in my latest test I got the correct routes (by accident?)
Addresses:
Hostname: test-cluster-master-01
ExternalIP: 138.201.94.89
InternalIP: 10.0.0.2
PodCIDR: 10.224.0.0/24
PodCIDRs: 10.224.0.0/24
ProviderID: hcloud://8446735
Addresses:
Hostname: test-cluster-worker-01
ExternalIP: 49.12.44.179
InternalIP: 10.0.0.3
PodCIDR: 10.224.1.0/24
PodCIDRs: 10.224.1.0/24
ProviderID: hcloud://8443725
Addresses:
Hostname: test-cluster-worker-02
ExternalIP: 138.201.93.167
InternalIP: 10.0.0.4
PodCIDR: 10.224.2.0/24
PodCIDRs: 10.224.2.0/24
ProviderID: hcloud://8443726
Yet the routing is still somehow wrong when creating the cluster with the latest quick-install.yml from 1.9 with a modified configuration.
I restarted the coredns and api-server pods, yet coredns still does not become ready, as it cannot reach the kubernetes service in the cluster.
root@test-cluster-master-01:~# k get pods -A -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system cilium-db2wd 1/1 Running 0 2m28s 10.0.0.4 test-cluster-worker-02 <none> <none>
kube-system cilium-operator-5d8498fc44-lkmsg 1/1 Running 0 2m28s 10.0.0.4 test-cluster-worker-02 <none> <none>
kube-system cilium-operator-5d8498fc44-skvdl 1/1 Running 0 2m28s 10.0.0.3 test-cluster-worker-01 <none> <none>
kube-system cilium-shkdm 1/1 Running 0 2m28s 10.0.0.3 test-cluster-worker-01 <none> <none>
kube-system cilium-vwrxm 1/1 Running 0 2m28s 10.0.0.2 test-cluster-master-01 <none> <none>
kube-system coredns-f9fd979d6-n6ls6 0/1 Running 0 67s 10.224.1.218 test-cluster-worker-01 <none> <none>
kube-system coredns-f9fd979d6-r6gpr 0/1 Running 0 67s 10.224.0.217 test-cluster-worker-02 <none> <none>
kube-system etcd-test-cluster-master-01 1/1 Running 0 19m 138.201.94.89 test-cluster-master-01 <none> <none>
kube-system hcloud-cloud-controller-manager-cb9c6698d-mmd97 1/1 Running 0 19m 138.201.94.89 test-cluster-master-01 <none> <none>
kube-system kube-apiserver-test-cluster-master-01 1/1 Running 0 18s 138.201.94.89 test-cluster-master-01 <none> <none>
kube-system kube-controller-manager-test-cluster-master-01 1/1 Running 0 19m 138.201.94.89 test-cluster-master-01 <none> <none>
kube-system kube-proxy-8bwc6 1/1 Running 0 17m 138.201.93.167 test-cluster-worker-02 <none> <none>
kube-system kube-proxy-tvs4h 1/1 Running 0 19m 138.201.94.89 test-cluster-master-01 <none> <none>
kube-system kube-proxy-wg54p 1/1 Running 0 17m 49.12.44.179 test-cluster-worker-01 <none> <none>
kube-system kube-scheduler-test-cluster-master-01 1/1 Running 0 19m 138.201.94.89 test-cluster-master-01 <none> <none>
Though from worker-01 I can reach the kubernetes service via curl.
Any idea what could cause this?
Additional information: coredns can reach the kubernetes service when it's scheduled on the same node (master-01).
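A quick way to reproduce that cross-node symptom (a sketch; 10.96.0.1 is the typical default kubernetes service IP and may differ in your cluster, and the curl image is just one convenient choice):

```shell
# From the host network of a worker (reported to work):
curl -k -m 5 https://10.96.0.1:443/version

# From a pod scheduled on a worker node (reported to fail when routing is broken):
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -k -m 5 https://kubernetes.default.svc/version
```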
@ByteAlex I gave you the link to the GitHub-rendered file, not the raw file, which is why it failed. I talked about that with our DevOps and we both think that you misconfigured cilium. Have a look at the config file (now the raw, directly applyable file): https://raw.githubusercontent.com/hetznercloud/hcloud-cloud-controller-manager/master/e2etests/templates/cilium.yml
@LKaemmerling thank you for clarifying! I checked with the Cilium team before and they claimed everything is alright on their end; it would most likely be an issue on Hetzner's side. Anyway, I was not home for the past few days and will test the configuration you provided tonight or tomorrow.
The configuration I am/was using is this:
root@test-cluster-master-01:/opt/k8s/provisioning# cat cilium.yaml
---
# Source: cilium/templates/cilium-agent-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cilium
  namespace: kube-system
---
# Source: cilium/templates/cilium-operator-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cilium-operator
  namespace: kube-system
---
# Source: cilium/templates/cilium-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # Identity allocation mode selects how identities are shared between cilium
  # nodes by setting how they are stored. The options are "crd" or "kvstore".
  # - "crd" stores identities in kubernetes as CRDs (custom resource definition).
  #   These can be queried with:
  #     kubectl get ciliumid
  # - "kvstore" stores identities in a kvstore, etcd or consul, that is
  #   configured below. Cilium versions before 1.6 supported only the kvstore
  #   backend. Upgrades from these older cilium versions should continue using
  #   the kvstore by commenting out the identity-allocation-mode below, or
  #   setting it to "kvstore".
  identity-allocation-mode: crd
  cilium-endpoint-gc-interval: "5m0s"
  # If you want to run cilium in debug mode change this value to true
  debug: "false"
  # Enable IPv4 addressing. If enabled, all endpoints are allocated an IPv4
  # address.
  enable-ipv4: "true"
  # Enable IPv6 addressing. If enabled, all endpoints are allocated an IPv6
  # address.
  enable-ipv6: "false"
  # Users who wish to specify their own custom CNI configuration file must set
  # custom-cni-conf to "true", otherwise Cilium may overwrite the configuration.
  custom-cni-conf: "false"
  enable-bpf-clock-probe: "true"
  # If you want cilium monitor to aggregate tracing for packets, set this level
  # to "low", "medium", or "maximum". The higher the level, the less packets
  # that will be seen in monitor output.
  monitor-aggregation: medium
  # The monitor aggregation interval governs the typical time between monitor
  # notification events for each allowed connection.
  #
  # Only effective when monitor aggregation is set to "medium" or higher.
  monitor-aggregation-interval: 5s
  # The monitor aggregation flags determine which TCP flags which, upon the
  # first observation, cause monitor notifications to be generated.
  #
  # Only effective when monitor aggregation is set to "medium" or higher.
  monitor-aggregation-flags: all
  # Specifies the ratio (0.0-1.0) of total system memory to use for dynamic
  # sizing of the TCP CT, non-TCP CT, NAT and policy BPF maps.
  bpf-map-dynamic-size-ratio: "0.0025"
  # bpf-policy-map-max specifies the maximum number of entries in endpoint
  # policy map (per endpoint)
  bpf-policy-map-max: "16384"
  # bpf-lb-map-max specifies the maximum number of entries in bpf lb service,
  # backend and affinity maps.
  bpf-lb-map-max: "65536"
  # Pre-allocation of map entries allows per-packet latency to be reduced, at
  # the expense of up-front memory allocation for the entries in the maps. The
  # default value below will minimize memory usage in the default installation;
  # users who are sensitive to latency may consider setting this to "true".
  #
  # This option was introduced in Cilium 1.4. Cilium 1.3 and earlier ignore
  # this option and behave as though it is set to "true".
  #
  # If this value is modified, then during the next Cilium startup the restore
  # of existing endpoints and tracking of ongoing connections may be disrupted.
  # As a result, reply packets may be dropped and the load-balancing decisions
  # for established connections may change.
  #
  # If this option is set to "false" during an upgrade from 1.3 or earlier to
  # 1.4 or later, then it may cause one-time disruptions during the upgrade.
  preallocate-bpf-maps: "false"
  # Regular expression matching compatible Istio sidecar istio-proxy
  # container image names
  sidecar-istio-proxy-image: "cilium/istio_proxy"
  # Encapsulation mode for communication between nodes
  # Possible values:
  #   - disabled
  #   - vxlan (default)
  #   - geneve
  tunnel: disabled
  # Name of the cluster. Only relevant when building a mesh of clusters.
  cluster-name: default
  # Enables L7 proxy for L7 policy enforcement and visibility
  enable-l7-proxy: "true"
  # wait-bpf-mount makes init container wait until bpf filesystem is mounted
  wait-bpf-mount: "false"
  masquerade: "true"
  enable-bpf-masquerade: "true"
  enable-xt-socket-fallback: "true"
  install-iptables-rules: "true"
  auto-direct-node-routes: "false"
  enable-bandwidth-manager: "false"
  enable-local-redirect-policy: "false"
  kube-proxy-replacement: "probe"
  kube-proxy-replacement-healthz-bind-address: ""
  enable-health-check-nodeport: "true"
  node-port-bind-protection: "true"
  enable-auto-protect-node-port-range: "true"
  enable-session-affinity: "true"
  enable-endpoint-health-checking: "true"
  enable-health-checking: "true"
  enable-well-known-identities: "false"
  enable-remote-node-identity: "true"
  operator-api-serve-addr: "127.0.0.1:9234"
  # Enable Hubble gRPC service.
  enable-hubble: "true"
  # UNIX domain socket for Hubble server to listen to.
  hubble-socket-path: "/var/run/cilium/hubble.sock"
  ipam: "cluster-pool"
  cluster-pool-ipv4-cidr: 10.224.0.0/16
  cluster-pool-ipv4-mask-size: "24"
  disable-cnp-status-updates: "true"
  # Inserted configuration
  native-routing-cidr: 10.0.0.0/8
  enable-endpoint-routes: "true"
---
# Source: cilium/templates/cilium-agent-clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cilium
rules:
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  - services
  - nodes
  - endpoints
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - pods
  - pods/finalizers
  verbs:
  - get
  - list
  - watch
  - update
  - delete
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
  - update
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/status
  verbs:
  - patch
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  # Deprecated for removal in v1.10
  - create
  - list
  - watch
  - update
  # This is used when validating policies in preflight. This will need to stay
  # until we figure out how to avoid "get" inside the preflight, and then
  # should be removed ideally.
  - get
- apiGroups:
  - cilium.io
  resources:
  - ciliumnetworkpolicies
  - ciliumnetworkpolicies/status
  - ciliumnetworkpolicies/finalizers
  - ciliumclusterwidenetworkpolicies
  - ciliumclusterwidenetworkpolicies/status
  - ciliumclusterwidenetworkpolicies/finalizers
  - ciliumendpoints
  - ciliumendpoints/status
  - ciliumendpoints/finalizers
  - ciliumnodes
  - ciliumnodes/status
  - ciliumnodes/finalizers
  - ciliumidentities
  - ciliumidentities/finalizers
  - ciliumlocalredirectpolicies
  - ciliumlocalredirectpolicies/status
  - ciliumlocalredirectpolicies/finalizers
  verbs:
  - '*'
---
# Source: cilium/templates/cilium-operator-clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cilium-operator
rules:
- apiGroups:
  - ""
  resources:
  # to automatically delete [core|kube]dns pods so that they start being
  # managed by Cilium
  - pods
  verbs:
  - get
  - list
  - watch
  - delete
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  # to perform the translation of a CNP that contains `ToGroup` to its endpoints
  - services
  - endpoints
  # to check apiserver connectivity
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - cilium.io
  resources:
  - ciliumnetworkpolicies
  - ciliumnetworkpolicies/status
  - ciliumnetworkpolicies/finalizers
  - ciliumclusterwidenetworkpolicies
  - ciliumclusterwidenetworkpolicies/status
  - ciliumclusterwidenetworkpolicies/finalizers
  - ciliumendpoints
  - ciliumendpoints/status
  - ciliumendpoints/finalizers
  - ciliumnodes
  - ciliumnodes/status
  - ciliumnodes/finalizers
  - ciliumidentities
  - ciliumidentities/status
  - ciliumidentities/finalizers
  - ciliumlocalredirectpolicies
  - ciliumlocalredirectpolicies/status
  - ciliumlocalredirectpolicies/finalizers
  verbs:
  - '*'
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - create
  - get
  - list
  - update
  - watch
# For cilium-operator running in HA mode.
#
# Cilium operator running in HA mode requires the use of ResourceLock for Leader Election
# between multiple running instances.
# The preferred way of doing this is to use LeasesResourceLock as edits to Leases are less
# common and fewer objects in the cluster watch "all Leases".
# The support for leases was introduced in coordination.k8s.io/v1 during the Kubernetes 1.14 release.
# In Cilium we currently don't support HA mode for K8s version < 1.14. This condition makes sure
# that we only authorize access to leases resources in supported K8s versions.
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - create
  - get
  - update
---
# Source: cilium/templates/cilium-agent-clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cilium
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cilium
subjects:
- kind: ServiceAccount
  name: cilium
  namespace: kube-system
---
# Source: cilium/templates/cilium-operator-clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cilium-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cilium-operator
subjects:
- kind: ServiceAccount
  name: cilium-operator
  namespace: kube-system
---
# Source: cilium/templates/cilium-agent-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    k8s-app: cilium
  name: cilium
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: cilium
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 2
    type: RollingUpdate
  template:
    metadata:
      annotations:
        # This annotation plus the CriticalAddonsOnly toleration makes
        # cilium to be a critical pod in the cluster, which ensures cilium
        # gets priority scheduling.
        # https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        k8s-app: cilium
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - cilium
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - --config-dir=/tmp/cilium/config-map
        command:
        - cilium-agent
        livenessProbe:
          httpGet:
            host: '127.0.0.1'
            path: /healthz
            port: 9876
            scheme: HTTP
            httpHeaders:
            - name: "brief"
              value: "true"
          failureThreshold: 10
          # The initial delay for the liveness probe is intentionally large to
          # avoid an endless kill & restart cycle if in the event that the initial
          # bootstrapping takes longer than expected.
          initialDelaySeconds: 120
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            host: '127.0.0.1'
            path: /healthz
            port: 9876
            scheme: HTTP
            httpHeaders:
            - name: "brief"
              value: "true"
          failureThreshold: 3
          initialDelaySeconds: 5
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        env:
        - name: K8S_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: CILIUM_K8S_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: CILIUM_FLANNEL_MASTER_DEVICE
          valueFrom:
            configMapKeyRef:
              key: flannel-master-device
              name: cilium-config
              optional: true
        - name: CILIUM_FLANNEL_UNINSTALL_ON_EXIT
          valueFrom:
            configMapKeyRef:
              key: flannel-uninstall-on-exit
              name: cilium-config
              optional: true
        - name: CILIUM_CLUSTERMESH_CONFIG
          value: /var/lib/cilium/clustermesh/
        - name: CILIUM_CNI_CHAINING_MODE
          valueFrom:
            configMapKeyRef:
              key: cni-chaining-mode
              name: cilium-config
              optional: true
        - name: CILIUM_CUSTOM_CNI_CONF
          valueFrom:
            configMapKeyRef:
              key: custom-cni-conf
              name: cilium-config
              optional: true
        image: quay.io/cilium/cilium:v1.9.0
        imagePullPolicy: IfNotPresent
        lifecycle:
          postStart:
            exec:
              command:
              - "/cni-install.sh"
              - "--enable-debug=false"
          preStop:
            exec:
              command:
              - /cni-uninstall.sh
        name: cilium-agent
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
            - SYS_MODULE
          privileged: true
        volumeMounts:
        - mountPath: /sys/fs/bpf
          name: bpf-maps
        - mountPath: /var/run/cilium
          name: cilium-run
        - mountPath: /host/opt/cni/bin
          name: cni-path
        - mountPath: /host/etc/cni/net.d
          name: etc-cni-netd
        - mountPath: /var/lib/cilium/clustermesh
          name: clustermesh-secrets
          readOnly: true
        - mountPath: /tmp/cilium/config-map
          name: cilium-config-path
          readOnly: true
        # Needed to be able to load kernel modules
        - mountPath: /lib/modules
          name: lib-modules
          readOnly: true
        - mountPath: /run/xtables.lock
          name: xtables-lock
      hostNetwork: true
      initContainers:
      - command:
        - /init-container.sh
        env:
        - name: CILIUM_ALL_STATE
          valueFrom:
            configMapKeyRef:
              key: clean-cilium-state
              name: cilium-config
              optional: true
        - name: CILIUM_BPF_STATE
          valueFrom:
            configMapKeyRef:
              key: clean-cilium-bpf-state
              name: cilium-config
              optional: true
        - name: CILIUM_WAIT_BPF_MOUNT
          valueFrom:
            configMapKeyRef:
              key: wait-bpf-mount
              name: cilium-config
              optional: true
        image: quay.io/cilium/cilium:v1.9.0
        imagePullPolicy: IfNotPresent
        name: clean-cilium-state
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
          privileged: true
        volumeMounts:
        - mountPath: /sys/fs/bpf
          name: bpf-maps
          mountPropagation: HostToContainer
        - mountPath: /var/run/cilium
          name: cilium-run
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
      restartPolicy: Always
      priorityClassName: system-node-critical
      serviceAccount: cilium
      serviceAccountName: cilium
      terminationGracePeriodSeconds: 1
      tolerations:
      - operator: Exists
      volumes:
      # To keep state between restarts / upgrades
      - hostPath:
          path: /var/run/cilium
          type: DirectoryOrCreate
        name: cilium-run
      # To keep state between restarts / upgrades for bpf maps
      - hostPath:
          path: /sys/fs/bpf
          type: DirectoryOrCreate
        name: bpf-maps
      # To install cilium cni plugin in the host
      - hostPath:
          path: /opt/cni/bin
          type: DirectoryOrCreate
        name: cni-path
      # To install cilium cni configuration in the host
      - hostPath:
          path: /etc/cni/net.d
          type: DirectoryOrCreate
        name: etc-cni-netd
      # To be able to load kernel modules
      - hostPath:
          path: /lib/modules
        name: lib-modules
      # To access iptables concurrently with other processes (e.g. kube-proxy)
      - hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
        name: xtables-lock
      # To read the clustermesh configuration
      - name: clustermesh-secrets
        secret:
          defaultMode: 420
          optional: true
          secretName: cilium-clustermesh
      # To read the configuration from the config map
      - configMap:
          name: cilium-config
        name: cilium-config-path
---
# Source: cilium/templates/cilium-operator-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    io.cilium/app: operator
    name: cilium-operator
  name: cilium-operator
  namespace: kube-system
spec:
  # We support HA mode only for Kubernetes version > 1.14
  # See docs on ServerCapabilities.LeasesResourceLock in file pkg/k8s/version/version.go
  # for more details.
  replicas: 2
  selector:
    matchLabels:
      io.cilium/app: operator
      name: cilium-operator
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
      labels:
        io.cilium/app: operator
        name: cilium-operator
    spec:
      # In HA mode, cilium-operator pods must not be scheduled on the same
      # node as they will clash with each other.
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: io.cilium/app
                operator: In
                values:
                - operator
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - --config-dir=/tmp/cilium/config-map
        - --debug=$(CILIUM_DEBUG)
        command:
        - cilium-operator-generic
        env:
        - name: K8S_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: CILIUM_K8S_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: CILIUM_DEBUG
          valueFrom:
            configMapKeyRef:
              key: debug
              name: cilium-config
              optional: true
        image: quay.io/cilium/operator-generic:v1.9.0
        imagePullPolicy: IfNotPresent
        name: cilium-operator
        livenessProbe:
          httpGet:
            host: '127.0.0.1'
            path: /healthz
            port: 9234
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          timeoutSeconds: 3
        volumeMounts:
        - mountPath: /tmp/cilium/config-map
          name: cilium-config-path
          readOnly: true
      hostNetwork: true
      restartPolicy: Always
      priorityClassName: system-cluster-critical
      serviceAccount: cilium-operator
      serviceAccountName: cilium-operator
      tolerations:
      - operator: Exists
      volumes:
      # To read the configuration from the config map
      - configMap:
          name: cilium-config
        name: cilium-config-path
I've tried this config and it indeed works, yet with the latest Cilium v1.9.0 it does not.
The following steps were done:

tunnel: disabled
masquerade: "true"
enable-endpoint-routes: "true"
auto-direct-node-routes: "false"
native-routing-cidr: "10.0.0.0/8" (as it is my Hetzner network)
cluster-pool-ipv4-cidr: "10.224.0.0/16" (so cilium's IPAM assigns addresses from the Kubernetes pod range properly)
blacklist-conflicting-routes: "false" (even though that should no longer be needed in v1.9.0+)

Which results in the following difference: https://www.diffchecker.com/WWRn81qq
I also tried re-adding the deleted entries from L141-151, but that didn't change the result either: v1.9.0 does not work with Hetzner using the above configuration. Do you have another idea what could have gone wrong there?
@ByteAlex as I said, it should be your configuration. Cilium should not assign the addresses; that is done by k8s. Cilium is basically just the one that puts the data "on the wire" (maybe encrypting it before).
The CNI plugs into the Kubernetes IPAM (https://github.com/containernetworking/cni/blob/spec-v0.4.0/SPEC.md). When I try to start v1.9.0 without IPAM, the cilium containers don't start and stay in an error state.
Also, I might have misstated the "new" issue. The routes were created correctly (at least this time), but I can't establish a successful connection between pods on different nodes.
Accessing a kubernetes service address works when doing it from the host, though.
Same curl from inside a kubernetes pod:
@LKaemmerling if you could spare a few more minutes on this: could you please try deploying Cilium v1.9.0 to check whether that works for you too? I still can't get this working. Sorry for bothering.
@ByteAlex - if you haven't found the solution yet:
blacklist-conflicting-routes: "false"
enable-endpoint-routes: "true"
native-routing-cidr: "10.244.0.0/16"
=> Works for me :)
Sources:
@nupplaphil Yeah I figured that the config LKaemmerling posted worked, but not on the latest version. That's why I asked again. But I think I can go ahead and close this issue. Thanks for your assistance!
@ByteAlex I went through this topic when I wanted to init a k8s cluster (v1.20.0) on Hetzner Cloud with cilium (1.9.4). I faced similar issues. What solved them for me was (1) taking care of setting the appropriate --node-ip (internal network IP) on each node (master + worker) as a kubelet start argument (via a kubelet.service.d kubelet-extra-args drop-in) and (2) creating a subset of the general Hetzner network for the subnet and other subsets for the pod and service networks. Something like this: Network: 10.0.0.0/8; Subnet: 10.1.0.0/16; Pod-Net: 10.2.0.0/16; Srv-Net: 10.3.0.0/16. Especially the appropriate setting of the networks was important. I'm not a networker, but the masquerading seems to kill every attempt at separating the subnet and the pod/service nets into different domains (like 192... or similar).
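The layout described above could be created with the hcloud CLI roughly like this (a sketch; the network name and network zone are placeholders, and the pod/service nets are only reserved conceptually, not added as Hetzner subnets):

```shell
hcloud network create --name k8s-net --ip-range 10.0.0.0/8
hcloud network add-subnet k8s-net --type cloud --network-zone eu-central --ip-range 10.1.0.0/16
# 10.2.0.0/16 (pods) and 10.3.0.0/16 (services) are configured in
# Kubernetes/Cilium, not created as subnets in Hetzner.
```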
@AlexMe99
thanks for the info. Could you share the exact settings? How did you create the hcloud network, which arguments did you pass to the k3s master and worker, and which cilium deployment did you use?
Kind regards, Philipp
If you're still curious, I solved it too.
This is my working cilium-file with cilium 1.9.5: https://github.com/nupplaphil/hcloud-k8s/blob/stable/roles/kube-master/files/cilium.yaml
But you have to keep an eye on your CIDRs at other places too (as @AlexMe99 already said), like https://github.com/nupplaphil/hcloud-k8s/blob/stable/roles/kube-master/files/hcloud-controller.yaml https://github.com/nupplaphil/hcloud-k8s/blob/f8ee5f18319ad3957a052603c84c4627d23a14e1/roles/kube-master/tasks/tasks.yaml#L6
@ByteAlex I guess you finally understood that the pod CIDR allocation done by Cilium for each node is random! For instance, on the first deployment node 1 gets 10.244.0.0/24, but on the second deployment, it gets 10.244.2.0/24. It doesn't matter. And the resulting routes picked up by Hetzner CCM are indeed correct.
Btw, for those like me using k3s, the default cluster CIDR is 10.42.0.0/16, so that's what we pass to --cluster-cidr in the CCM config.
As for the service network, the default range is 10.43.0.0/16 for k3s, but from my understanding it's a virtual IP range, so no config is needed there; please correct me if I'm wrong.
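For k3s that roughly translates to the following server flags, so the bundled flannel gets out of cilium's way (a sketch; the CIDRs shown are the k3s defaults made explicit):

```shell
k3s server \
  --flannel-backend=none \
  --disable-network-policy \
  --cluster-cidr=10.42.0.0/16 \
  --service-cidr=10.43.0.0/16
# the hcloud CCM deployment then gets --cluster-cidr=10.42.0.0/16 as well
```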
@mysticaltech may I ask which Linux distro you deployed it to? I was able to run it successfully on Ubuntu 18.04.
However, I can't get it running on Debian 10 or Ubuntu 20.04: hcloud-csi-controller-0 keeps crashing when everything else looks good. It is the csi-provisioner that keeps crashing.
Still new to this k8s world; any direction will be much appreciated! 🙏
Hi @LKaemmerling, would you mind shedding some light on my struggle above? :(
It seems to work fine with 18.04, but the hcloud-csi-controller won't work on Debian 10 or Ubuntu 20.04.
@kiwinesian Only yesterday, after reading the Cilium docs really well and checking with "cilium status", I realized that cilium was using host networking in legacy mode, so iptables, not BPF. It turns out even the most recent version of Ubuntu is still on kernel 5.4, but cilium needs kernels > 5.10 to activate its latest goodness, especially BPF host routing.
So I will be using Fedora 34, which has kernel 5.12.
There is also the fact that in my current sub-optimal setup I can't reach services (using k3s). I figure that is probably because kube-proxy is not completely replaced. So I will be following the instructions here too: https://docs.cilium.io/en/v1.10/gettingstarted/kubeproxy-free/.
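For what it's worth, the kube-proxy-free mode from those docs boils down to a few Helm values (a sketch against Cilium 1.10; the API server address is a placeholder that must be set, since without kube-proxy cilium has to reach the API server directly):

```yaml
# values.yaml for: helm install cilium cilium/cilium -n kube-system -f values.yaml
kubeProxyReplacement: strict
k8sServiceHost: 10.0.0.2   # placeholder: a directly reachable API server address
k8sServicePort: 6443
```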
Will post updates when I'm done, but please do the same @kiwinesian if you are able to work on this before, and don't hesitate if you have any questions. Even on my old setup, Hetzner CSI was starting fine with the following:
Network 10.0.0.0/8 Subnetwork 10.0.0.0/16 cluster CIDR / pod CIDR / ipv4 pool cidr (in cilium): 10.42.0.0/16 (for k3s) native routing CIDR (in cilium): 10.0.0.0/8 ipam (in cilium): (remove mention of it, so that it chooses the default of cluster-scope)
Hi @mysticaltech,
thanks so much for attending to this!
The interesting part is that Ubuntu 18.04 works completely fine with iptables, but the csi-provisioner crashes on Ubuntu 20.04 (and Debian 10) using the exact same cilium.yaml config.
I can try the Fedora 34 one and see if I can get it going. Will have to rewrite the Ansible script to deploy all of this - will report back maybe after the weekend. :)
Awesome @kiwinesian, yes it would be better to use Fedora because it always has the latest and greatest kernel, and Cilium seems to rely on that for a lot of things. It's also better because the whole advantage of eBPF is to bypass iptables and such; it also does XDP acceleration at the network interface layer (don't ask me what that is exactly haha).
To deploy I use Terraform, inspired by https://github.com/StarpTech/k-andy. Honestly, as an Ansible user too, Terraform is a lot easier to use for deployment, even though both technologies overlap a lot. What I like is that Hetzner maintains its own Terraform provider.
For sure will share too if and when I get a decent enough setup, and probably before that even. Let's keep ourselves mutually posted ✌️
Hi @mysticaltech ,
Yes, for sure! Out of curiosity, have you looked into Calico? Looks like they also have eBPF in the latest release.
Been thinking about that too as plan B, especially if they support native routing!
Performance-wise, it looks like Calico is a bit lighter on resource consumption.
@mysticaltech tested without kube-proxy. Unfortunately, without installing kube-proxy, no routes get created, so many pods fail to be created/run - not automatically at least.
Thanks for sharing, I'm at the same place. It seems doable, but we're probably missing something in the config.
@kiwinesian Created a project for this: https://github.com/mysticaltech/kube-hetzner. All works well, including full kube-proxy replacement. However, even though everything was set up with Cilium for native routing, I had to use tunnel: geneve (see https://github.com/mysticaltech/kube-hetzner/blob/master/manifests/helm/cilium/values.yaml) to make everything really stable; somehow, pure native routing did not make the Hetzner CSI happy (maybe more debugging is needed in the future). The geneve tunnel overhead is really low.
So thanks to Cilium in combination with Fedora, we now have full BPF support and full kube-proxy replacement, with the improvements that brings.
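For reference, the relevant values from that setup boil down to something like this (a sketch, not the full values.yaml from the linked repo; `strict` is the kube-proxy-replacement mode name used by Cilium 1.10-era charts):

```yaml
tunnel: geneve                # low-overhead encapsulation; pure native routing
                              # made the Hetzner CSI unhappy in this setup
kubeProxyReplacement: strict  # full kube-proxy replacement via eBPF
```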
ohh thanks for sharing @mysticaltech ! I might find a weekend to spin the cluster up using your configuration ;)
I just managed to get the k8s cluster going, but there are a few things that I would like to validate with you, to see if they make sense / are ideal:
1. Native routing: I couldn't get it working as-is. Instead, I had to set ipam=kubernetes, tunnel=disabled and nativeRoutingCIDR=x.x.x.x/8.
2. I ended up leaving it enabled, and that seems to keep it happy - even though the reference said to do so: https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/master/docs/deploy_with_networks.md
I'm wondering if you are aware of the impact of leaving #1 and #2 as they are? Should I attempt tunnel=geneve?
I guess you now know the answer @kiwinesian, but for others reading this: yes, using tunnel=geneve does work pretty well.
Could someone explain what the advantage of using hcloud native routing would be? Thanks
Ha, not quite sure what the actual advantage is @philipp1992. I always wonder myself if we are missing some benefits by not having native routing.
@philipp1992 @kiwinesian Native routing saves one layer of tunneling/VXLAN encapsulation, which means less per-packet overhead.
@AlexMe99 yours did it for me. Thanks. Network: 10.0.0.0/8; Subnet-Master: 10.2.0.0/24; Subnet-Worker: 10.3.0.0/24; Pod-Net: 10.4.0.0/16; Srv-Net: 10.5.0.0/16. Installed Cilium via Helm.
Hello,
I've been playing around with Kubernetes 1.19 on hcloud for a bit now. Since the documentation about this is pretty old, I've mostly been trying to figure it out on my own.
So my current setup:
- 1x Network (10.0.0.0/8)
- 1x LB (for a later HA setup of the control planes, 10.0.0.5 here)
- 1x CPX11 (control plane)
- 2x CPX11 (worker nodes)
Using kubeadm to set up the Kubernetes cluster with the following variables:
- API_SERVER_CERT_EXTRA_SANS=10.0.0.1
- CONTROL_PLANE_LB=10.0.0.5
- KUBE_VERSION=v1.19.0
- POD_NETWORK_CIDR=10.224.0.0/16
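Those variables map onto a kubeadm config roughly like the following (a hedged reconstruction, since the original snippet is missing; `v1beta2` is the config API version shipped with kubeadm 1.19):

```yaml
# Reconstruction from the variables above - not the author's actual file
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.19.0
controlPlaneEndpoint: "10.0.0.5:6443"   # CONTROL_PLANE_LB
networking:
  podSubnet: "10.224.0.0/16"            # POD_NETWORK_CIDR
apiServer:
  certSANs:
    - "10.0.0.1"                        # API_SERVER_CERT_EXTRA_SANS
```

Applied with `kubeadm init --config kubeadm.yaml`.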
After that I copy the kube config and create the secrets for the hetzner ccm like this:
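The secret the networks-enabled CCM expects looks roughly like this - the name `hcloud` and the `token`/`network` keys follow the CCM docs, the values are placeholders:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: hcloud
  namespace: kube-system
stringData:
  token: "<your-hcloud-api-token>"        # placeholder
  network: "<your-network-name-or-id>"    # placeholder
```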
Followed by that I deploy the CCM-network:
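The deploy step was presumably something along these lines (the exact manifest version used here is unknown; pin a release tag in practice instead of `latest`):

```shell
# Deploy the networks variant of the hcloud cloud controller manager
kubectl apply -f https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/latest/download/ccm-networks.yaml
```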
The cloud controller goes ready, and the nodes do have the hcloud://serverid provider ID in their describe output.
Now I deploy the latest cilium with a few tweaked parameters:
Edit the quick-install.yml and ensure the following parameters:
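The parameter snippet itself got lost, but based on the rest of this thread it was presumably something like the following cilium-config entries (a guess, not the author's actual file):

```yaml
# Hypothetical cilium-config ConfigMap entries inside quick-install.yaml
ipam: "kubernetes"                       # let k8s hand out per-node pod CIDRs
tunnel: "disabled"                       # native routing via the Hetzner network
native-routing-cidr: "10.0.0.0/8"        # the hcloud network range
blacklist-conflicting-routes: "false"    # see issue #44 linked earlier
```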
Apply the deployment file.
Now the CNI is installed, coredns should start scheduling and the CCM creates routes for the nodes. So far so good, yet the created routes seem to be wrong for me.
Seeing here:
Yet `kubectl get pods -A -o wide` shows a different IP distribution:
Where you can see:
Can someone please point me in the correct direction for resolving this issue?