hetznercloud / hcloud-cloud-controller-manager

Kubernetes cloud-controller-manager for Hetzner Cloud
Apache License 2.0

Deploy on k3s does not seem to work #154

Closed: kladiv closed this issue 3 years ago

kladiv commented 3 years ago

Hello, I found an issue setting up k3s v1.20.0+k3s2 on Hetzner Cloud with hcloud-cloud-controller-manager. The following steps were applied:

root@infra1:~# export K3S_TOKEN="k3stest"
root@infra1:~# export INSTALL_K3S_VERSION=v1.20.0+k3s2
root@infra1:~# curl -sfL https://get.k3s.io | sh -s - --cluster-init --disable servicelb --disable traefik --disable-cloud-controller --kubelet-arg="cloud-provider=external" --data-dir /app/k3s --flannel-backend none --disable-network-policy
[INFO]  Using v1.20.0+k3s2 as release
[INFO]  Downloading hash https://github.com/rancher/k3s/releases/download/v1.20.0+k3s2/sha256sum-amd64.txt
[INFO]  Downloading binary https://github.com/rancher/k3s/releases/download/v1.20.0+k3s2/k3s
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s
root@infra1:~#

root@infra1:~# kubectl get nodes -o wide
NAME     STATUS   ROLES                       AGE   VERSION        INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
infra1   Ready    control-plane,etcd,master   3s    v1.20.0+k3s2   78.47.171.100   <none>        Ubuntu 20.04.2 LTS   5.4.0-65-generic   containerd://1.4.3-k3s1

root@infra1:~# kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS    RESTARTS   AGE
kube-system   coredns-854c77959c-rsspj                  0/1     Pending   0          10s
kube-system   local-path-provisioner-7c458769fb-7ld5l   0/1     Pending   0          10s
kube-system   metrics-server-86cbb8457f-8zk6n           0/1     Pending   0          10s

root@infra1:~# kubectl taint nodes infra1 node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-
node/infra1 untainted

root@infra1:~# kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
poddisruptionbudget.policy/calico-kube-controllers created
root@infra1:~#

root@infra1:~# kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-86bddfcff-n4xm8   1/1     Running   0          2m2s
kube-system   calico-node-26jm7                         1/1     Running   0          2m2s
kube-system   coredns-854c77959c-rsspj                  1/1     Running   0          6m28s
kube-system   local-path-provisioner-7c458769fb-7ld5l   1/1     Running   0          6m28s
kube-system   metrics-server-86cbb8457f-8zk6n           1/1     Running   0          6m28s

root@infra1:~# kubectl -n kube-system create secret generic hcloud --from-literal=token=WkSaw7qHcP9NQJ3Bz2ZK40xIOEHtepAF9dAdO4W5hKmryKYeeXLpbuEkb1SfNWW4
secret/hcloud created

root@infra1:~# kubectl apply -f  https://raw.githubusercontent.com/hetznercloud/hcloud-cloud-controller-manager/master/deploy/ccm.yaml
serviceaccount/cloud-controller-manager created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
clusterrolebinding.rbac.authorization.k8s.io/system:cloud-controller-manager created
deployment.apps/hcloud-cloud-controller-manager created
root@infra1:~#

root@infra1:~# kubectl -n kube-system logs -f hcloud-cloud-controller-manager-7cf4f5974-c66c5
Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
I0213 17:38:10.471695       1 serving.go:313] Generated self-signed cert in-memory
W0213 17:38:10.956727       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0213 17:38:10.959150       1 controllermanager.go:120] Version: v0.0.0-master+$Format:%h$
I0213 17:38:10.959204       1 cloud.go:90] hcloud/newCloud: HCLOUD_NETWORK empty
Hetzner Cloud k8s cloud controller v1.8.1 started
W0213 17:38:11.243146       1 controllermanager.go:132] detected a cluster without a ClusterID.  A ClusterID will be required in the future.  Please tag your cluster to avoid any future issues
I0213 17:38:11.244711       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0213 17:38:11.244756       1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0213 17:38:11.244815       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0213 17:38:11.244826       1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0213 17:38:11.245503       1 secure_serving.go:178] Serving securely on [::]:10258
I0213 17:38:11.245996       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0213 17:38:11.247622       1 node_controller.go:110] Sending events to api server.
I0213 17:38:11.247751       1 controllermanager.go:247] Started "cloud-node"
I0213 17:38:11.249119       1 node_lifecycle_controller.go:78] Sending events to api server
I0213 17:38:11.249172       1 controllermanager.go:247] Started "cloud-node-lifecycle"
I0213 17:38:11.250649       1 controllermanager.go:247] Started "service"
I0213 17:38:11.250669       1 core.go:101] Will not configure cloud provider routes for allocate-node-cidrs: false, configure-cloud-routes: true.
W0213 17:38:11.250678       1 controllermanager.go:244] Skipping "route"
I0213 17:38:11.250768       1 controller.go:208] Starting service controller
I0213 17:38:11.250808       1 shared_informer.go:223] Waiting for caches to sync for service
I0213 17:38:11.345020       1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0213 17:38:11.345101       1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0213 17:38:11.350983       1 shared_informer.go:230] Caches are synced for service
E0213 17:38:11.528290       1 node_controller.go:237] hcloud/instances.InstanceExistsByProviderID: hcloud/providerIDToServerID: missing prefix hcloud://: 10037937

I tested on k3s 1.19 too, with the same error (with Calico CNI as well as Flannel). I know about issue #80, but the kubelet is started with the "cloud-provider=external" argument. Below are the relevant entries from /var/log/syslog:

...
Feb 13 18:16:12 infra1 k3s[34736]: time="2021-02-13T18:16:12.379847821+01:00" level=info msg="Handling backend connection request [infra1]"
Feb 13 18:16:12 infra1 k3s[34736]: time="2021-02-13T18:16:12.381420199+01:00" level=info msg="Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --cni-bin-dir=/var/lib/rancher/k3s/data/b9574be94e4edbdbb93a39a2cb1f4e4df3ba699171a8b86863d1e8c421c91f63/bin --cni-conf-dir=/var/lib/rancher/k3s/agent/etc/cni/net.d --container-runtime-endpoint=/run/k3s/containerd/containerd.sock --container-runtime=remote --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=infra1 --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --node-labels= --pod-manifest-path=/var/lib/rancher/k3s/agent/pod-manifests --read-only-port=0 --resolv-conf=/run/systemd/resolve/resolv.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key"
Feb 13 18:16:12 infra1 k3s[34736]: time="2021-02-13T18:16:12.382833764+01:00" level=info msg="Running kube-proxy --cluster-cidr=10.42.0.0/16 --healthz-bind-address=127.0.0.1 --hostname-override=infra1 --kubeconfig=/var/lib/rancher/k3s/agent/kubeproxy.kubeconfig --proxy-mode=iptables"
...

Is there anything else to check?

Thank you

LKaemmerling commented 3 years ago

Can you describe the node? If the cluster was correctly created with the cloud-provider=external flag, you should see a ProviderID in the node describe output.
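
For example (assuming the node name infra1 from the output above), something along these lines should show the field:

kubectl describe node infra1 | grep ProviderID

On a correctly initialized node this prints a value with the hcloud:// prefix; no output suggests the node has not been initialized by a cloud controller.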

kladiv commented 3 years ago

Hi @LKaemmerling, I applied all the steps above again. Here is the node description:

root@infra1:/var/log# kubectl describe node infra1
Name:               infra1
Roles:              control-plane,etcd,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=infra1
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node-role.kubernetes.io/etcd=true
                    node-role.kubernetes.io/master=true
Annotations:        etcd.k3s.cattle.io/node-address: 78.47.171.100
                    etcd.k3s.cattle.io/node-name: infra1-ed558d16
                    k3s.io/node-args:
                      ["server","--cluster-init","--disable","servicelb","--disable","traefik","--disable-cloud-controller","--kubelet-arg","cloud-provider=exte...
                    k3s.io/node-config-hash: 6KDUWVJKCTEIA4TLPDLIQ7HHGJV7CAHUZ5MQZ6J6GNTWEVRVA6RA====
                    k3s.io/node-env:
                      {"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/986d5e8cf570f904598f9a5d531da2430e5a6171d22b7addb1e4a7c5b87a47d0","K3S_TOKEN":"********"}
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 10.123.1.11/32
                    projectcalico.org/IPv4IPIPTunnelAddr: 172.16.138.64
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 15 Feb 2021 10:18:34 +0100
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  infra1
  AcquireTime:     <unset>
  RenewTime:       Mon, 15 Feb 2021 10:22:35 +0100
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Mon, 15 Feb 2021 10:19:19 +0100   Mon, 15 Feb 2021 10:19:19 +0100   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Mon, 15 Feb 2021 10:22:15 +0100   Mon, 15 Feb 2021 10:18:34 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Mon, 15 Feb 2021 10:22:15 +0100   Mon, 15 Feb 2021 10:18:34 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Mon, 15 Feb 2021 10:22:15 +0100   Mon, 15 Feb 2021 10:18:34 +0100   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Mon, 15 Feb 2021 10:22:15 +0100   Mon, 15 Feb 2021 10:20:14 +0100   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  Hostname:    infra1
  ExternalIP:  78.47.171.100
Capacity:
  cpu:                2
  ephemeral-storage:  39245320Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3932284Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  38177847267
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3932284Ki
  pods:               110
System Info:
  Machine ID:                 575504e3d4be4f2ea06dd3214ed18b64
  System UUID:                575504e3-d4be-4f2e-a06d-d3214ed18b64
  Boot ID:                    9e592757-217f-4bb7-bed9-e0cb4b2df1a8
  Kernel Version:             5.4.0-65-generic
  OS Image:                   Ubuntu 20.04.2 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.4.3-k3s1
  Kubelet Version:            v1.20.0+k3s2
  Kube-Proxy Version:         v1.20.0+k3s2
PodCIDR:                      10.42.0.0/24
PodCIDRs:                     10.42.0.0/24
Non-terminated Pods:          (6 in total)
  Namespace                   Name                                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                               ------------  ----------  ---------------  -------------  ---
  kube-system                 calico-kube-controllers-86bddfcff-78rpl            0 (0%)        0 (0%)      0 (0%)           0 (0%)         3m43s
  kube-system                 calico-node-5vphs                                  250m (12%)    0 (0%)      0 (0%)           0 (0%)         3m43s
  kube-system                 coredns-854c77959c-m8nl6                           100m (5%)     0 (0%)      70Mi (1%)        170Mi (4%)     4m6s
  kube-system                 hcloud-cloud-controller-manager-7cf4f5974-j544r    100m (5%)     0 (0%)      50Mi (1%)        0 (0%)         59s
  kube-system                 local-path-provisioner-7c458769fb-k959s            0 (0%)        0 (0%)      0 (0%)           0 (0%)         4m6s
  kube-system                 metrics-server-86cbb8457f-nwcwg                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         4m6s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                450m (22%)  0 (0%)
  memory             120Mi (3%)  170Mi (4%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:
  Type     Reason                   Age    From        Message
  ----     ------                   ----   ----        -------
  Normal   Starting                 4m9s   kubelet     Starting kubelet.
  Warning  InvalidDiskCapacity      4m9s   kubelet     invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  4m9s   kubelet     Node infra1 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    4m9s   kubelet     Node infra1 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     4m9s   kubelet     Node infra1 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  4m9s   kubelet     Updated Node Allocatable limit across pods
  Normal   NodeReady                4m7s   kubelet     Node infra1 status is now: NodeReady
  Normal   Starting                 4m6s   kube-proxy  Starting kube-proxy.
  Normal   Starting                 2m34s  kube-proxy  Starting kube-proxy.
  Normal   Starting                 2m29s  kubelet     Starting kubelet.
  Warning  InvalidDiskCapacity      2m29s  kubelet     invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  2m29s  kubelet     Node infra1 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    2m29s  kubelet     Node infra1 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     2m29s  kubelet     Node infra1 status is now: NodeHasSufficientPID
  Normal   NodeNotReady             2m29s  kubelet     Node infra1 status is now: NodeNotReady
  Normal   NodeAllocatableEnforced  2m29s  kubelet     Updated Node Allocatable limit across pods
  Normal   NodeReady                2m29s  kubelet     Node infra1 status is now: NodeReady
root@infra1:/var/log#

LKaemmerling commented 3 years ago

@kladiv your cluster was not initialized with cloud-provider=external (the ProviderID is missing from your node description).
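
A quick way to confirm is to read the field directly from the node spec (assuming the node name infra1 from your output):

kubectl get node infra1 -o jsonpath='{.spec.providerID}'

An empty result confirms that no provider ID was set.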

I just tested it with our test setup (https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/master/e2etests/templates/cloudinit_k3s.txt.tpl)

And it works fine:

root@srv-cluster-node-local-3699175778297686350:~# k describe node srv-cluster-node-local-3699175778297686350
Name:               srv-cluster-node-local-3699175778297686350
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=cpx21
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=hel1
                    failure-domain.beta.kubernetes.io/zone=hel1-dc2
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=srv-cluster-node-local-3699175778297686350
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node.kubernetes.io/instance-type=cpx21
                    topology.kubernetes.io/region=hel1
                    topology.kubernetes.io/zone=hel1-dc2
Annotations:        io.cilium.network.ipv4-cilium-host: 10.42.0.42
                    io.cilium.network.ipv4-health-ip: 10.42.0.19
                    io.cilium.network.ipv4-pod-cidr: 10.42.0.0/24
                    k3s.io/node-args:
                      ["server","--disable","servicelb","--disable","traefik","--disable-cloud-controller","--kubelet-arg","cloud-provider=external","--no-flann...
                    k3s.io/node-config-hash: UE6D33J2ITXAMRAUJZSRZFBJYL7MCLRPEXESF5QF3QHURU4MNR4Q====
                    k3s.io/node-env: {"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/8f4b194129852507eab4a55117fc942e0688ec9a70ffdaa5911ccc6652220f76"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 15 Feb 2021 10:40:06 +0100
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  srv-cluster-node-local-3699175778297686350
  AcquireTime:     <unset>
  RenewTime:       Mon, 15 Feb 2021 10:41:16 +0100
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Mon, 15 Feb 2021 10:41:03 +0100   Mon, 15 Feb 2021 10:41:03 +0100   CiliumIsUp                   Cilium is running on this node
  MemoryPressure       False   Mon, 15 Feb 2021 10:41:06 +0100   Mon, 15 Feb 2021 10:40:06 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Mon, 15 Feb 2021 10:41:06 +0100   Mon, 15 Feb 2021 10:40:06 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Mon, 15 Feb 2021 10:41:06 +0100   Mon, 15 Feb 2021 10:40:06 +0100   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Mon, 15 Feb 2021 10:41:06 +0100   Mon, 15 Feb 2021 10:41:06 +0100   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  Hostname:    srv-cluster-node-local-3699175778297686350
  ExternalIP:  135.181.144.244
  InternalIP:  10.0.0.2
Capacity:
  cpu:                3
  ephemeral-storage:  78620712Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3932068Ki
  pods:               110
Allocatable:
  cpu:                3
  ephemeral-storage:  76482228574
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3932068Ki
  pods:               110
System Info:
  Machine ID:                 1709caa07e0342b2b4a2e5f5d828c5b2
  System UUID:                1709caa0-7e03-42b2-b4a2-e5f5d828c5b2
  Boot ID:                    8b389b16-fd0c-4270-bafc-79ce8adf21bd
  Kernel Version:             5.4.0-65-generic
  OS Image:                   Ubuntu 20.04.2 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.4.3-k3s1
  Kubelet Version:            v1.20.2+k3s1
  Kube-Proxy Version:         v1.20.2+k3s1
PodCIDR:                      10.42.0.0/24
PodCIDRs:                     10.42.0.0/24
ProviderID:                   hcloud://10071257
Non-terminated Pods:          (6 in total)
  Namespace                   Name                                                CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                                ------------  ----------  ---------------  -------------  ---
  kube-system                 hcloud-cloud-controller-manager-69667b4dc6-7vx55    100m (3%)     0 (0%)      50Mi (1%)        0 (0%)         49s
  kube-system                 cilium-978lq                                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         50s
  kube-system                 coredns-854c77959c-cthv2                            100m (3%)     0 (0%)      70Mi (1%)        170Mi (4%)     58s
  kube-system                 local-path-provisioner-7c458769fb-7455f             0 (0%)        0 (0%)      0 (0%)           0 (0%)         58s
  kube-system                 cilium-operator-84488457d4-6c9b7                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         50s
  kube-system                 metrics-server-86cbb8457f-lmfbf                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         58s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                200m (6%)   0 (0%)
  memory             120Mi (3%)  170Mi (4%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:
  Type     Reason                   Age   From        Message
  ----     ------                   ----  ----        -------
  Normal   Starting                 72s   kubelet     Starting kubelet.
  Warning  InvalidDiskCapacity      72s   kubelet     invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  72s   kubelet     Node srv-cluster-node-local-3699175778297686350 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    72s   kubelet     Node srv-cluster-node-local-3699175778297686350 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     72s   kubelet     Node srv-cluster-node-local-3699175778297686350 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  72s   kubelet     Updated Node Allocatable limit across pods
  Normal   Starting                 70s   kube-proxy  Starting kube-proxy.
  Normal   NodeReady                12s   kubelet     Node srv-cluster-node-local-3699175778297686350 status is now: NodeReady

kladiv commented 3 years ago

Hi @LKaemmerling, I reset the node and set it up again using your settings, but without Hetzner Networks.

root@infra1:~# export INSTALL_K3S_VERSION=v1.20.0+k3s2
root@infra1:~# curl -sfL https://get.k3s.io | sh -s - --disable servicelb --disable traefik --disable-cloud-controller --kubelet-arg="cloud-provider=external" --no-flannel
[INFO]  Using v1.20.0+k3s2 as release
[INFO]  Downloading hash https://github.com/rancher/k3s/releases/download/v1.20.0+k3s2/sha256sum-amd64.txt
[INFO]  Downloading binary https://github.com/rancher/k3s/releases/download/v1.20.0+k3s2/k3s
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s
root@infra1:~#

root@infra1:~# kubectl get nodes -o wide
NAME     STATUS   ROLES                  AGE   VERSION        INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
infra1   Ready    control-plane,master   22s   v1.20.0+k3s2   78.47.171.100   <none>        Ubuntu 20.04.2 LTS   5.4.0-65-generic   containerd://1.4.3-k3s1

root@infra1:~# kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS    RESTARTS   AGE
kube-system   coredns-854c77959c-9qr6t                  0/1     Pending   0          39s
kube-system   metrics-server-86cbb8457f-9cq9h           0/1     Pending   0          39s
kube-system   local-path-provisioner-7c458769fb-xk7pv   0/1     Pending   0          39s

root@infra1:~# kubectl taint nodes infra1 node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-
node/infra1 untainted
root@infra1:~#

root@infra1:~# kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
poddisruptionbudget.policy/calico-kube-controllers created
root@infra1:~#

root@infra1:~# kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS    RESTARTS   AGE
kube-system   calico-node-x27wd                         1/1     Running   0          48s
kube-system   local-path-provisioner-7c458769fb-xk7pv   1/1     Running   0          3m45s
kube-system   metrics-server-86cbb8457f-9cq9h           1/1     Running   0          3m45s
kube-system   coredns-854c77959c-9qr6t                  1/1     Running   0          3m45s
kube-system   calico-kube-controllers-86bddfcff-lg429   1/1     Running   0          48s
root@infra1:~#

root@infra1:~# kubectl -n kube-system create secret generic hcloud --from-literal=token=WkSaw7qHcP9NQJ3Bz2ZK40xIOEHtepAF9dAdO4W5hKmryKYeeXLpbuEkb1SfNWW4
secret/hcloud created
root@infra1:~#

root@infra1:~# kubectl apply -f  https://raw.githubusercontent.com/hetznercloud/hcloud-cloud-controller-manager/master/deploy/ccm.yaml
serviceaccount/cloud-controller-manager created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
clusterrolebinding.rbac.authorization.k8s.io/system:cloud-controller-manager created
deployment.apps/hcloud-cloud-controller-manager created
root@infra1:~#

root@infra1:~# kubectl -n kube-system logs -f hcloud-cloud-controller-manager-7cf4f5974-h4cqk
Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
I0215 09:50:38.286211       1 serving.go:313] Generated self-signed cert in-memory
W0215 09:50:38.483926       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0215 09:50:38.486277       1 controllermanager.go:120] Version: v0.0.0-master+$Format:%h$
I0215 09:50:38.486314       1 cloud.go:90] hcloud/newCloud: HCLOUD_NETWORK empty
Hetzner Cloud k8s cloud controller v1.8.1 started
W0215 09:50:38.753177       1 controllermanager.go:132] detected a cluster without a ClusterID.  A ClusterID will be required in the future.  Please tag your cluster to avoid any future issues
I0215 09:50:38.753866       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0215 09:50:38.753956       1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0215 09:50:38.754092       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0215 09:50:38.754109       1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0215 09:50:38.754123       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0215 09:50:38.754110       1 secure_serving.go:178] Serving securely on [::]:10258
I0215 09:50:38.755329       1 node_lifecycle_controller.go:78] Sending events to api server
I0215 09:50:38.755378       1 controllermanager.go:247] Started "cloud-node-lifecycle"
I0215 09:50:38.756214       1 controllermanager.go:247] Started "service"
I0215 09:50:38.756221       1 core.go:101] Will not configure cloud provider routes for allocate-node-cidrs: false, configure-cloud-routes: true.
W0215 09:50:38.756226       1 controllermanager.go:244] Skipping "route"
I0215 09:50:38.756587       1 controller.go:208] Starting service controller
I0215 09:50:38.756608       1 shared_informer.go:223] Waiting for caches to sync for service
I0215 09:50:38.757025       1 node_controller.go:110] Sending events to api server.
I0215 09:50:38.758221       1 controllermanager.go:247] Started "cloud-node"
I0215 09:50:38.854285       1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0215 09:50:38.854285       1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0215 09:50:38.856725       1 shared_informer.go:230] Caches are synced for service
E0215 09:50:38.882138       1 node_controller.go:237] hcloud/instances.InstanceExistsByProviderID: hcloud/providerIDToServerID: missing prefix hcloud://: 10037937

Same issue. Are you able to test using Calico CNI?

Thank you

mhutter commented 3 years ago

Not sure if this is related, but I had to remove the quotes (") from --kubelet-arg because they were passed on verbatim.

You can verify that it worked as expected by checking that your initial node has a taint:

$ kubectl describe node | grep Taints
Taints:             node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule

If there is no such taint, the cloud-provider config was not applied correctly.
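
Another way to double-check on the host is to look at the generated k3s unit (only a sketch; the path comes from the install log above, and the exact formatting may differ):

grep cloud-provider /etc/systemd/system/k3s.service

The --kubelet-arg cloud-provider=external pair should appear in the ExecStart line without surrounding quotes.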

kladiv commented 3 years ago

Hi, this is a test without quotes:

root@infra1:~# export INSTALL_K3S_VERSION=v1.20.0+k3s2
root@infra1:~# export K3S_TOKEN="k3stest"
root@infra1:~# curl -sfL https://get.k3s.io | sh -s - --disable servicelb --disable traefik --disable-cloud-controller --kubelet-arg cloud-provider=external --no-flannel
[INFO]  Using v1.20.0+k3s2 as release
[INFO]  Downloading hash https://github.com/rancher/k3s/releases/download/v1.20.0+k3s2/sha256sum-amd64.txt
[INFO]  Downloading binary https://github.com/rancher/k3s/releases/download/v1.20.0+k3s2/k3s
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s

root@infra1:~# kubectl describe node | grep Taints
Taints:             node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule

root@infra1:~# kubectl taint nodes infra1 node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-
node/infra1 untainted

root@infra1:~# kubectl describe node | grep Taints
Taints:             <none>

root@infra1:~# kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
poddisruptionbudget.policy/calico-kube-controllers created
root@infra1:~#

root@infra1:~# kubectl -n kube-system create secret generic hcloud --from-literal=token=Dwwm8F1T51kQYTHuANEK4vwGpI8g9Hvwu2cIP5uSjKAIGfsz8eOayrN5wwjEaTui
secret/hcloud created

root@infra1:~# kubectl apply -f  https://raw.githubusercontent.com/hetznercloud/hcloud-cloud-controller-manager/master/deploy/ccm.yaml
serviceaccount/cloud-controller-manager created
clusterrolebinding.rbac.authorization.k8s.io/system:cloud-controller-manager created
deployment.apps/hcloud-cloud-controller-manager created

root@infra1:~# kubectl -n kube-system logs -f hcloud-cloud-controller-manager-7cf4f5974-4vl9q
Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
I0218 22:27:10.666493       1 serving.go:313] Generated self-signed cert in-memory
W0218 22:27:11.203808       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0218 22:27:11.207171       1 controllermanager.go:120] Version: v0.0.0-master+$Format:%h$
I0218 22:27:11.207231       1 cloud.go:90] hcloud/newCloud: HCLOUD_NETWORK empty
W0218 22:27:11.547879       1 controllermanager.go:132] detected a cluster without a ClusterID.  A ClusterID will be required in the future.  Please tag your cluster to avoid any future issues
I0218 22:27:11.549913       1 secure_serving.go:178] Serving securely on [::]:10258
I0218 22:27:11.551418       1 node_controller.go:110] Sending events to api server.
I0218 22:27:11.551494       1 controllermanager.go:247] Started "cloud-node"
I0218 22:27:11.552795       1 node_lifecycle_controller.go:78] Sending events to api server
I0218 22:27:11.552820       1 controllermanager.go:247] Started "cloud-node-lifecycle"
I0218 22:27:11.554367       1 controllermanager.go:247] Started "service"
I0218 22:27:11.554380       1 core.go:101] Will not configure cloud provider routes for allocate-node-cidrs: false, configure-cloud-routes: true.
W0218 22:27:11.554389       1 controllermanager.go:244] Skipping "route"
I0218 22:27:11.555472       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0218 22:27:11.555496       1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0218 22:27:11.555532       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0218 22:27:11.555645       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0218 22:27:11.555651       1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0218 22:27:11.556711       1 controller.go:208] Starting service controller
I0218 22:27:11.556720       1 shared_informer.go:223] Waiting for caches to sync for service
Hetzner Cloud k8s cloud controller v1.8.1 started
I0218 22:27:11.655703       1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0218 22:27:11.655856       1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0218 22:27:11.658041       1 shared_informer.go:230] Caches are synced for service
E0218 22:27:11.759258       1 node_controller.go:237] hcloud/instances.InstanceExistsByProviderID: hcloud/providerIDToServerID: missing prefix hcloud://: 10037937

Same behaviour; it still does not seem to work.

Atem18 commented 3 years ago

I just tested a fresh install on Ubuntu 20.04 with three nodes and it works:

# Install K3S
export K3S_TOKEN=my_k3s_token
curl -sfL https://get.k3s.io | sh -s - --cluster-init --disable servicelb --disable traefik --disable local-storage --disable metrics-server --disable-cloud-controller --kubelet-arg="cloud-provider=external"
curl -sfL https://get.k3s.io | sh -s - --server https://10.0.0.2:6443 --disable servicelb --disable traefik --disable local-storage --disable metrics-server --disable-cloud-controller --kubelet-arg="cloud-provider=external"

# Copy kubectl config
kubectl config view --raw >~/.kube/config

# Install Hcloud cloud controller
kubectl -n kube-system create secret generic hcloud --from-literal=token=my_hcloud_token
kubectl apply -f https://raw.githubusercontent.com/hetznercloud/hcloud-cloud-controller-manager/master/deploy/ccm.yaml

kladiv commented 3 years ago

Hi @Atem18, are you able to check your environment with k3s + Calico?

Thank you

Atem18 commented 3 years ago

@kladiv I am not sure how to switch and roll back the CNI; I am fairly new to the K8s world. But if you know how to proceed, please let me know. :)

kladiv commented 3 years ago

Hi @Atem18, you can follow my commands. By the way, in your previous test, did you check the "hcloud-cloud-controller-manager" pod logs? No missing prefix issue?
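
For reference, one way to check the logs without looking up the pod name (the deployment name comes from deploy/ccm.yaml):

kubectl -n kube-system logs deployment/hcloud-cloud-controller-manager | grep -i 'missing prefix'

No output would mean the error does not occur in your environment.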

Atem18 commented 3 years ago

@kladiv Yes, but I need to know how to roll back so I don't have to reinstall the cluster.

And no, I have no missing prefix issue, only an error with ingress, but that's probably on my side.

kladiv commented 3 years ago

Hi, I tried your commands on a new Ubuntu 20.04 CX21 cloud server (single node), but I got the same issues. I only added the untaint command to allow container creation:

kubectl taint nodes <node> node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-

I've no idea why it works for you. Are there any labels or settings on the VM side in the Hetzner panel?

Atem18 commented 3 years ago

@kladiv Pay attention: the command

curl -sfL https://get.k3s.io | sh -s - --cluster-init --disable servicelb --disable traefik --disable local-storage --disable metrics-server --disable-cloud-controller --kubelet-arg="cloud-provider=external"

is executed on the first master but the command

curl -sfL https://get.k3s.io | sh -s - --server https://10.0.0.2:6443 --disable servicelb --disable traefik --disable local-storage --disable metrics-server --disable-cloud-controller --kubelet-arg="cloud-provider=external"

is executed on the two other nodes.

And I did not have to change any taints to run my containers, nor change any labels or settings.

Just the commands I gave you, then I deployed my apps with Helm.

kladiv commented 3 years ago

Hi, I found the issue. The untaint command I invoke makes some containers (e.g. CoreDNS) start before hcloud-cloud-controller-manager. This probably causes an unexpected scenario and breaks the cloud controller setup.
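
If that is the cause, a possible ordering (only a sketch, reusing the manifests and the node name from this thread, with <hcloud-token> as a placeholder) is to install the CNI and the cloud controller while the taint is still in place and let the controller initialize the node:

kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
kubectl -n kube-system create secret generic hcloud --from-literal=token=<hcloud-token>
kubectl apply -f https://raw.githubusercontent.com/hetznercloud/hcloud-cloud-controller-manager/master/deploy/ccm.yaml
# wait for the controller to come up before touching the taint
kubectl -n kube-system rollout status deployment/hcloud-cloud-controller-manager
kubectl describe node infra1 | grep Taints

The deploy/ccm.yaml manifest is expected to tolerate the uninitialized taint, and the cloud-node controller normally removes that taint itself once the node is initialized, so the manual untaint step should no longer be needed.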