kubernetes-sigs / cloud-provider-kind

Cloud provider for KIND clusters
Apache License 2.0

Running cloud-provider-kind 0.3.0 as container fails to probe ccm healthcheck #109

Closed rophy closed 1 month ago

rophy commented 2 months ago

Env Info

default:~/containers/cloud-provider-kind$ kind version
kind v0.14.0 go1.18.2 linux/amd64

default:~/containers/cloud-provider-kind$ docker version
Client: Docker Engine - Community
 Version:           27.0.3
 API version:       1.46
 Go version:        go1.21.11
 Git commit:        7d4bcd8
 Built:             Sat Jun 29 00:02:33 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.0.3
  API version:      1.46 (minimum version 1.24)
  Go version:       go1.21.11
  Git commit:       662f78c
  Built:            Sat Jun 29 00:02:33 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.19
  GitCommit:        2bf793ef6dc9a18e00cb12efb64355c2c9d5eb41
 runc:
  Version:          1.7.19
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Steps to reproduce

  1. kind create cluster
  2. docker run --rm --network kind -v /var/run/docker.sock:/var/run/docker.sock cloud-provider-kind
  3. Create a LoadBalancer service following README

Expected Result

cloud-provider-kind should be able to:

  1. Detect the creation of a LB service
  2. Create a ccm container
  3. Probe its readiness as success

Actual Result

cloud-provider-kind completes steps 1 and 2 but fails on step 3:

I0803 02:16:57.571892       1 instances.go:47] Check instance metadata for kind-control-plane
I0803 02:16:57.620937       1 instances.go:75] instance metadata for kind-control-plane: &cloudprovider.InstanceMetadata{ProviderID:"kind://kind/kind/kind-control-plane", InstanceType:"kind-node", NodeAddresses:[]v1.NodeAddress{v1.NodeAddress{Type:"Hostname", Address:"kind-control-plane"}, v1.NodeAddress{Type:"InternalIP", Address:"172.18.0.2"}, v1.NodeAddress{Type:"InternalIP", Address:"fc00:f853:ccd:e793::2"}}, Zone:"", Region:"", AdditionalLabels:map[string]string(nil)}
I0803 02:16:57.625728       1 node_controller.go:267] Update 1 nodes status took 53.884679ms.
I0803 02:16:57.791201       1 proxy.go:332] unexpected error trying to get load balancer kindccm-JQR2TXKKQBRJNYZ4LUJ4TY4C2YAW4FANP7ZCY5PN readyness :Get "http://127.0.0.1:32768/ready": dial tcp 127.0.0.1:32768: connect: connection refused
...
I0803 02:17:07.791081       1 proxy.go:332] unexpected error trying to get load balancer kindccm-JQR2TXKKQBRJNYZ4LUJ4TY4C2YAW4FANP7ZCY5PN readyness :Get "http://127.0.0.1:32768/ready": dial tcp 127.0.0.1:32768: connect: connection refused
E0803 02:17:07.791164       1 controller.go:298] error processing service default/lb-service-local (retrying with exponential backoff): failed to ensure load balancer: context deadline exceeded
I0803 02:17:07.791363       1 event.go:389] "Event occurred" object="default/lb-service-local" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: context deadline exceeded"

rophy commented 2 months ago

I noticed it is probing 127.0.0.1, which looks wrong. It should be aware that it is running inside a container and probe the IP address of ccm instead.

aojea commented 2 months ago

/kind bug /kind documentation

It seems that we forgot about this installation mode when we migrated to envoy as the load balancer and added the health checks to improve reliability. Thanks for reporting. The health check indeed uses localhost because the envoy admin port is forwarded to localhost, so cloud-provider-kind needs to run on the host network.

Can you try docker run --rm --network host -v /var/run/docker.sock:/var/run/docker.sock cloud-provider-kind?

That should do it.

rophy commented 2 months ago

Running in host mode works fine, thanks.

What are the considerations for choosing the network mode? I imagine there are some assumptions for host mode to work, e.g. the target kind container must have exposed a host port for cloud-provider-kind to reach.

aojea commented 2 months ago

The need to support multiple platforms such as Windows, Mac, or WSL, or other containerized combos, breaks the assumption that cloud-provider-kind can reach the container, so we rely on the docker portmap API to forward those ports to localhost to support all platforms ... 🤷

/help

PRs are welcome to fix the docs to use the host network when running inside a container.
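
For readers who want to reproduce the symptom outside of cloud-provider-kind, the readiness check discussed in this thread can be approximated with a small polling loop. This is only a sketch in Python, not the project's Go code: the function name `wait_until_ready` and its parameters are made up for illustration, and the only detail taken from the thread is the shape of the probed URL (`http://127.0.0.1:<forwarded port>/ready`). It polls the URL until it answers with HTTP 200 or a deadline passes, which mirrors the "connection refused" lines followed by "context deadline exceeded" in the log above.

```python
import time
import urllib.request


def wait_until_ready(url, deadline_s=10.0, interval_s=1.0, timeout_s=1.0):
    """Poll `url` until it answers HTTP 200 or `deadline_s` seconds elapse.

    Hypothetical helper for illustration only. Returns True on the first
    200 response and False once the deadline has passed.
    """
    end = time.monotonic() + deadline_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                if resp.status == 200:
                    return True
        except OSError as err:
            # urllib's URLError and HTTPError are OSError subclasses, so
            # "connection refused", timeouts, and non-2xx replies land here.
            print(f"probe of {url} failed: {err}")
        # Stop if another sleep would overshoot the deadline.
        if time.monotonic() + interval_s > end:
            return False
        time.sleep(interval_s)
```

Calling `wait_until_ready("http://127.0.0.1:32768/ready")` from a process on the host network should reach the forwarded port (32768 is the port from the log above and will differ per run). Calling it from inside a container attached only to the `kind` network would be expected to fail, because 127.0.0.1 there refers to the container's own loopback interface rather than the host's.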