kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0

Unable to connect to kind API server when using Kind inside Kubernetes #3622

Closed jayesh-srivastava closed 3 weeks ago

jayesh-srivastava commented 1 month ago

What happened: I am creating a test setup similar to the Cluster API and provider repos, where I want to run e2e cluster jobs inside test pods using Prow jobs. The test starts by creating a kind management cluster. I am able to create the kind cluster, but I am unable to connect to its API server. I get the error `The connection to the server 127.0.0.1:43357 was refused - did you specify the right host or port?`. I have gone through #303, mounted the paths below, and changed the dnsPolicy to Default, but the error persists.

    - mountPath: /lib/modules
      name: modules
      readOnly: true
    - mountPath: /sys/fs/cgroup
      name: cgroup
    - name: docker-root
      mountPath: /var/lib/docker
What you expected to happen: I expected to be able to connect to the kind cluster.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

    Server:
     Containers: 3
      Running: 3
      Paused: 0
      Stopped: 0
     Images: 3
     Server Version: 23.0.3
     Storage Driver: overlay2
      Backing Filesystem: extfs
      Supports d_type: true
      Using metacopy: false
      Native Overlay Diff: true
      userxattr: false
     Logging Driver: json-file
     Cgroup Driver: systemd
     Cgroup Version: 2
     Plugins:
      Volume: local
      Network: bridge host ipvlan macvlan null overlay
      Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
     Swarm: inactive
     Runtimes: io.containerd.runc.v2 runc
     Default Runtime: runc
     Init Binary: docker-init
     containerd version: 4e1fe7492b9df85914c389d1f15a3ceedbb280ac
     runc version: a916309fff0f838eb94e928713dbc3c0d0ac7aa4
     init version: fec3683b971d9c3ef73f284f176672c44b448662
     Security Options:
      apparmor
      seccomp
       Profile: builtin
      cgroupns
     Kernel Version: 5.15.146+
     Operating System: Container-Optimized OS from Google
     OSType: linux
     Architecture: x86_64
     CPUs: 8
     Total Memory: 31.36GiB
     Name: prow-abcd
     ID: 8c0d8a83-b2df-4429-8ab3-7bed1934ae0b
     Docker Root Dir: /var/lib/docker
     Debug Mode: false
     Experimental: false
     Insecure Registries:
      10.0.0.0/8
      127.0.0.0/8
     Registry Mirrors:
      https://mirror.gcr.io/
     Live Restore Enabled: true


- OS (e.g. from `/etc/os-release`): `alpine`
- Kubernetes version: (use `kubectl version`):  `v1.30.0`
- Any proxies or other special environment settings?:

BenTheElder commented 1 month ago

First of all: I do NOT recommend running Kubernetes inside Kubernetes (kind or otherwise), as there is a lot of confusing behavior when attempting to nest them. That said, Kubernetes does for ... reasons.

> I get a The connection to the server 127.0.0.1:43357 was refused - did you specify the right host or port?

This is not nearly enough information to debug.

You might not be able to connect because the client isn't on the same network as the cluster (note that 127.0.0.1 is local to a network namespace, so it only reaches the API server from wherever the container runtime hosting the kind nodes is running), or the cluster might not be up, or ...
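
One way to narrow this down is to probe the endpoint from the exact container where kubectl runs. The following is only a minimal sketch and not part of kind: `can_connect` is a hypothetical helper name, and the port is the one from the error message quoted above. A refusal here typically means nothing is listening on that port in the current network namespace.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 43357 is the port from the error message; substitute the one in your kubeconfig.
    print(can_connect("127.0.0.1", 43357))
```

Running the same probe inside the container that hosts the Docker daemon and comparing the two results should show whether the client and the kind nodes share a network namespace.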

jayesh-srivastava commented 1 month ago

@BenTheElder I understand the network disparity here. I was also looking at #523, where this comment was posted: https://github.com/kubernetes-sigs/kind/issues/523#issuecomment-491849857. According to it, IP forwarding seems to be required. But providing a config to kind and then forwarding the IP are manual steps. Is there any way to avoid manual intervention? I have the same use case as Cluster API and the other provider repos, which run their e2e tests using kind inside Kubernetes.

BenTheElder commented 1 month ago

> According to this looks like an IP forwarding is required. But providing a config to kind and then forwarding the IP, seems like manual steps. Is there anyway where a manual intervention is not required. Just like Cluster API and other providers repos, where they perform e2e tests using kind in Kubernetes way, I have the same use-case.

There's no additional forwarding there, because they run all of their other steps in the same network namespace as the node containers. I.e., the container running dind is also running the Cluster API CI steps.

jayesh-srivastava commented 1 month ago

Hi @BenTheElder, I was playing around trying to find a workaround. I can exec into the nested docker container (kind's control plane) and get /etc/kubernetes/admin.conf, but I want to access the cluster from outside that nested docker container, in the host container, and I have not been able to figure out how. Is there a way to do that? To clarify again, this host container, along with some other containers, is part of a pod in a GKE cluster.

And I was getting this error when trying to access the nested container (kind control-plane) from the host container:

E0522 19:50:56.903340   17267 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:34335/api?timeout=32s": dial tcp 127.0.0.1:34335: connect: connection refused
E0522 19:50:56.904786   17267 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:34335/api?timeout=32s": dial tcp 127.0.0.1:34335: connect: connection refused
E0522 19:50:56.905070   17267 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:34335/api?timeout=32s": dial tcp 127.0.0.1:34335: connect: connection refused
E0522 19:50:56.906505   17267 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:34335/api?timeout=32s": dial tcp 127.0.0.1:34335: connect: connection refused
The connection to the server 127.0.0.1:34335 was refused - did you specify the right host or port?

BenTheElder commented 1 month ago

> So the nested docker container(kind's control plane), I could exec into it and get the /etc/kubernetes/admin.conf, but, I want to access it outside of that nested docker container, in the host container

Wherever you ran `kind create cluster` should be able to access it, unless it's using a remote docker daemon or something.

In cluster API's CI, it looks like:

host => container running dind + kind + cluster API CI scripts => kind nodes => kind pods

That "just works" with the normally exported KUBECONFIG from kind create cluster because the cluster API CI steps are running in the same container as dind / the nodes / ...

> Just to again clarify, this host container along with some other container are a part of a GKE cluster's pod. And I was getting this error when trying to access the nested container(kind control-plane) from the host container:

I can't quite tell, but it sounds like your layout is more like:

host => container running dind / kind => kind nodes => kind pods

host => CI steps

That is a LOT more complicated and really out of scope for us / not recommended. You will have to either operate something like an SSH tunnel or configure the kind cluster to expose the API server on something other than localhost (which we don't recommend, for security reasons). The localhost addresses are not going to be accessible between different pods / containers.

https://kind.sigs.k8s.io/docs/user/configuration/#api-server
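
Before reaching for either workaround, it can help to confirm that the kubeconfig in use really points at a loopback address. The following is a rough sketch under stated assumptions: `loopback_servers` is a hypothetical helper, and it uses a regular expression instead of a YAML parser to stay within the standard library, so it only handles the plain `server: <url>` form that appears in the errors above.

```python
import ipaddress
import re
from urllib.parse import urlsplit


def loopback_servers(kubeconfig_text: str) -> list[str]:
    """Return the `server:` URLs in a kubeconfig whose host is a loopback address."""
    found = []
    for match in re.finditer(r"^\s*server:\s*(\S+)\s*$", kubeconfig_text, re.MULTILINE):
        url = match.group(1).strip("'\"")
        host = urlsplit(url).hostname or ""
        if host == "localhost":
            found.append(url)
            continue
        try:
            if ipaddress.ip_address(host).is_loopback:
                found.append(url)
        except ValueError:
            # Not an IP literal (some other DNS name); not treated as loopback here.
            pass
    return found


if __name__ == "__main__":
    sample = "clusters:\n- cluster:\n    server: https://127.0.0.1:34335\n  name: kind-kind\n"
    print(loopback_servers(sample))
```

Any URL this reports is only reachable from the network namespace where the port was published, which is why a client in a different pod or container gets "connection refused".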

jayesh-srivastava commented 1 month ago

@BenTheElder So my layout looks like: test-pod => container running kind + CI scripts after exporting kind kubeconfig

> Wherever you ran kind create cluster, should be able to access it, unless it's using a remote docker daemon or something.

Yes this is what still bothers me.

> or configure the kind cluster to expose to something other than localhost (which we don't recommend for security purposes), the localhost addresses are not going to be accessible between different pods / containers.

Right, I get the security aspect of this.

But kubectl is still refused when connecting to the local address from my kubeconfig. I have also tried passing kind a config with apiServerAddress: 0.0.0.0, but that doesn't work either.

BenTheElder commented 1 month ago

> test-pod => container running kind + CI scripts after exporting kind kubeconfig

To be clear: from `kind export kubeconfig`, or the config exported by `kind create cluster`?

I ask because `kind export kubeconfig` is meant to be local to where docker is running; it has no idea where a remote instance might be.

If it's from kind create cluster, running in the same dind container, and you can't access it, something is broken with the networking in this environment and you'll have to debug that. You could do a more minimal test without kind by just running any container with a networked service and a docker port forward and getting that part to work in your dind environment.

jayesh-srivastava commented 1 month ago

@BenTheElder I mean `kind create cluster`.

BenTheElder commented 1 month ago

I would start debugging from just a container with a minimal docker port forward to hello-world and see what it takes to get that working in the dind environment.
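
A minimal version of that check might look like the sketch below. Everything in it is an assumption for illustration: the `nginx` image stands in for "hello-world" as a container that actually serves HTTP, the port numbers and container name are arbitrary, and `docker_run_args` and `check_port_forward` are hypothetical helper names. It needs a working `docker` CLI in the dind container.

```python
import subprocess
import time
import urllib.request


def docker_run_args(image, host_port, container_port, name):
    """Build the `docker run` argv that publishes container_port on 127.0.0.1:host_port."""
    return [
        "docker", "run", "--rm", "-d", "--name", name,
        "-p", f"127.0.0.1:{host_port}:{container_port}", image,
    ]


def check_port_forward(image="nginx", host_port=8080, container_port=80, name="portfwd-test"):
    """Start a container, try to fetch from the published port, then stop the container."""
    subprocess.run(docker_run_args(image, host_port, container_port, name), check=True)
    try:
        for _ in range(10):  # retry for a few seconds while the server starts
            try:
                with urllib.request.urlopen(f"http://127.0.0.1:{host_port}/", timeout=2) as resp:
                    return resp.status == 200
            except OSError:
                time.sleep(1)
        return False
    finally:
        subprocess.run(["docker", "stop", name], check=False)
```

Calling `check_port_forward()` from the same container that runs `kind create cluster` exercises the same kind of path as the kind API server port: a port published on 127.0.0.1 by the Docker daemon. If it returns False there, the problem is in the dind networking rather than in kind.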