raesene / kube_security_lab


All vulnerable clusters fail on TASK [Start a kind cluster] #15

Open caday00 opened 2 years ago

caday00 commented 2 years ago

@raesene I'm having problems getting any of the vulnerable clusters to start. The client-machine works fine, but when I try to start any vulnerable cluster, I get the error message below:

This occurs on both Ubuntu and Kali instances.

xxxx@xxxx-virtual-machine:~/Downloads/kube_security_lab$ ansible-playbook etcd-noauth.yml
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'

PLAY [Start up a kind cluster] ***

TASK [Gathering Facts] ***
ok: [localhost]

TASK [Start a kind cluster] **
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["kind", "create", "cluster", "--image=raesene/customkind:v1.16.9", "--name", "etcdnoauth", "--config", "kubeadm_configs/etcd-noauth.yml"], "delta": "0:02:11.844904", "end": "2022-05-21 10:32:19.774774", "msg": "non-zero return code", "rc": 1, "start": "2022-05-21 10:30:07.929870", "stderr": "Creating cluster \"etcdnoauth\" ...\n • Ensuring node image (raesene/customkind:v1.16.9) 🖼 ...\n ✓ Ensuring node image (raesene/customkind:v1.16.9) 🖼\n • Preparing nodes 📦 ...\n ✓ Preparing nodes 📦 \n • Writing configuration 📜 ...\n ✓ Writing configuration 📜\n • Starting control-plane 🕹️ ...\n ✗ Starting control-plane 🕹️\nERROR: failed to create cluster: failed to init node with kubeadm: command \"docker exec --privileged etcdnoauth-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6\" failed with error: exit status 1\n\nCommand Output: I0521 14:30:11.472943 62 initconfiguration.go:190] loading configuration from \"/kind/kubeadm.conf\"\n[config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta2, Kind=JoinConfiguration\nI0521 14:30:11.476506 62 feature_gate.go:216] feature gat [...snipped...] "\t/usr/local/go/src/runtime/proc.go:203", "runtime.goexit", "\t/usr/local/go/src/runtime/asm_amd64.s:1357"], "stdout": "", "stdout_lines": []}

PLAY RECAP ***
localhost : ok=1 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0

Or like this:

xxxx@xxxx-virtual-machine:~/Downloads/kube_security_lab$ ansible-playbook insecure-port.yml
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'

PLAY [Start up a kind cluster] ***

TASK [Gathering Facts] ***
ok: [localhost]

TASK [Start a kind cluster] **
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["kind", "create", "cluster", "--image=raesene/customkind:v1.18.2", "--name", "insecureport", "--config", "kubeadm_configs/insecureport.yml"], "delta": "0:00:00.029422", "end": "2022-05-21 10:26:05.910223", "msg": "non-zero return code", "rc": 1, "start": "2022-05-21 10:26:05.880801", "stderr": "ERROR: node(s) already exist for a cluster with the name \"insecureport\"", "stderr_lines": ["ERROR: node(s) already exist for a cluster with the name \"insecureport\""], "stdout": "", "stdout_lines": []}

PLAY RECAP ***
localhost : ok=1 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
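
The second failure looks different from the first: kind refuses to create "insecureport" because node containers with that name are still around, most likely from an earlier interrupted run. kind delete cluster --name insecureport removes them before retrying. A minimal Python sketch, assuming you pass it the decoded stderr text (from kind directly, or the "stderr" field after parsing Ansible's JSON); the helper names are illustrative, not part of the lab:

```python
import re
import subprocess
from typing List, Optional
```
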

caday00 commented 2 years ago

xxxx@xxxx-virtual-machine:~/Downloads/kube_security_lab$ kind --version
kind version 0.8.1

xxxx@xxxx-virtual-machine:~/Downloads/kube_security_lab$ docker --version
Docker version 20.10.16, build aa7e414

mrintern commented 1 year ago

Hi @caday00, I was actually about to open an issue for this same error. The first thing I did was install kubeadm, as I hadn't installed it before. Unfortunately, ansible-playbook insecure-port.yml still fails with the same error. Now I'm trying to run the failing command independently of the Ansible playbook:

Failing command:

@raesene Any suggestions for debugging this issue? I'd be happy to open a PR once we figure out the solution.

mrintern commented 1 year ago

I'm running the Ubuntu app on Windows 11, and I've installed everything from the Ubuntu shell (Ansible, kind, Docker, kubectl, kubeadm, and kubelet).

mrintern commented 1 year ago

Running the failing "kind create cluster..." command on its own produces the same error, but with a formatted error message (screenshot).

raesene commented 1 year ago

Hi all, sorry I missed this issue when it was first opened. Thanks @mrintern for the ping :)

So at the moment, ansible-playbook insecure-port.yml and ansible-playbook etcd-noauth.yml are both working for me, so I'm guessing there must be a difference in environments or versions that's causing them to fail for you.

If you can share the output of kind version and ansible --version, that would be a good place to start.

Also, just to check: does an ordinary kind create cluster work OK? That's a good way to tell whether the problem is in the security lab or is a more general problem with KinD's setup.
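
Gathering the requested versions in one go makes them easy to paste into the thread. A minimal Python sketch, assuming kind, ansible, and docker are expected on PATH (a missing tool is reported as "not installed" rather than raising); the helper names are illustrative, not part of the lab:

```python
import shutil
import subprocess
from typing import Dict, List

# Version commands requested in the thread (docker --version was also posted earlier).
VERSION_COMMANDS: Dict[str, List[str]] = {
    "kind": ["kind", "version"],
    "ansible": ["ansible", "--version"],
    "docker": ["docker", "--version"],
}


def first_line(cmd: List[str]) -> str:
    """Run cmd and return the first line of its output, or a short note."""
    if shutil.which(cmd[0]) is None:
        return "not installed"
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = (result.stdout or result.stderr).strip()
    if not output:
        return f"no output (exit code {result.returncode})"
    return output.splitlines()[0]


def collect_versions(commands: Dict[str, List[str]] = VERSION_COMMANDS) -> Dict[str, str]:
    """Map each tool name to the first line its version command prints."""
    return {tool: first_line(cmd) for tool, cmd in commands.items()}


if __name__ == "__main__":
    for tool, line in collect_versions().items():
        print(f"{tool}: {line}")
```

Only the first line is kept because that is where each of these tools prints its version; the rest of ansible --version (config paths and so on) can still be pasted separately if needed.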

mrintern commented 1 year ago

Sorry for the delay and thanks for the reply @raesene

(screenshot)

raesene commented 1 year ago

OK, so I've had a look into this, and I think I can see the problem. The KinD node images we use are customized, and it looks like newer versions of KinD introduced breaking changes that mean we need to rebuild our images.

Unfortunately, I can't easily rebuild the images at the moment: there's an issue (https://github.com/kubernetes-sigs/kind/issues/2863) where updating KinD images fails because they're based on Ubuntu 21.10, which is EOL.

As a workaround, you could downgrade to KinD v0.11 (https://github.com/kubernetes-sigs/kind/releases/tag/v0.11.0), which should work OK. I'll rebuild the images to work with newer KinD versions as soon as KinD's own node images are rebuilt (or I work out another way to create node images :) ).
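
Before re-running a playbook, it can help to confirm which kind release is actually on PATH. A minimal Python sketch that parses both version formats seen in this thread and checks for the v0.11 line; treating any 0.11.x as "v0.11" is an assumption (only v0.11.0 is confirmed to work below), and the "go1.16.4" string is illustrative, not captured output:

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Matches both output styles, e.g.
#   "kind version 0.8.1"                  (from kind --version, as posted above)
#   "kind v0.11.0 go1.16.4 linux/amd64"   (illustrative kind version output)
VERSION_RE = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")

# Assumption: any 0.11.x release counts as "KinD v0.11".
WORKAROUND_MINOR = (0, 11)


def parse_kind_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) from the first version found in text."""
    match = VERSION_RE.search(text)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def matches_workaround(text: str) -> bool:
    """True if the reported kind version is in the 0.11.x line."""
    version = parse_kind_version(text)
    return version is not None and version[:2] == WORKAROUND_MINOR


if __name__ == "__main__":
    if shutil.which("kind") is None:
        print("kind is not on PATH")
    else:
        out = subprocess.run(["kind", "version"], capture_output=True, text=True).stdout
        status = "matches" if matches_workaround(out) else "does not match"
        print(f"{out.strip()!r} {status} the KinD v0.11 workaround")
```

The regex takes the first version-like token, so the Go toolchain version that follows in kind version output is ignored.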

mrintern commented 1 year ago

Thanks Rory, I'm going to downgrade kind and check back on Monday or Tuesday.

I'm making a YouTube series on k8s pentesting basics, and this project covers everything I wanted to cover and more, so I'm really looking forward to getting the most out of it.

(Email reply to Rory McCune's comment of Sun, Aug 7, 2022, 3:36 PM.)


mrintern commented 1 year ago

@raesene Downgrading to kind v0.11.0 fixed it for me, thanks. For others having this issue, here's how to install kind v0.11.0 on Linux (or, in my case, the Ubuntu app on Windows).

INSTALL kind v0.11.0
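
The original install steps weren't preserved here. As a stand-in, a minimal Python sketch that downloads the v0.11.0 binary from the kind GitHub release, assuming a Linux AMD64 host (including WSL) and the release's kind-<os>-<arch> asset naming; this is not the exact command sequence mrintern posted:

```python
import os
import stat
import urllib.request

KIND_VERSION = "v0.11.0"


def kind_asset_url(version: str, os_name: str = "linux", arch: str = "amd64") -> str:
    """URL of a kind release binary on GitHub (assets are named kind-<os>-<arch>)."""
    return (
        "https://github.com/kubernetes-sigs/kind/releases/download/"
        f"{version}/kind-{os_name}-{arch}"
    )


def install_kind(dest: str = "./kind", version: str = KIND_VERSION) -> str:
    """Download the Linux AMD64 kind binary to dest and mark it executable."""
    urllib.request.urlretrieve(kind_asset_url(version), dest)
    mode = os.stat(dest).st_mode
    os.chmod(dest, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dest


if __name__ == "__main__":
    path = install_kind()
    print(f"Downloaded {path}; move it onto your PATH, e.g. /usr/local/bin/kind")
```

After the download, move ./kind somewhere on PATH (for example sudo mv ./kind /usr/local/bin/kind, replacing any newer kind binary already there) and confirm with kind version.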

If you'd like another contributor to the project, I'd be happy to join and edit the README to document this hard dependency.

raesene commented 1 year ago

@mrintern Cool, glad that worked! Yeah, if you want to do a PR for the README, that'd be cool. I'm hopeful the images will get fixed in KinD reasonably soon (they've got a fixed Kubernetes 1.24 image, but we need older versions), but it would be good to let people know how to work around the problem in the meantime.

mrintern commented 1 year ago

@raesene Awesome, I'm currently awaiting commit permissions. BTW, I recorded a video of the full install process today for my YouTube series (which will use this project), so I'll definitely link it in the README once I post it, to help people along.

It walks through installing all the dependencies (Docker, Ansible, kind v0.11.0, and kubectl) on Windows 11 using WSL (Windows Subsystem for Linux). I made a video because there are a few "gotchas" that require some hand-holding.

raesene commented 1 year ago

Cool, thanks for the updated docs; extra tutorials will be great.

I'll leave this issue open until KinD has all the new images out, so we can track it.

raesene commented 1 year ago

A new KinD release is out (https://github.com/kubernetes-sigs/kind/releases/tag/v0.15.0#contributors). I'll try rebuilding the custom-kind images to work with it, and then we can test.

iknowjason commented 11 months ago

Hi @raesene, I seem to be having the same issue described above: it fails at TASK [Start a kind cluster]. I'll post the error below. I'm running kind v0.11.0. Is there anything else I can check? Could the version of Python (3.10), the Docker client (24.0.1), or Ansible (2.10.8) have anything to do with this? Here is the start of the output from running the etcd-noauth.yml playbook:

TASK [Start a kind cluster] *********************************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["kind", "create", "cluster", "--image=raesene/customkind:v1.16.9", "--name", "etcdnoauth", "--config", "kubeadm_configs/etcd-noauth.yml"], "delta": "0:02:04.243326", "end": "2023-08-10 17:25:08.560106", "msg": "non-zero return code", "rc": 1, "start": "2023-08-10 17:23:04.316780", "stderr": "Creating cluster \"etcdnoauth\" ...\n • Ensuring node image (raesene/customkind:v1.16.9) 🖼  ...\n ✓ Ensuring node image (raesene/customkind:v1.16.9) 🖼\n • Preparing nodes 📦   ...\n ✓ Preparing nodes 📦 \n • Writing configuration 📜  ...\n ✓ Writing configuration 📜\n • Starting control-plane 🕹️  ...\n ✗ Starting control-plane 🕹️\nERROR: failed to create cluster: failed to init node with kubeadm: command \"docker exec --privileged etcdnoauth-control-plane kubeadm init --skip-phases=preflight --config=/kind/kubeadm.conf --skip-token-print --v=6\" failed with error: exit status 1\n\nCommand Output: I0810 22:23:09.806864      61 initconfiguration.go:190] loading configuration from \"/kind/kubeadm.conf\"\n[config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta2, Kind=JoinConfiguration\nI0810 22:23:09.815918   
iknowjason commented 11 months ago

@raesene Also, when I run kind directly to create the cluster, I get the error below. I'm not sure what's going on, but please let me know if I can get you anything else for troubleshooting. It looks like an excellent lab! Here's the output:

kind create cluster --image=raesene/customkind:v1.16.9 --name test --config kubeadm_configs/etcd-noauth.yml
Creating cluster "test" ...
 ✓ Ensuring node image (raesene/customkind:v1.16.9) 🖼
 ✓ Preparing nodes 📦  
 ✓ Writing configuration 📜 
 ✗ Starting control-plane 🕹️ 
ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged test-control-plane kubeadm init --skip-phases=preflight --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1
Command Output: I0810 22:28:46.849038      61 initconfiguration.go:190] loading configuration from "/kind/kubeadm.conf"
[config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta2, Kind=JoinConfiguration
I0810 22:28:46.854592      61 feature_gate.go:216] feature gates: &{map[]}
[init] Using Kubernetes version: v1.16.9
I0810 22:28:46.854947      61 kubelet.go:61] Stopping the kubelet
[kubelet-start] WARNING: unable to stop the kubelet service momentarily: [exit status 1]
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
I0810 22:28:46.869813      61 kubelet.go:79] Starting the kubelet
[kubelet-start] Activating the kubelet service
[kubelet-start] WARNING: unable to start the kubelet service: [failed to reload systemd: exit status 1]
[kubelet-start] Please ensure kubelet is reloaded and running manually.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
raesene commented 10 months ago

@iknowjason Sorry for the slow reply! So yeah, the image used for the etcd-noauth scenario is fairly old, and there doesn't seem to be a version that both a) has the vulnerability and b) works with the current KinD.

What I've done is change the Ansible playbook for that scenario to use an old version of kind that's included in the repo (kind 0.8). That should work OK if you're on a Linux AMD64 system, but it'll likely fail if you're on a Mac or Windows.

What I'll need to do to make this a bit more robust is get old kind binaries for those other architectures and then detect the architecture when running that playbook...
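
The architecture detection described above boils down to mapping what the host reports onto kind's os/arch naming and then picking the matching bundled binary. A minimal Python sketch of that mapping; the bin/kind-v0.8-<os>-<arch> path is a hypothetical layout (the repo's real file name isn't shown here), and not every os/arch combination is guaranteed to have a kind 0.8 build:

```python
import platform
from typing import Tuple

# platform.machine() spellings (lowercased) mapped to kind's release-asset arch names.
ARCH_ALIASES = {
    "x86_64": "amd64",   # Linux / macOS on Intel
    "amd64": "amd64",    # Windows reports "AMD64"
    "aarch64": "arm64",  # Linux on ARM
    "arm64": "arm64",    # macOS on Apple Silicon
}
SUPPORTED_OS = {"linux", "darwin", "windows"}


def kind_platform(system: str, machine: str) -> Tuple[str, str]:
    """Translate platform.system()/platform.machine() into kind's (os, arch)."""
    os_name = system.lower()
    arch = ARCH_ALIASES.get(machine.lower())
    if os_name not in SUPPORTED_OS or arch is None:
        raise ValueError(f"unsupported platform: {system}/{machine}")
    return os_name, arch


def old_kind_binary(system: str, machine: str) -> str:
    """Hypothetical path of a bundled kind 0.8 binary for this platform."""
    os_name, arch = kind_platform(system, machine)
    return f"bin/kind-v0.8-{os_name}-{arch}"


if __name__ == "__main__":
    print(old_kind_binary(platform.system(), platform.machine()))
```

Raising on unknown combinations (rather than falling back to the AMD64 Linux binary) makes the Mac/Windows failure mode explicit instead of surfacing later as an opaque kind error.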

iknowjason commented 10 months ago

Hi @raesene, sorry for the delayed response. I updated the repo and I'm still getting the same error. I'm running it like this: ansible-playbook etcd-noauth.yml. Not sure what's going on, but I'll keep digging.