siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

talosctl: installing rook on docker provisioned cluster corrupts host's LUKS partition #5519

Open sauterp opened 2 years ago

sauterp commented 2 years ago

Bug Report

Description

  1. I provisioned a Talos cluster with Docker on Fedora 35: talosctl cluster create --wait --extra-disks 1 --workers 3
  2. I followed this guide and installed Rook.

After I rebooted my machine, it no longer booted. All my partitions were intact except the LUKS partition, which had been reformatted as Ceph BlueStore. I did not try to reproduce the issue, since that would mean setting up my machine from scratch again. It's possible that something else I did caused the problem.

smira commented 2 years ago

The root cause is that talosctl cluster create does the equivalent of docker run --privileged, and that exposes host block devices to the container, which in turn exposes them to pods running on Kubernetes in Talos inside the container. So Rook can detect and mistakenly try to use a host block device.

This feels like a bug to me, and we should fix it. The problem is that I don't see an equivalent of --privileged via other options in the Docker API which would allow us to disable device passthrough.
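
To check the exposure described above, one can list the block device nodes visible under /dev from inside the container. The following is a minimal sketch, not part of talosctl; the function name block_device_nodes is hypothetical, and it assumes a Python interpreter is available wherever the check is run. On a container started with --privileged it would be expected to show the host's disks, while on a default container it would typically return nothing.

```python
import os
import stat


def block_device_nodes(dev_dir="/dev"):
    """Return sorted paths of block device nodes found directly under dev_dir.

    Hypothetical helper for illustration only. Symlinks are not followed,
    and subdirectories such as /dev/disk are not descended into.
    """
    found = []
    try:
        entries = list(os.scandir(dev_dir))
    except OSError:
        # Directory missing or unreadable: report nothing rather than fail.
        return found
    for entry in entries:
        try:
            mode = os.lstat(entry.path).st_mode
        except OSError:
            continue
        if stat.S_ISBLK(mode):
            found.append(entry.path)
    return sorted(found)


if __name__ == "__main__":
    for path in block_device_nodes():
        print(path)
```

Any host disk that appears in this list is reachable by whatever runs in the container, which is how a storage operator such as Rook could pick it up.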

sanmai-NL commented 4 months ago

@smira Why wrap the Docker CLI tool in the first place? It's opaque and pretty dangerous, as it turns out here. Someone used to running Linux containers shouldn't be discouraged by having to issue a lengthy command line. In fact, they could use the compose (Docker Engine, Podman, nerdctl) and/or kube play (Podman) subcommands and you could define a sample spec in YAML in the docs, to keep it brief.

sanmai-NL commented 4 months ago

@smira

"This feels like a bug to me, and we should fix it. The problem is that I don't see an equivalent of --privileged via other options in the Docker API which would allow us to disable device passthrough."

Do I understand correctly that you want all of --privileged, but disable having host /dev/ or more specifically the block devices mounted inside the container? Would https://docs.docker.com/reference/cli/docker/container/run/#device-cgroup-rule help you restrict that?

But see also: https://github.com/siderolabs/talos/issues/4385#issuecomment-2058841449
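
For reference, the --device-cgroup-rule option linked above takes rules of the form "type major:minor permissions", where type is a, b, or c and permissions are drawn from r, w, and m. The sketch below only builds such rule strings and a docker run argument list; the helper names and the image name are hypothetical. Docker documents the flag as adding rules to the container's allowed devices list, so whether it can narrow what --privileged grants is an open question here, not something this example demonstrates.

```python
def device_cgroup_rule(dev_type, major, minor, perms):
    """Format a rule string such as 'b 8:* rwm' (hypothetical helper).

    dev_type: 'a' (all), 'b' (block), or 'c' (character).
    major, minor: a non-negative int, or '*' as a wildcard.
    perms: non-empty combination of 'r' (read), 'w' (write), 'm' (mknod).
    """
    if dev_type not in ("a", "b", "c"):
        raise ValueError(f"invalid device type: {dev_type!r}")
    if not perms or set(perms) - set("rwm"):
        raise ValueError(f"invalid permissions: {perms!r}")
    for part in (major, minor):
        is_number = isinstance(part, int) and not isinstance(part, bool)
        if part != "*" and not (is_number and part >= 0):
            raise ValueError(f"invalid device number: {part!r}")
    return f"{dev_type} {major}:{minor} {perms}"


def docker_run_args(image, rules):
    """Build an argv list for docker run with one flag per rule."""
    args = ["docker", "run"]
    for rule in rules:
        args += ["--device-cgroup-rule", rule]
    args.append(image)
    return args


if __name__ == "__main__":
    # Block major 8 is the SCSI disk (sd*) family on Linux.
    # The image name is a placeholder, not a real Talos image reference.
    rule = device_cgroup_rule("b", 8, "*", "rwm")
    print(" ".join(docker_run_args("example/image:latest", [rule])))
```

The script only prints the command line; it does not start a container.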