tinkerbell / playground

Example deployments of the Tinkerbell Stack for use as playground environments
Apache License 2.0

Suggestions for improving docker-compose quickstart guide #172

Closed douglaswainer closed 11 months ago

douglaswainer commented 1 year ago

I would like to suggest some improvements to the docker-compose quickstart for an easier experience for beginner home lab enthusiasts. I will follow this with a PR of suggested changes :)

Expected Behaviour

A foolish newbie such as myself should be able to easily set up Tinkerbell with the docker-compose quickstart, on their bare metal machines whilst being explicitly guided on avoiding common pitfalls.

Current Behaviour

1) The docker-compose guide currently doesn't work for bare metal devices with NVMe disks unless you modify the hardware.yaml, which I think will put off complete beginners. There is a DISK_DEVICE variable in the .env file, but changing this to /dev/nvme0n1 will still fail on workflows that specify the first partition of the disk, because of NVMe naming conventions. For example:

BLOCK_DEVICE: {{ index .Hardware.Disks 0 }}1

This renders as nvme0n11 instead of nvme0n1p1.

2) It's not made clear that for the quickstart guide to work, both the provisioner and the worker nodes require outbound internet access. This might be obvious to some folks, but I personally was a little overenthusiastic about locking down my PXE network and decided only the provisioner needed outbound access! This took a while to figure out.

3) In the hardware.yaml, there is a $TINKERBELL_CLIENT_GW variable that isn't present in the .env file or the quickstart guide. Additionally, the DNS servers in the hardware manifest are hard-coded, and in my case I had set up my pfSense gateway device to only allow DNS traffic to itself, instead of these common name servers.

4) The Kubernetes namespace that gets set up in the docker-compose quickstart is named default instead of tink-system (you can see this in ./state/kube/kubeconfig.yaml).
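The naming problem in point 1 can be sketched as a small shell function. This is only an illustration of the Linux convention (disk names that end in a digit get a "p" before the partition number); partition_path is a hypothetical helper, not something Tinkerbell or the playground provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Tinkerbell): build a partition device
# path from a whole-disk path, following Linux block device naming.
partition_path() {
  local disk="$1" number="$2"
  case "$disk" in
    # Disk names ending in a digit (nvme0n1, mmcblk0) need a "p" separator.
    *[0-9]) printf '%s\n' "${disk}p${number}" ;;
    # Disk names ending in a letter (sda, vdb) take the number directly.
    *)      printf '%s\n' "${disk}${number}" ;;
  esac
}

partition_path /dev/sda 1       # prints /dev/sda1
partition_path /dev/nvme0n1 1   # prints /dev/nvme0n1p1
```

Simply appending "1" to the disk path, as the template above does, only matches the second branch, which is why it works for /dev/sda but not for NVMe devices.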

Possible Solution

2 and 4 are easily addressed with small changes.

1) I'm still new to Tinkerbell, but it doesn't look like there's currently a way in the manifest to specify the device for the first partition of the first disk e.g. something like:

{{ index .Hardware.Disks 0 Partition 0 }}

That would substitute automatically to /dev/nvme0n1p1. If this were possible to set up, I think this is the right way to go about this. Assuming there's no better way, I suggest creating a new substitution variable that can be set in the .env file DISK_DEVICE_PARTITION_1, with examples for nvme.

3) Add new substitution variables for the client's nameservers, e.g.

TINKERBELL_CLIENT_NAMESERVER_1
TINKERBELL_CLIENT_NAMESERVER_2

Then add these and the TINKERBELL_CLIENT_GW variable to the .env file.
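A rough sketch of what the extended .env file might look like with the variables proposed above. DISK_DEVICE_PARTITION_1 and the two nameserver variables are only suggestions from this issue and do not exist yet; every address is a placeholder to be replaced with values from your own network.

```shell
# Whole disk; this variable is already in the quickstart .env file.
DISK_DEVICE=/dev/nvme0n1

# Proposed (does not exist yet): first partition, written out in full
# because NVMe devices insert a "p" before the partition number.
DISK_DEVICE_PARTITION_1=/dev/nvme0n1p1

# Referenced by hardware.yaml but currently missing from the .env file.
# Placeholder address: use your own gateway.
TINKERBELL_CLIENT_GW=192.0.2.1

# Proposed (do not exist yet): replace the hard-coded DNS servers in
# hardware.yaml. Placeholder addresses: use your own resolvers.
TINKERBELL_CLIENT_NAMESERVER_1=192.0.2.1
TINKERBELL_CLIENT_NAMESERVER_2=192.0.2.2
```

For a SATA or virtio disk, the first two values would be /dev/sda and /dev/sda1 (or /dev/vda and /dev/vda1) instead.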

In the documentation, instead of setting these as bash variables, the user should be encouraged to look at and modify the .env file and to run docker-compose with that.

Finally, we should encourage beginners to look at and follow the docker-compose logs on the provisioner, as they'll be able to troubleshoot the earlier stages of provisioning (DHCP/PXE boot), instead of just the workflow state.
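Following the logs could be as simple as the snippet below. It is a minimal sketch that assumes it is run on the provisioner from the directory containing the quickstart's docker-compose.yml; follow_stack_logs is just a hypothetical wrapper name.

```shell
# Hypothetical wrapper: stream logs from every service in the compose
# project. --follow keeps the stream open; --tail limits the initial
# backlog per service (defaults to 100 lines here).
follow_stack_logs() {
  docker-compose logs --follow --tail="${1:-100}"
}
```

Running follow_stack_logs while a worker boots should show the DHCP and PXE requests as they arrive, before any workflow has started. A service name can be appended to the docker-compose logs command to narrow the output to one container.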

Steps to Reproduce (for bugs)

Follow through the docker-compose quickstart with bare metal devices with nvme disks.

Context

I'm starting my journey of building a "simple" bare metal provisioning platform for my cluster of Lenovo ThinkCentre devices. This is the second time I've tried Tinkerbell. On my first attempt I hit these pitfalls and just gave up.

Your Environment

Devices: Lenovo ThinkCentre Tiny M900 machines
Provisioner OS: Ubuntu Server 22.04 LTS
Provisioner installed software: docker 20.10.21, docker-compose 1.29.2