tinkerbell / playground

Example deployments of the Tinkerbell Stack for use as playground environments
Apache License 2.0

Terraform love #126

Closed mmlb closed 2 years ago

mmlb commented 2 years ago

Description

All sorts of things, mainly aimed at the terraform setup. Quick one-liners for each change follow; more info/context can usually be found in the commit messages.

direnv: use has from stdlib

Simplification of .envrc by making use of stdlib.

git: Manage git ignores in just one .gitignore file

I was originally updating .gitignore in the terraform folder, but the root file had relevant entries so I decided to just use one.

tf: Get rid of mention of ewr1 in comment

This bit of comment was unhelpful and actually very wrong.

tf: Add output value for the provisioner ssh hostname

Makes ssh'ing to the provisioner easier: ssh $(tf output -raw provisioner_ssh).

tf/setup: Configure bash to be stricter/safer

Speaks for itself I think.
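For readers less familiar with the idiom, a minimal sketch of what "stricter/safer" usually means in bash. The exact flags used in setup.sh are not listed in this description, so treat the set of options below as an assumption:

```bash
#!/usr/bin/env bash
# Assumed strict-mode preamble; the actual flags in setup.sh may differ.
set -o errexit  # exit on any unhandled command failure
set -o nounset  # treat expansion of unset variables as an error
set -o pipefail # a pipeline fails if any stage fails, not just the last

# With pipefail, the failing first stage decides the pipeline's status.
if false | true; then
    pipeline_status="succeeded"
else
    pipeline_status="failed"
fi
echo "pipeline ${pipeline_status}"
```

Without pipefail the same pipeline would count as a success, because only the exit status of the last stage would be considered.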

tf/setup: Ensure all functions use same execution mode

Some functions were running in a subshell; this is not necessary, and the syntax that causes it is relatively unknown.
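A small illustration of the difference, assuming the subshell functions were defined with a parenthesized body (the function names here are made up for the example):

```bash
#!/usr/bin/env bash

counter=0

# Body in parentheses: runs in a subshell, so the assignment is lost.
bump_in_subshell() (
    counter=$((counter + 1))
)

# Body in braces: runs in the calling shell, so the assignment persists.
bump_in_current_shell() {
    counter=$((counter + 1))
}

bump_in_subshell
after_subshell=$counter
bump_in_current_shell
after_braces=$counter

echo "after subshell: ${after_subshell}, after braces: ${after_braces}"
```

Mixing the two forms in one script makes it easy to lose variable changes, directory changes, or traps without noticing, which is why a single execution mode is easier to reason about.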

tf/setup: Use apt-get helper function

Pulling common apt-get args into a function instead of copy/pasting in each call.
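Roughly the shape of such a helper, as a sketch. The function name, the chosen flags, and the APT_GET override (there only so the wrapper can be dry-run) are assumptions for illustration, not necessarily what setup.sh uses:

```bash
#!/usr/bin/env bash

# Hypothetical helper: one place for the non-interactive apt-get arguments.
# APT_GET is an illustrative override so the wrapper can be exercised without apt.
apt_get() {
    DEBIAN_FRONTEND=noninteractive "${APT_GET:-apt-get}" --yes --quiet "$@"
}

# Dry run: print the arguments apt-get would receive instead of running it.
APT_GET=echo apt_get install docker-ce
```

Callers then only spell out what differs between calls, e.g. apt_get update or apt_get install followed by the package names.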

tf/setup: Only explicitly install the docker packages

Why are we manually installing deps when apt is better at it than humans?

tf/setup: Don't hard code the arch when adding docker apt-repository

Pretty self-explanatory, bit me when I tried an aarch64 machine.
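A sketch of the idea, under assumptions: both function names and the focal codename are illustrative, and on Debian/Ubuntu hosts dpkg --print-architecture reports the architecture name directly, so the mapping function is only there to make the example self-contained:

```bash
#!/usr/bin/env bash

# Map a kernel machine name (as printed by `uname -m`) to the Debian
# architecture name used in apt sources. Illustrative subset only.
deb_arch() {
    case "$1" in
    x86_64) echo amd64 ;;
    aarch64) echo arm64 ;;
    armv7l) echo armhf ;;
    *)
        echo "unsupported machine: $1" >&2
        return 1
        ;;
    esac
}

# Build the apt source line from the detected arch instead of a literal amd64.
docker_repo_line() {
    local arch=$1 codename=$2
    echo "deb [arch=${arch}] https://download.docker.com/linux/ubuntu ${codename} stable"
}

docker_repo_line "$(deb_arch aarch64)" focal
```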

tf/setup: Install docker-compose using pip

The GitHub releases also had no file (binary?) for non-x86_64 machines.

tf/setup: Make main function actually functional

main (as a func) wasn't really doing anything useful before, now the file can be sourced vs executed.
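The usual bash pattern for a file that works both sourced and executed looks like the sketch below. The body of main is a placeholder, not the real setup steps:

```bash
#!/usr/bin/env bash

main() {
    # Placeholder body; the real script would call its setup functions here.
    echo "running setup steps"
}

# Run main only when the file is executed directly. When it is sourced,
# the functions are defined but nothing runs.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    main "$@"
fi
```

Sourcing the file from an interactive shell then allows calling individual functions one at a time while debugging.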

tf/setup: Do not restart docker service

No need to restart docker, just because we installed docker-compose.

tf/setup: Persist 2 separate network config

This way machine stays useful after reboots.

tf/setup: Improve correctness of get_second_interface_from_bond0

Mostly a nit/theoretical fix, but nice to have imo.

tf/setup: Persist iptables gw rules

This way machine stays useful after reboots.

tf: Add local variable for worker_macs

So lines that need the mac read better.

tf: Add output for worker_macs

This way we can get to watching boots logs more quickly.

tf: Add outputs for provisioner and worker ids

For look up in EMAPI/Portal.

tf: Modify compose/.env file for repeat docker-compose runs

Before this, docker-compose would do the wrong thing if run manually because the env vars were not around.
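One way to persist values into an env file so that later manual runs see them, as a sketch. The helper name, the EXAMPLE_HOST_IP variable, and the address are hypothetical; the variables actually written to compose/.env are in the commit, not here:

```bash
#!/usr/bin/env bash

# Hypothetical helper: set KEY=VALUE in an env file, replacing any
# existing line for KEY so repeated runs do not accumulate duplicates.
set_env_var() {
    local file=$1 key=$2 value=$3
    touch "$file"
    # Drop any previous definition of the key, then append the new one.
    # grep exits non-zero when nothing is left to print, hence `|| true`.
    grep -v "^${key}=" "$file" >"${file}.tmp" || true
    echo "${key}=${value}" >>"${file}.tmp"
    mv "${file}.tmp" "$file"
}

env_file=$(mktemp)
set_env_var "$env_file" EXAMPLE_HOST_IP 192.0.2.10
set_env_var "$env_file" EXAMPLE_HOST_IP 192.0.2.20
cat "$env_file"
rm -f "$env_file"
```

docker-compose reads a .env file next to the compose file, so values stored this way are picked up without exporting anything in the shell first.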

tf: Put all setup logic in setup.sh

I originally did this because there was a race between userdata and tf remote-exec, but that is no longer true. This is good to have anyway so that we only need to look at one file to grok the setup/run process instead of two.

tf: Add some interactive user goodies

Similar to the ones for vagrant, makes for nicer interactive use.

tf: Use format instead of formatlist for worker_sos output

This way we can ssh into sos more easily.

vagrant: Move all provisioner code into just one script

One shell script is easier to read than a bunch of shell blocks in a ruby file imo.

vagrant: Install docker and docker-compose via setup.sh

Running vagrant w/o these plugins causes vagrant to fetch them and exit, which means we need to bring vagrant up again. This is a pretty poor experience imo; by installing in setup.sh we don't need anything from the host os.

deploy: Use the same folder path on both terraform and vagrant

Who cares whether we're running in tf or vagrant anyway? This way mental models/paths are valid for both. This also lets the setup.sh scripts share the parts that are semantically common, which hopefully means they'll be easier to keep in sync.

Why is this needed

I tried running sandbox on some arm machines and ran into all sorts of hard-coded assumptions and race conditions that don't seem to hit in x86 land. This got me to spend some time running lots of tf setups and hitting a few bugs. Having to cd and run full command names also got boring.

How Has This Been Tested?

Lots of terraform apply.

How are existing users impacted? What migration steps/scripts do we need?

More reliable/generic tf setup.

displague commented 2 years ago

Please hold on to this until #96.

mmlb commented 2 years ago

> Please hold on to this until #96.

Sounds good, will do. I noticed you added more to the cloud-init/userdata, whereas I got rid of it completely. I did that because it was easier to see what was going on while using the remote-exec provisioner. I think it's good to be able to see this, considering that this repo is meant as a demonstration of setup. It's also easier to debug if something does end up going wrong. How do you feel about what I did @displague?

displague commented 2 years ago

@mmlb I haven't reviewed your PR yet. We could have an SSH provisioner wait for the end of the userdata by watching the cloud init logs for the end of scripting, reporting output over SSH while it waits.

Perhaps the Terraform module and the sandbox Terraform configuration should be refactored some more. Sandbox would call upon the module (we would move this elsewhere) with arguments.

mmlb commented 2 years ago

> @mmlb I haven't reviewed your PR yet. We could have an SSH provisioner wait for the end of the userdata by watching the cloud init logs for the end of scripting, reporting output over SSH while it waits.

I had this initially, but after a while I realized that there's not really a good reason to do both userdata and a remote-exec, and that it made the most sense to have the execution in just one script. I moved everything over to remote-exec as it was a better experience for me.

> Perhaps the Terraform module and the sandbox Terraform configuration should be refactored some more. Sandbox would call upon the module (we would move this elsewhere) with arguments.

That feels like a separate thing altogether. Sandbox and the terraform setup are pretty opinionated and relatively simple, which makes sense for dev/messing around. Having a module seems like it'll increase the complexity a bunch; worth it for more serious cases, but too much for sandbox to me.

mmlb commented 2 years ago

@jacobweinstock would love your :eye: here too

displague commented 2 years ago

Provisioned successfully, looks good. I like the simplifications.

Perhaps in the future, we can go all-in on cloud-config format rather than using shell scripts. The devil's in the details though.