kinvolk / racker

rack provisioning utility for Kinvolk projects
Apache License 2.0

bootstrap: show provisioning progress and failures #53

Closed pothos closed 3 years ago

pothos commented 3 years ago

The different stages of provisioning should be displayed to the user, so that the user can understand where an error happens and react to it. For this purpose, the state of provisioning is queried by looking at the created marker files, whether the Kubernetes API is reachable, and which nodes have registered themselves with the cluster. This is combined with the return code of lokoctl (or terraform for plain Flatcar). An additional sanity check is done at the beginning to see whether the BMCs are correctly configured before even attempting a PXE boot. The user is pointed to error logs and debugging commands (a new ipmi helper subcommand "diag" can show the server summary), and is offered the choice to exclude a problematic server so that the cluster can still come up. Later on, the user can see similar information in a new "racker status" command that shows what was provisioned and what did not work.
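As a rough illustration of how these signals could be combined, here is a minimal Python sketch. It is not racker's actual implementation; the `Signals` fields and the `current_stage` function are hypothetical names, and the stage labels are taken from the example output in this issue.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Signals:
    """Observations gathered while provisioning runs (all names hypothetical)."""
    all_macs: Set[str]          # every node that should be provisioned
    bmc_ok: Set[str]            # nodes whose BMC answered the sanity check
    installed: Set[str]         # nodes for which an "OS installed" marker file exists
    api_reachable: bool         # whether the Kubernetes API answered
    joined: Set[str]            # nodes that registered with the cluster
    return_code: Optional[int]  # lokoctl/terraform exit code, None while still running


def current_stage(s: Signals) -> Tuple[str, Set[str], str]:
    """Return (stage label, nodes still missing, 'running' | 'failed' | 'done')."""
    # The BMC check runs before any PXE boot, so a missing BMC fails right away.
    missing = s.all_macs - s.bmc_ok
    if missing:
        return "Checking BMC connectivity", missing, "failed"

    # Later stages count as failed only once the provisioner exited non-zero.
    exited_badly = s.return_code not in (None, 0)
    state = "failed" if exited_badly else "running"

    missing = s.all_macs - s.installed
    if missing:
        return "OS installation via PXE", missing, state
    if not s.api_reachable:
        return "Kubernetes bring-up", set(), state
    missing = s.all_macs - s.joined
    if missing:
        return "Cluster health check", missing, state

    if s.return_code == 0:
        state = "done"
    return "Lokomotive component installation", set(), state
```

The first incomplete stage is reported together with the nodes that have not reached it, which is the information the user needs to decide whether to debug or exclude a node.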

(Progress is animated with ., .., ...)
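The dot animation could be produced along these lines. This is a hedged Python sketch only; `progress_frames` and `animate` are hypothetical helpers, not part of racker, and the carriage-return redraw assumes a terminal that understands ANSI escape sequences.

```python
import itertools
import sys
import time


def progress_frames(label):
    """Yield the status line with one, two, then three dots, repeating forever."""
    for dots in itertools.cycle((".", "..", "...")):
        yield f"➤ {label}{dots}"


def animate(label, is_finished, interval=0.5, out=sys.stdout):
    """Redraw the same terminal line until is_finished() returns True."""
    for frame in progress_frames(label):
        # "\r" returns to the start of the line, "\033[K" clears the old frame.
        out.write("\r\033[K" + frame)
        out.flush()
        if is_finished():
            break
        time.sleep(interval)
    out.write("\n")
```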

$ racker bootstrap --onfailure=exclude -- -provision lokomotive -ip-addrs "$(cat ip_addrs)" 
➤ Checking BMC connectivity (35/35)... ✓ done
➤ OS installation via PXE (20/35)... × failed
Failed to provision the following 15 nodes.
11:11:a1:19:fb:22 11:11:da:7f:9d:02 11:11:da:7f:9d:5a […]
You can see logs in /home/core/lokomotive/logs/2021-04-06_16-21-51, run 'ipmi <MAC|DOMAIN> diag' for a short overview of a node, connect to the serial console via 'ipmi <MAC|DOMAIN>', or try to connect via SSH.
Something went wrong, removing 15 nodes from config and retrying 1/3
➤ OS installation via PXE (20/20)... ✓ done
➤ Kubernetes bring-up... ✓ done
➤ Cluster health check (20/20 nodes seen)... ✓ done
➤ Lokomotive component installation... ✓ done
$ racker status
Provisioned: Lokomotive
Kubernetes API reached: yes

MAC address        BMC reached  OS provisioned  Joined cluster  Hostnames
11:11:a1:19:9c:82  ✓            ✓               ✓               lokomotive.k8s.localdomain lokomotive-etcd0.k8s.localdomain lokomotive-controller-0.k8s.localdomain
11:11:a1:19:a2:5a  ✓            ✓               ✓               lokomotive.k8s.localdomain lokomotive-etcd1.k8s.localdomain lokomotive-controller-1.k8s.localdomain
11:11:a1:19:a2:82  ✓            ✓               ✓               lokomotive.k8s.localdomain lokomotive-etcd2.k8s.localdomain lokomotive-controller-2.k8s.localdomain
11:11:a1:19:fb:22  ✓            ×               ×
11:11:da:7f:9d:02  ✓            ×               ×
11:11:da:7f:9d:0a  ✓            ✓               ✓               lokomotive-worker-2.k8s.localdomain
11:11:da:7f:9d:22  ✓            ✓               ✓               lokomotive-worker-3.k8s.localdomain
[…]
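
The exclude-and-retry behaviour seen in the bootstrap output above (with `--onfailure=exclude`) can be sketched as follows. This is an illustrative Python sketch under assumptions, not the actual racker code: `bootstrap_with_exclusion` and the `provision` callable are hypothetical, and the limit of three retries is inferred from the "retrying 1/3" message.

```python
def bootstrap_with_exclusion(nodes, provision, max_retries=3):
    """Provision nodes, dropping the failing ones and retrying.

    provision(nodes) is a hypothetical callable that attempts to provision
    the given nodes and returns the collection of nodes that failed.
    Returns (provisioned nodes, excluded nodes).
    """
    remaining = list(nodes)
    excluded = []
    retries = 0
    while True:
        failed = set(provision(remaining))
        if not failed:
            return remaining, excluded
        if retries == max_retries:
            raise RuntimeError(f"provisioning still failing after {max_retries} retries")
        retries += 1
        removed = [n for n in remaining if n in failed]
        excluded.extend(removed)
        remaining = [n for n in remaining if n not in failed]
        if not remaining:
            raise RuntimeError("no nodes left to provision")
        print(f"Something went wrong, removing {len(removed)} nodes "
              f"from config and retrying {retries}/{max_retries}")
```

Excluding only the nodes that failed keeps the rest of the rack usable, and the excluded nodes remain visible afterwards so they can be debugged and added back later.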