kinvolk / lokomotive

🪦 DISCONTINUED Further Lokomotive development has been discontinued. Lokomotive is a 100% open-source, easy to use and secure Kubernetes distribution from the volks at Kinvolk
https://kinvolk.io/lokomotive-kubernetes/
Apache License 2.0

Make Bootkube re-runnable #403

Open johananl opened 4 years ago

johananl commented 4 years ago

Right now, if a cluster bootstrap fails because Bootkube cannot complete for any reason, rerunning lokoctl cluster apply results in an error:

module.packet-johannes-test.null_resource.bootkube-start: Creating...
module.packet-johannes-test.null_resource.bootkube-start: Provisioning with 'remote-exec'...
module.packet-johannes-test.null_resource.bootkube-start (remote-exec): Connecting to remote host via SSH...
module.packet-johannes-test.null_resource.bootkube-start (remote-exec):   Host: 147.75.100.105
module.packet-johannes-test.null_resource.bootkube-start (remote-exec):   User: core
module.packet-johannes-test.null_resource.bootkube-start (remote-exec):   Password: false
module.packet-johannes-test.null_resource.bootkube-start (remote-exec):   Private key: false
module.packet-johannes-test.null_resource.bootkube-start (remote-exec):   Certificate: false
module.packet-johannes-test.null_resource.bootkube-start (remote-exec):   SSH Agent: true
module.packet-johannes-test.null_resource.bootkube-start (remote-exec):   Checking Host Key: false
module.packet-johannes-test.null_resource.bootkube-start (remote-exec): Connected!
module.packet-johannes-test.null_resource.bootkube-start (remote-exec): mv: cannot stat '/home/core/assets': No such file or directory
module.packet-johannes-test.null_resource.bootkube-start: Still creating... [10s elapsed]
module.packet-johannes-test.null_resource.bootkube-start (remote-exec): Job for bootkube.service failed because the control process exited with error code.
module.packet-johannes-test.null_resource.bootkube-start (remote-exec): See "systemctl status bootkube.service" and "journalctl -xe" for details.

Error: error executing "/tmp/terraform_28762278.sh": Process exited with status 1

FATA[0057] error applying cluster: failed checking execution status: exit status 1  args="[]" command="lokoctl cluster apply"

It would be much nicer to make the Bootkube part of our stack idempotent as well. This will likely require changes to Bootkube itself. Relevant upstream issue: https://github.com/kubernetes-sigs/bootkube/issues/700

invidian commented 4 years ago

Hm, I think a new upstream issue should be opened for this, as it heavily involves bootkube.

I would also suggest specifying what should happen when bootkube is re-executed, as I guess expectations may vary.

pothos commented 3 years ago

In my case the problem was that /home/core/assets was there, but the target already existed and had to be cleaned up manually first. I think this mv logic on our side can be made more robust (maybe the interaction between the file provisioner and remote-exec could be simplified, too).
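To illustrate what a more robust move could look like, here is a minimal sketch covering both failure modes seen in this thread: the source being gone (the "cannot stat" error in the log above) and the target already existing. It is written in Python purely for illustration; the actual provisioning step is a shell script run through remote-exec, and the function name `move_assets`, its return values, and the choice to delete a stale target are all assumptions, not existing Lokomotive code.

```python
import shutil
from pathlib import Path


def move_assets(src: Path, dst: Path) -> str:
    """Move src to dst so that a re-run after a partial failure still succeeds.

    Hypothetical helper: the policy of deleting a stale target is an
    assumption and may not be what the project wants.
    """
    if not src.exists():
        # A previous run already moved the assets: nothing left to do.
        if dst.exists():
            return "already-moved"
        raise FileNotFoundError(f"neither {src} nor {dst} exists")

    if dst.exists():
        # Leftover from a failed run: remove it instead of failing.
        if dst.is_dir():
            shutil.rmtree(dst)
        else:
            dst.unlink()

    shutil.move(str(src), str(dst))
    return "moved"
```

Calling the function a second time with the same arguments returns "already-moved" instead of raising, which is the re-runnable behaviour this issue asks for on the file-handling side.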

invidian commented 3 years ago

> In my case the problem was that /home/core/assets was there, but the target already existed and had to be cleaned up manually first. I think this mv logic on our side can be made more robust (maybe the interaction between the file provisioner and remote-exec could be simplified, too).

Bootkube will fail when resources already exist, and that is the major problem to solve if we want re-runs to work. Moving files around is a nit.
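One possible answer to "what should happen on re-execution" is to treat an "already exists" response as success and carry on. The sketch below shows that pattern against an in-memory stand-in; `FakeStore`, `AlreadyExists`, and `apply_all` are hypothetical names made up for this example and do not correspond to Bootkube's or Kubernetes' actual API.

```python
class AlreadyExists(Exception):
    """Raised by the hypothetical store when a resource name is taken."""


class FakeStore:
    """In-memory stand-in for a cluster API (an assumption, not a real client)."""

    def __init__(self):
        self.resources = {}

    def create(self, name, manifest):
        if name in self.resources:
            raise AlreadyExists(name)
        self.resources[name] = manifest


def apply_all(store, manifests):
    """Create every manifest, treating 'already exists' as success."""
    created, skipped = [], []
    for name, manifest in manifests.items():
        try:
            store.create(name, manifest)
            created.append(name)
        except AlreadyExists:
            # A previous (possibly failed) run created this one: keep going.
            skipped.append(name)
    return created, skipped
```

Note that skipping leaves existing resources untouched, so a resource left in a bad state by a failed run stays that way. Updating or recreating instead would be a different policy, which is why the expected behaviour needs to be specified first.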