tinkerbell / tink

Workflow Engine for provisioning Bare Metal
https://tinkerbell.org
Apache License 2.0

Add ability to reboot the machine after workflow is finished #71

Closed invidian closed 1 week ago

invidian commented 4 years ago

For workflows that provision the OS, it would be nice if the workflow itself could reboot the machine once it's done, so the machine can boot into the target OS and the upper orchestration layer (e.g. a person monitoring the provisioning process, or some logic that uses IPMI) doesn't need to care about that.

Things to consider:

rgl commented 3 years ago

it seems that we now have a documented way to do a reboot from an action at https://docs.tinkerbell.org/actions/action-architecture/#namespace:

When an action attempts to do these steps in a container in its own namespace, nothing will occur as PID 1 is usually the process in the action container. To allow the expected behaviour an action can use pid: host in its configuration, this will mean that the action processes will be amongst all of the processes on the host itself (including the "real" PID 1). With the action in the host process ID namespace both a reboot or kexec will be able to work as expected.

Is this issue about improving on that?

thebsdbox commented 3 years ago

This is fixed in tink-worker. This can probably be closed! 😀

rgl commented 3 years ago

@thebsdbox, by fixed, you mean using an action with pid: host?

having a docs example on how to reboot from a workflow would also be really nice :-)

I found a reboot example at https://docs.tinkerbell.org/deploying-operating-systems/examples-win/#creating-a-reboot-action-dockerfile:

FROM busybox
ENTRYPOINT [ "touch", "/worker/reboot" ]

is that it? we just need to create a new file named /worker/reboot?

rgl commented 3 years ago

Creating a file named /worker/reboot does not trigger a reboot from tink-worker:

[Screenshot: rpi-tinkerbell-vagrant_bios_worker, 2021-05-26 09:12:14]

Here's the workflow status:

+----------------------+--------------------------------------+
| FIELD NAME           | VALUES                               |
+----------------------+--------------------------------------+
| Workflow ID          | be378bb1-bdf9-11eb-9be0-0242ac120005 |
| Workflow Progress    | 100%                                 |
| Current Task         | hello-world                          |
| Current Action       | reboot                               |
| Current Worker       | 00000000-0000-4000-8000-080027000001 |
| Current Action State | STATE_SUCCESS                        |
+----------------------+--------------------------------------+
+--------------------------------------+-------------+-------------+----------------+---------------------------------+---------------+
| WORKER ID                            | TASK NAME   | ACTION NAME | EXECUTION TIME | MESSAGE                         | ACTION STATUS |
+--------------------------------------+-------------+-------------+----------------+---------------------------------+---------------+
| 00000000-0000-4000-8000-080027000001 | hello-world | reboot      |              0 | Started execution               | STATE_RUNNING |
| 00000000-0000-4000-8000-080027000001 | hello-world | reboot      |              0 | finished execution successfully | STATE_SUCCESS |
+--------------------------------------+-------------+-------------+----------------+---------------------------------+---------------+

thebsdbox commented 3 years ago

Ah, this needs Hook. Hook has the logic to watch for the reboot.
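To make the "watch for the reboot" idea concrete, here is a minimal sketch of a file-based trigger. It is not Hook's actual code: the function names, the polling approach, and the `reboot` callback are all assumptions made for illustration, and only the idea of reacting to a file such as /worker/reboot comes from this thread.

```python
import os
import time


def wait_for_file(path, poll_interval=1.0, timeout=None):
    """Poll until `path` exists. Return True if it appeared, False on timeout."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def watch_and_reboot(path, reboot, poll_interval=1.0, timeout=None):
    """Call the hypothetical `reboot` callback once the trigger file shows up."""
    if wait_for_file(path, poll_interval, timeout):
        reboot()
        return True
    return False
```

The point of the sketch is that the action only creates a file, and a separate long-lived process on the host side performs the reboot. If nothing is watching the path, the touch succeeds and nothing else happens, which matches the STATE_SUCCESS output above.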

displague commented 2 years ago

Can we use sysrq-r from an action? https://hub.docker.com/r/mlafeldt/sysrq/ for example.

the action or task can't trigger a reboot by itself, as this will shut down the worker and it won't be able to report that reboot task succeeded

Does the action need to be Tinkerbell specific and act as the worker to signal success?

double-p commented 2 years ago

Built a docker image as per the example @rgl mentioned here already to no avail:

The "touch" is going nowhere and thus the rebootWatch() never fires.

A manual touch in the getty container to "/run/worker/reboot" works, so the watch is active. It just looks like the volume mapping is wrong? (/worker:/worker)

Edit: it works; the workflow was just hanging somehow. I recreated it and it works as advertised:

- build the Docker image as in the Windows example
- tag and push it to a local registry
- add the action as in the same example

profi...reboot :)
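For anyone debugging the volume-mapping question raised in this comment, the following sketch shows how a "host_dir:container_dir" entry decides where a file touched inside the action container lands on the host. The helper name `host_path` is hypothetical, and the /run/worker mapping in the usage note is only an example input, not a confirmed fix for the documentation.

```python
import posixpath


def host_path(container_path, volumes):
    """Translate a path inside the container to the host path behind it.

    `volumes` uses the "host_dir:container_dir" form from the workflow.
    Returns None when no volume covers the path, meaning the file only
    exists inside the container and is lost when the container exits.
    """
    container_path = posixpath.normpath(container_path)
    for volume in volumes:
        host_dir, container_dir = volume.split(":")[:2]
        container_dir = posixpath.normpath(container_dir)
        if container_path == container_dir:
            return posixpath.normpath(host_dir)
        prefix = container_dir.rstrip("/") + "/"
        if container_path.startswith(prefix):
            relative = container_path[len(prefix):]
            return posixpath.join(posixpath.normpath(host_dir), relative)
    return None
```

With `["/worker:/worker"]` the call `host_path("/worker/reboot", ...)` gives `/worker/reboot`, while a hypothetical `["/run/worker:/worker"]` gives `/run/worker/reboot`. Comparing that result with the path the watcher actually monitors is a quick way to see whether the touch can ever be noticed.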

yeahdongcn commented 1 year ago

  - name: "reboot into Windows"
    image: reboot:latest
    timeout: 90
    volumes:
    - /worker:/worker

I encountered the same issue when rebooting into Windows: the action failed (STATE_FAILED). Is there anywhere I can look up the error message?

yeahdongcn commented 1 year ago

It turns out the document is incorrect. I just sent out a PR to fix it.

chrisdoherty4 commented 1 year ago

We intend to draw up a proposal for embedding restart capabilities into workflows so we don't need to rely on actions. This will complement our goal of having workflows consistently transition to an end state, which currently doesn't happen if the restart beats the restart action's status update.
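The ordering problem described here can be illustrated with a small sketch: report the final state first, and only then schedule the restart after a delay. This is not the proposal's design. The names `report_success` and `reboot` are hypothetical callbacks, and a fixed delay is an assumption that narrows the race rather than eliminating it.

```python
import threading


def report_then_reboot(report_success, reboot, delay_seconds=10.0):
    """Report the end state, then schedule the reboot to run later.

    Returns the timer so the caller can join or cancel it.
    """
    # Send the status update before anything can take the machine down.
    report_success()
    # Defer the reboot to give the update time to reach the server.
    timer = threading.Timer(delay_seconds, reboot)
    timer.start()
    return timer
```

A more robust variant would wait for an acknowledgement of the status update instead of sleeping, but that depends on what the worker's reporting API offers.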

chrisdoherty4 commented 10 months ago

https://github.com/tinkerbell/roadmap/issues/29 will see this come to fruition.

jacobweinstock commented 1 week ago

While tinkerbell/roadmap#29 will add built-in capabilities for rebooting, https://github.com/jacobweinstock/waitdaemon can achieve this from an action and still allow the Workflow to report success.

I'm going to close this. If https://github.com/jacobweinstock/waitdaemon is not an acceptable solution please watch tinkerbell/roadmap#29.