[v2.5] batch flash nodes

svenrademakers commented 5 months ago

Is your feature request related to a problem? Please describe.

The Turing Pi can flash an OS image to a given (supported) module. The firmware loads a USB plug onto the module, which, in turn, exposes an API used to write the new OS images. On v2.4 boards, only one module can be switched to the BMC at a time. On v2.5, we replaced these muxes with a USB hub, which opens up the possibility of flashing multiple nodes simultaneously.

Describe the solution you'd like

We want to be able to write an image to a selection of nodes. The dropdown in the "flash" tab of the UI gets replaced with checkboxes, which the user can use to select which nodes to flash simultaneously. Flashing different images concurrently to different nodes is out of scope. Keep it simple!

If an error occurs with one of the nodes, all other tasks are aborted as well.
Error messages need to be altered so it's clear to the user which node caused the error.
All flashing features should be extended to the other nodes as well: SHA-256 checking, the skip-CRC bool, and xz decompression. Be mindful that we are extremely limited on memory. For instance, don't decompress the same OS image multiple times.

Additional information

We expect changes in the following 2 repos:

BMC-UI
BMCD

barrenechea commented 5 months ago

I can prepare the UI so we're able to support this use case, I'll be playing with options there 😄

barrenechea commented 5 months ago

Just to confirm: the UI should behave differently depending on whether the board is <= v2.4 or >= v2.5. Am I right? Could I get the board revision to render options conditionally? I think a good endpoint would be the one currently providing data for the About tab.

I think that a good option would be for v2.4 users only to be able to pick one option (and automatically disable the user from picking more than one choice), and if the board is >=2.5, for it to not have that "disabled after one". That way, the experience would be similar for all users, and v2.5 boards could pick many nodes.
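A minimal sketch of that "disabled after one" behaviour, in case it helps the discussion. `toggleNode`, `isDisabled`, and the `allowMultiple` flag are hypothetical names for illustration, not taken from the BMC-UI code base; `allowMultiple` is assumed to be true only on boards that can flash several nodes at once.

```typescript
// Hypothetical helpers (not from BMC-UI). Node ids are whatever the UI uses
// to identify a checkbox; `allowMultiple` is assumed to come from the backend.

// Returns the new selection after the user clicks the checkbox for `node`.
function toggleNode(
  selected: readonly number[],
  node: number,
  allowMultiple: boolean,
): number[] {
  if (selected.indexOf(node) !== -1) {
    // Clicking an already-checked box always unchecks it.
    return selected.filter((n) => n !== node);
  }
  if (!allowMultiple && selected.length >= 1) {
    // Single-select boards: ignore further picks ("disabled after one").
    return selected.slice();
  }
  return selected.concat([node]);
}

// True when the checkbox for `node` should be rendered as disabled.
function isDisabled(
  selected: readonly number[],
  node: number,
  allowMultiple: boolean,
): boolean {
  return !allowMultiple && selected.length >= 1 && selected.indexOf(node) === -1;
}
```

With `allowMultiple` set to false the user can still uncheck the current node and pick another one, so both board revisions share the same checkbox layout.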

svenrademakers commented 5 months ago

You brought up a good point. Of course, this behavior should only occur when a 2.5+ board is detected. You're also right that we need an endpoint to detect which of the 2 versions needs to be loaded. I would prefer to have a field encoded in the actual flashing endpoint that specifies something like:

{
  can_do_bulk_flashing: true,
  ...
}

Making the code dependent on the firmware version is a less clean option: we would tie ourselves to this specific hardware when, in theory, it shouldn't matter which hardware it runs on.
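As a rough illustration of how the UI could consume such a flag without knowing anything about board revisions, here is a small sketch. Only the field name `can_do_bulk_flashing` comes from this thread; the `FlashCapabilities` interface, the `selectionMode` function, and the rule that a missing field means "no bulk support" are assumptions.

```typescript
// Assumed response shape: only `can_do_bulk_flashing` is taken from the thread.
interface FlashCapabilities {
  can_do_bulk_flashing?: boolean;
}

type SelectionMode = "single" | "multiple";

// Decide how the node picker behaves from the capability flag alone.
function selectionMode(caps: FlashCapabilities): SelectionMode {
  // Assumption: firmware that predates the flag omits it, so a missing
  // value is treated the same as `false`.
  return caps.can_do_bulk_flashing === true ? "multiple" : "single";
}
```

Keying on the capability rather than the version means a future board with the same ability would need no UI change.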

> I think that a good option would be for v2.4 users only to be able to pick one option (and automatically disable the user from picking more than one choice),

That sounds good to me. It will keep things consistent!

barrenechea commented 5 months ago

I wonder if there is a chance to make the multiple selection of nodes work on 2.4... It may not be possible to flash them all simultaneously, but if we could flash them in sequence, the UI would work for both boards (just that v2.5 would be up to four times faster).

I could do a workaround on the frontend (sending the flashing requests in sequence, each one after the previous finishes), but if the backend could handle it, we could run the whole flashing sequence with a single image upload.

MPC-GH commented 5 months ago

The BMC itself doesn't have a lot of storage or RAM, so if you were flashing sequentially without re-streaming the image over the network each time, you would be reliant on an SD card of sufficient size being in place. Seems complex to do nicely in the web interface.

Would we want to consider caching before flashing anyway if there's a suitably large SD card in place from a reliability perspective? I can certainly see some use cases (remote or hard to physically access setups) where you may not want to risk a network drop mid-flash. For my use cases, I probably wouldn't be using the GUI at that point if I'm honest, but a locally saved image and the command line tooling.

barrenechea commented 5 months ago

@MPC-GH Yeah you're right, it probably streams the uploaded file directly to the target node(s). Better to keep it simple for now so we don't delay the main feature.

@svenrademakers a question regarding the /api/bmc?opt=set&type=flash call. Currently, it expects something like: /api/bmc?opt=set&type=flash&file=ubuntu.img&length=55345150&node=0 (node being 0-indexed)

Would it make sense for this to send a comma-separated list in the node field for bulk flashing? Something like: /api/bmc?opt=set&type=flash&file=ubuntu.img&length=55345150&node=0,1,2,3

svenrademakers commented 5 months ago

@barrenechea, I would like to keep the API backward compatible as much as possible. Therefore, it would be better if we introduced an additional key (it's not more elegant by any means). Maybe copy or batch is the right word?

e.g. /api/bmc?opt=set&type=flash&file=ubuntu.img&length=55345150&node=0&batch=1,2,3
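To make the backward-compatibility idea concrete, here is a sketch of how a receiver could turn `node` plus an optional `batch` into one list of targets. It is written in TypeScript purely for illustration; BMCD is a separate code base and may handle this differently. `parseFlashTargets` is a hypothetical name, and the 0..3 range check assumes a four-node board with 0-indexed ids.

```typescript
// Hypothetical parser: `node` stays mandatory, `batch` is optional, so
// existing single-node requests keep working unchanged.
function parseFlashTargets(url: string): number[] {
  const q = url.indexOf("?");
  const query = q === -1 ? url : url.slice(q + 1);

  // Collect key=value pairs into a plain object.
  const params: { [key: string]: string } = {};
  for (const pair of query.split("&")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue;
    params[decodeURIComponent(pair.slice(0, eq))] =
      decodeURIComponent(pair.slice(eq + 1));
  }

  const nodeRaw = params["node"];
  if (nodeRaw === undefined || nodeRaw === "") {
    throw new Error("missing `node` parameter");
  }
  let ids: string[] = [nodeRaw];
  const batch = params["batch"];
  if (batch !== undefined && batch !== "") {
    ids = ids.concat(batch.split(","));
  }

  const targets: number[] = [];
  for (const raw of ids) {
    if (!/^\d+$/.test(raw)) throw new Error("invalid node id: " + raw);
    const id = parseInt(raw, 10);
    // Assumption: four nodes, 0-indexed.
    if (id > 3) throw new Error("node id out of range: " + raw);
    if (targets.indexOf(id) !== -1) throw new Error("duplicate node id: " + raw);
    targets.push(id);
  }
  return targets;
}
```

A request without `batch` yields a single-element list, which is the existing behaviour.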

barrenechea commented 5 months ago

> e.g. /api/bmc?opt=set&type=flash&file=ubuntu.img&length=55345150&node=0&batch=1,2,3

I like batch! I followed it to the letter 😄 My draft PR currently handles the following cases:

For all v2.4 boards (and v2.5 clicking a single node): node=0 (no batch parameter)
[v2.5 only] Nodes 1,2,3,4 clicked: node=0&batch=1,2,3
[v2.5 only] Nodes 1,3,4 clicked: node=0&batch=2,3
[v2.5 only] Nodes 2,4 clicked: node=1&batch=3

Note that I'm ordering the node IDs on the client, meaning:

If the user clicks first on Node 4 and second on Node 1, the payload will be: node=0&batch=3

And not in the order the user clicked, like: node=3&batch=0 <- This will not happen

It's just an Array.sort I'm doing before sending the request. If irrelevant, I could clean it up and save some CPU cycles on the front end 🤣
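For reference, the client-side ordering described above could look roughly like this. It is a sketch, not the code from the draft PR: `buildFlashQuery` is a hypothetical name, and it assumes the UI labels nodes 1 to 4 while the API uses 0-indexed ids.

```typescript
// Hypothetical helper: builds the flash request path from the clicked nodes.
// `clickedNodes` holds the 1-based labels in the order the user clicked them.
function buildFlashQuery(
  file: string,
  length: number,
  clickedNodes: readonly number[],
): string {
  if (clickedNodes.length === 0) {
    throw new Error("select at least one node");
  }
  // Convert to 0-indexed ids and sort numerically. The comparator matters:
  // without it, Array.prototype.sort compares elements as strings.
  const ids = clickedNodes.map((n) => n - 1).sort((a, b) => a - b);

  let query =
    "/api/bmc?opt=set&type=flash" +
    "&file=" + encodeURIComponent(file) +
    "&length=" + length +
    "&node=" + ids[0];
  // Only add `batch` when more than one node is selected, so single-node
  // requests look exactly like they do today.
  if (ids.length > 1) {
    query += "&batch=" + ids.slice(1).join(",");
  }
  return query;
}
```

Clicking Node 4 and then Node 1 gives `node=0&batch=3`, matching the case above. Sorting also makes the request independent of click order, which may simplify logging and testing.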

We'll see how it goes, but we have something to play with!

turing-machines / BMC-Firmware

[v2.5] batch flash nodes #201