Closed glimchb closed 8 months ago
so, I took a cursory look at the pull request. Just a few comments to get started:
One more comment: before I run any command on the DPU BMC, I need to check the server is up and running, by calling a redfish command on the server BMC. Otherwise, all my DPU BMCs will fail as the DPU has no power. I tend to leave all these external dependencies out the individual roles, rather than including them in my roles.
so, I took a cursory look at the pull request. Just a few comments to get started:
- I do not think that downloading the file from a repository into /tmp should be part of this task, but something it should be done separately. The update_bmc_fw should receive a filename/location of the image to install and that's it. Think you're updating a lot of dpus; you want to download it once. The datacenter might not have external connectivity, and will probably have a repository where all the valid fw are located, signed, etc. We should point to that location
I think we should support both, if the file path is a url then download if it is a path then expect it to exist there
- we're delegating to localhost to run those commands. This should probably a {{ delegate_to }} so we can have multiple nodes pushing images
- we're hardcoding MultipartHTTPPushUpdate. We do not support that yet in Bluefield. We might also want SimpleUpdate
Looking at the recent Docs I think we do, see https://docs.nvidia.com/networking/display/bluefieldbmcv2401/bmc+management#src-2571331277_BMCManagement-BMCUpdate
- retries/delay should also be variables. Different vendors might have different requirements
I would keep it hardcoded for simplicity, in case a vendor will come with a request we can always add it
- I see you have the cec in vars. In BF, a fw update implies posting two files: the bmc fw and the cec (eROT) and then reboot. This second step is missing in the pull request (it is also a same post with a different file), but it is also something very particular to us. Maybe the update/wait can be done on a list of files?
I think CEC should be a different role... CEC requires rebooting the server
Signed-off-by: Boris Glimcher Boris.Glimcher@emc.com