New Repositories for Firmware Update

jagpalgill commented 3 months ago

Hi,

I am bringing up this issue to gather input from the TOF on the best approach for organizing the repositories that will hold firmware update code for different components as we implement the design. Currently, the code for firmware updates is located in the following repositories:

phosphor-bmc-code-mgmt - BMC firmware update
pldm - Components using PLDM packaging and PLDM protocol

We have previously discussed some of these aspects, so I will summarize some of the options:

Organize repositories based on device transport level, with devices using the same transport layer protocol residing in the same repository (e.g., all devices using PMBus can be in a repository called phosphor-pmbus-code-mgmt, which is more consistent with the current pldmd).
Organize repositories based on device type, such as phosphor-vr-code-mgmt, phosphor-code-mgmt, etc.

Please provide your input on any other options and on which direction we should take.

amboar commented 3 months ago

I feel like we shouldn't have separate repositories for the most part, unless there's a compelling reason to have the functionality split (e.g. implementing PLDM firmware update in pldmd doesn't seem unreasonable). We can implement firmware update as multiple applications in one repository, or even multiple backends to the one application, right (and that's before the observation that application != process)?

williamspatrick commented 3 months ago

Are there any cases where we are expecting the need to link against device vendor libraries or are all of the updates going to be self-contained?

When we talk about update for assorted devices, should we focus exclusively on update or are there other management operations that need to be done to those devices, like failure detection and analysis?

jagpalgill commented 2 months ago

I feel like we shouldn't have separate repositories for the most part, unless there's a compelling reason to have the functionality split (e.g. implementing PLDM firmware update in pldmd doesn't seem unreasonable). We can implement firmware update as multiple applications in one repository, or even multiple backends to the one application, right (and that's before the observation that application != process)?

@amboar It is my understanding that many common functionalities are already defined through phosphor-dbus-interfaces. For devices with similar access types, there will be shared backend code, and therefore they can be part of the same repository to facilitate this sharing.

Are there any cases where we are expecting the need to link against device vendor libraries or are all of the updates going to be self-contained?

@williamspatrick The decision on whether to use an existing open-source library or to create a new one in OpenBMC will likely depend on the specific device and vendor. In some cases, a vendor may choose to link with their own library if it is already available in open source, rather than duplicating the code in OpenBMC.

When we talk about update for assorted devices, should we focus exclusively on update or are there other management operations that need to be done to those devices, like failure detection and analysis?

@williamspatrick Failure detection is a crucial aspect of the update flow. The current Redfish Message registries (https://www.dmtf.org/sites/default/files/standards/documents/DSP2065_2024.1.pdf) already provide information on whether an update has passed or failed and, if it failed, at which phase it did so (e.g., Transfer failed, Activate failed, Apply Failed, Verify Failed). However, clients may require further analysis beyond this basic information. For example, they may want to use journalctl logs to investigate the cause of an Apply failure for an i2c device, such as an i2c bus being busy or timing out.

williamspatrick commented 2 months ago

@jagpalgill

Failure detection is a crucial aspect of the update flow.

I was meaning failure detection separate from the update flow.

Take example of VRs. We have the following functionality (at a minimum):

Sensors from the VR.
Fault reports from the VR (and analysis).
Firmware update of the VR.

Additionally, while we are doing firmware update, the other two features go offline.

If we try to do those 3 different features in 2-3 different applications (maybe in 2-3 different repos even), we end up with state that needs to span those applications. Something along those lines was proposed here in dbus-sensors (https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/72009) and rejected due to the state spread.

[Ignoring a "mono-application"] there are two primary design points we could consider: have applications/repos that target a function (i.e. code update) or have applications/repos that target a device/protocol (i.e. pmbus). Since we know there are likely state implications between the function-domains, it seems to me to be a poor choice to target function-domains with our applications/repos. Assuming we use the generated server bindings, it seems to me like there should be very little code sharing between "code update on VRs" and "code update for BIOSs" and so I don't know why we would want to target "*-code-management" as the repository split choice.

(*) VR = Voltage Regulator
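
To illustrate the state-spread concern, here is a minimal sketch of one application owning all three VR features, so the "offline during update" state lives in one place. The class and method names are hypothetical and the readings are placeholders; this is not code from dbus-sensors or any OpenBMC repository.

```python
from enum import Enum, auto


class VrState(Enum):
    RUNNING = auto()   # sensors and fault reporting available
    UPDATING = auto()  # firmware update in progress; other features offline


class VoltageRegulator:
    """Hypothetical single owner of all VR features and their shared state."""

    def __init__(self, name):
        self.name = name
        self.state = VrState.RUNNING

    def read_sensor(self):
        # Sensors are unavailable while the device is being updated.
        if self.state is VrState.UPDATING:
            return None
        return 1.0  # placeholder reading, not real hardware access

    def read_faults(self):
        # Fault reporting is unavailable while the device is being updated.
        if self.state is VrState.UPDATING:
            return None
        return []  # placeholder: no faults reported

    def update_firmware(self, image, write):
        # `write` is a hypothetical callable that transfers the image.
        self.state = VrState.UPDATING
        try:
            write(image)
        finally:
            # Always bring sensors and fault reporting back online.
            self.state = VrState.RUNNING
```

Because the state flag is private to one object, no cross-process coordination is needed to take sensors and fault reporting offline during an update.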

jagpalgill commented 2 months ago

@williamspatrick Yes, I agree the example you mentioned is a good case of why keeping the state machine for a particular device/protocol in one application is the better choice. Having applications/repos that target a device/protocol overall seems like a good option.

Can other TOF members also provide their input? Thanks

jagpalgill commented 2 months ago

@edtanous Any thoughts?

jagpalgill commented 2 months ago

Based on the above discussion, I will proceed with opening a TOF issue for repository creation using the following names:

phosphor-pmbus
phosphor-cpld
phosphor-nor

@williamspatrick @amboar, please let me know if the nomenclature seems fine to you.

amboar commented 2 months ago

phosphor-pmbus maybe, but I'm not at all convinced about phosphor-cpld or phosphor-nor. They just seem too ill-defined. There are no constraints on CPLD behaviour that would make it coherent. NOR, well, it's just storage; what really matters is how it's being used, which again, there aren't many constraints on that.

jagpalgill commented 2 months ago

There are no constraints on CPLD behaviour that would make it coherent. NOR, well, it's just storage; what really matters is how it's being used, which again, there aren't many constraints on that.

@amboar I am not sure if I fully understood the above, but my understanding is that you are suggesting that the names "phosphor-cpld" and "phosphor-nor" are too generic in their purpose. If that is true, then how about naming them "phosphor-cpld-code-mgmt" and "phosphor-nor-code-mgmt" to specifically target code updates for CPLD and NOR devices?

amboar commented 2 months ago

I am not sure if I fully understood the above, but my understanding is that you are suggesting that the names "phosphor-cpld" and "phosphor-nor" are too generic in their purpose.

My concern isn't really with the names (though I think they are a bit vague). I think that the more important aspect is how something is used, not the technology used to implement it, but your proposed scheme targets specific technologies.

For example, PMBus, while it exploits SMBus/I2C for a transport, is largely about defining a common interface for a collection of devices that have similar behaviours. This is a good starting point because it's the behaviour ("how you use it") that is specified: You're not proposing phosphor-i2c (which doesn't try to specify anything about endpoint behaviours beyond how to communicate with them). Conversely, the behaviour of a CPLD is relatively arbitrary; they have their niches but they're not really all that constrained in what gets done with them. There's no spec for their behaviour around which to build something coherent. That's my concern, and it's the same for phosphor-nor: There's no one specification describing what the NOR is used for or how it's laid out, nothing around which you could build something coherent.

jagpalgill commented 2 months ago

I think that the more important aspect is how something is used, not the technology used to implement it, but your proposed scheme targets specific technologies.

My understanding is that most CPLD vendors will have their own proprietary way to perform updates. So, let's say there is a CPLD vendor who wants to contribute their firmware update process to OpenBMC, where would that go?

For NOR, my understanding is that for MTD devices a flashcp sort of update flow will mostly be used, which provides some sort of common pattern and makes them a good candidate for one repo?

amboar commented 2 months ago

My understanding is that most CPLD vendors will have their own proprietary way to perform updates. So, let's say there is a CPLD vendor who wants to contribute their firmware update process to OpenBMC, where would that go?

We have two axes, "what it's used for", and "how it's updated"; @williamspatrick suggests above that applications dealing with "what it's used for" also need to know "how it's updated" to avoid spreading the state across multiple processes, but I think the difficulty is we can't expect a 1-to-1 mapping from "how it's updated" to "what it's used for". That to me suggests it might be best to put the update strategies in a library against which relevant applications can link. I don't think that the observation is CPLD-specific, so I don't think that phosphor-cpld is appropriate? It seems like an arbitrary boundary.
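
As a rough sketch of that library idea (all names here are hypothetical, and the strategy bodies are placeholders rather than real bus access), the "how it's updated" axis could be a small registry that applications on the "what it's used for" axis look up:

```python
from typing import Callable, Dict

# Hypothetical shared library: strategies keyed by "how it's updated".
UpdateStrategy = Callable[[bytes], str]
_STRATEGIES: Dict[str, UpdateStrategy] = {}


def register(name: str):
    """Decorator that adds an update strategy to the shared registry."""
    def decorator(func: UpdateStrategy) -> UpdateStrategy:
        _STRATEGIES[name] = func
        return func
    return decorator


@register("i2c")
def update_over_i2c(image: bytes) -> str:
    # Placeholder: a real strategy would talk to the device here.
    return f"wrote {len(image)} bytes over i2c"


@register("jtag")
def update_over_jtag(image: bytes) -> str:
    # Placeholder: a real strategy would drive the JTAG interface here.
    return f"wrote {len(image)} bytes over jtag"


def get_strategy(name: str) -> UpdateStrategy:
    try:
        return _STRATEGIES[name]
    except KeyError:
        raise ValueError(f"no update strategy named {name!r}") from None


class DeviceApp:
    """Hypothetical application organised around "what it's used for"."""

    def __init__(self, strategy_name: str):
        self._update = get_strategy(strategy_name)

    def update(self, image: bytes) -> str:
        return self._update(image)
```

Two applications with unrelated purposes can then share the same strategy, and one application can switch strategies without moving repositories.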

For NOR, my understanding is that for MTD devices a flashcp sort of update flow will mostly be used, which provides some sort of common pattern and makes them a good candidate for one repo?

Sure, flashcp is a likely implementation strategy but I think a bigger concern is the behaviour of the partitions for a given context. I don't imagine that the contexts will have any specific relationship to each other that makes them coherent. Can you point me to some concrete use-cases that we're considering? Also in the context of the CPLD discussion above, are these really any different to proprietary update methods? What drives putting the implementation in a distinct repository (phosphor-nor)?

jagpalgill commented 2 months ago

That to me suggests it might be best to put the update strategies in a library against which relevant applications can link. I don't think that the observation is CPLD-specific, so I don't think that phosphor-cpld is appropriate? It seems like an arbitrary boundary.

It seems reasonable to keep processing and update state within the same application. However, I am not sure why a library approach was not used in the past. I suppose it may be helpful to gather feedback from other TOF members on this approach. Additionally, if we decide to use libraries, we will still need to determine where the code should reside.

but I think a bigger concern is the behaviour of the partitions for a given context. I don't imagine that the contexts will have any specific relationship to each other that makes them coherent.

I believe that any differences in devices or vendors can be accommodated through configuration settings, allowing us to maintain a common orchestrator logic.

Can you point me to some concrete use-cases that we're considering?

On the Meta side, we have flash-based upgrades for BIOS and PCIe switches.

Also in the context of the CPLD discussion above, are these really any different to proprietary update methods? What drives putting the implementation in a distinct repository (phosphor-nor)?

Many of the common components are already being abstracted through generated sdbusplus bindings. While these methods may be proprietary, grouping them logically can help prevent a "kitchen sink" situation in the future.

amboar commented 2 months ago

grouping them logically

That's the bit we're debating though: what is a logical grouping in this context? My concern is that grouping by device class isn't necessarily helpful, but I'm not planning to die on that hill either.

mdmillerii commented 2 months ago

I'll add that I agree that nor is not a sufficient categorization. Looking at just the existing storage layout and updates it might only work for one of the four layouts.

The base layout requires that the file system be unmounted before flashcp is run, and then that only specific partitions be updated. This is slightly relaxed when there is a separate partition for a/b image selection, be it via hardware selecting a second chip or flipping an address or chip select, but it still requires knowledge of what section or partition to update, which is even more specialized.

The noroot layout can be updated blindly but expects an independent storage for /var

The ubi layout requires special tools, and the eMMC layout doesn't use NOR directly but a full controller with an FTL or equivalent.

What does it mean to update a CPLD? Is it sending an image to a SPI NOR? What handshake is required to access the storage? Or is it downloading over I2C or some other bus?

I did some research on update frameworks 8-12 months ago when code update came up. One or two could have been integrated into the prior process (I have not been following the project closely in the last 6 months, for personal reasons, so I don't know if this is still true). One or two concentrated on image distribution and validation and on forcing a/b update, but only had the concept of updating a sub-image (typically a/b) or including it as part of the whole image; they didn't handle having the image in storage and managing the download to another device (e.g. the external Ethernet or storage controller with storage, or host BIOS). The PMCI update is fairly well defined, as is the NC-SI update, although running over PCI VDM versus I2C is a limitation, since not all systems support PCI VDM.

jagpalgill commented 1 month ago

What does it mean to update a CPLD? Is it sending an image to a SPI NOR? What handshake is required to access the storage? Or is it downloading over I2C or some other bus?

For Meta, we use Altera and Lattice CPLDs. The Altera one uses an I2C interface, whereas the Lattice one uses JTAG.

One or two could have been integrated into the prior process

Are you referring to the "phosphor-bmc-code-mgmt" repository? This repository supports various BMC layouts, such as static, ubi, etc. However, my understanding is that the code within this repository is very specific to BMC upgrades. I would let Adriana comment on that since she is one of the maintainers. (CC @williamspatrick).

mdmillerii commented 1 month ago

One or two could have been integrated into the prior process

Are you referring to the "phosphor-bmc-code-mgmt" repository? This repository supports various BMC layouts, such as static, ubi, etc. However, my understanding is that the code within this repository is very specific to BMC upgrades.

In my prior job I actually did a lot of the prototyping of each of the BMC layouts (except norootfs) and the initial update, although I left the D-Bus and Redfish interactions to others. So I'm familiar with the scope.

I was trying to point out that the flashcp step is actually the easiest, and the preparation, be it transition locks, a/b side selection, unmount of image, etc., is the hard part. And it's not clear what you think is the scope for a xxxx-nor repository, but it would appear to apply to 2-3 of the existing BMC layouts if the only criterion is being stored in NOR flash. Also, flashcp doesn't offer programmatic access to the progress, which should be a requirement.

So I'm looking for a better definition of the proposed scope of the repositories.

jagpalgill commented 1 month ago

And it's not clear what you think is the scope for a xxxx-nor repository, but it would appear to apply to 2-3 of the existing BMC layouts if the only criterion is being stored in NOR flash.

phosphor-nor is not being proposed for BMC but for other nor based devices such as PCIe switches (for Meta use case as mentioned above).

So I'm looking for a better definition of the proposed scope of the repositories.

If not phosphor-nor, then in which repo do you think the code update for such devices should reside?

mdmillerii commented 1 month ago

phosphor-nor is not being proposed for BMC but for other nor based devices such as PCIe switches (for Meta use case as mentioned above).

How do you know it's NOR? Do you have a mux with bit-bang access? Is it attached to GPIOs driven by the BMC? A kernel command-level driver with command byte streams? Both of these could be kernel MTD drivers, but how is access obtained from the device being updated?

Why is being nor significant to the update process?

Conversely, ASPEED includes a separate controller intended for BIOS and shows both mux designs, with internal (25MHz?) and external switches, in addition to Roxy access from the firmware. Most of these require coordination to disable host access (be it a notification in the host SoC controller or just a host reboot block), then changing the mux (a control register under pinmux, or a GPIO to an external mux), followed by a bind of the kernel driver to probe the device and establish the capacity, mode, etc. The actual flashcp is the easy part and not worthy of being the attribute defining the repository choice (although a progress report might be useful). What criteria would be used to exclude them? Or should they be included?

The Intel i210 Ethernet controller also uses an attached NOR, but access is through the host firmware driver, and it contains multiple images with separate signatures plus other configuration data, such as the MAC address, that should be excluded from code signatures or signed as per-machine data. I'm aware of CPLDs from 20+ years ago where the configuration bitstream was loaded from a SPI NOR (maybe limited to 3-byte commands) but the remainder was accessible to the application. Often a driver would expose this to the host. Again, how is being NOR relevant? The access is at a higher level. (And these are like Xilinx or Altera devices, so still relevant today.)

I'm aware of at least one PCIe switch with a private nor attached that was configured over an i2c variant with their own command access and sequence. Would this be included or excluded?

Conversely, PLDM describes the command sequences to query and supply firmware images and even to transport root of trust (ROT) signatures. But that already has a home.

mdmillerii commented 1 month ago

in addition to Roxy access from the firmware.

Correction: In addition to a command proxy access to the firmware

... where LPC firmware cycle (and eSPI? I never saw eSPI specifications) reads are managed by hardware, but writes require stopping reads and a controller driver to adjust multiple registers in addition to controller register access (which exposed the "pants down" CVE).

Over time the BIOS direct SPI access was replaced with paging blocks into a RAM buffer as/when initiated by IPMI commands; the blocks were later moved into files in the BMC filesystem and then migrated to eMMC storage, transparently to the host BIOS. The choice of NOR or eMMC or other block storage again is not relevant to any of the update flow.

Reading the history again, I see you mentioned I2C and JTAG access for different CPLD manufacturers but didn't specify how access was obtained, what host states and action blocking were required, nor why they should be excluded from the NOR repository when they are using a NOR part for storage. To me, the JTAG driver might even give bit-bang access to the pins, or it might make use of some kind of JTAG command mapping. IEEE 1149 (JTAG, number from memory) defines a state machine and two of the three required commands to implement manufacturing control of pin state (later more were defined), but also a variable length of 2 to infinite bits (I know of implementations with 3-5, 8, 32, and a proposal for 80) for the "instruction" register, which selects an arbitrary data register whose length is defined per instruction, and the protocol states for the above to take effect. It also architects shared lines and both parallel and serial connection of multiple devices, with maximal pin sharing from the controller to the devices facilitated. A slow bit-bang could drive the NOR directly through boundary scan, but I wouldn't be surprised to find JTAG access at a higher level, be it to command sequences or address-based storage. The first two could be kernel MTD but the third would not, the second might require delays, and the first two require significant device and multi-device topology knowledge. (The JTAG standard also describes a definition file for board test generation programs to consume and generate sequences for manufacturing test.)

Again the transport and manufacturer selection is not relevant but the functional handshake from normal operation mode and what is available is significant. (Can I access i2c sensors on motherboard from runbmc? Serial? Or just the host boot)?

williamspatrick commented 1 month ago

How do you know it's nor?

@mdmillerii - I'm not sure what you're looking for here. A lot of what you speak to is hardware implementation options, but not everyone chooses the same hardware implementation. We have hardware designs where the BMC has control over a mux to a NOR device, which is used to store the BIOS and/or PCIe Switch firmware. I don't think this is that bizarre of a design. We'd like to find a place to hold that code so we can contribute support upstream.

mdmillerii commented 1 month ago

We have hardware designs where the BMC has control over a mux to a NOR device

This is the first time I have seen the scope or qualifications stated for what is to be updated by this proposed repository.

This still leaves open what pre- and post-conditioning is needed and what interlocks are needed before applying any said update. Does the system need the host quiesced? Powered? Partially left in reset?

I could understand a cpld-altera (example) that interfaces with a standard vendor interface easily but still have concerns about what's needed to handle the leftover application space.

Is another requirement that the whole device be flashed and the image supplied as a single image? Is application space always part of the distribution image, or should only enough space for the supplied image be erased and written? I could see both options for BIOS (and CPLD). How does one choose the erase block size if separate?

Leveraging the kernel to synthesize the SPI traffic can result in further scope reduction, although a paced %done copy and verify, with support to instantiate and bind the driver with generic GPIO conditioning, may be sufficient commonality. That said, erase followed by writing, or interleaving erase and write, would seem to be minor additions to a common progress-reporting copy.

Today the code update repository hosts scripts that actually apply the code. I can understand wanting a place to host and hold the code that is a step up from scripts and flashcp with static progress percentages based on some engineering assessment of the expected time for each step. But like the others, I'm not convinced this isn't just creating more repositories for things that end up needing even more components.

mdmillerii commented 1 month ago

Wondering if we should create a prototyping repository with subdirectories that could host gerrit reviews to develop and prototype solutions such as this, which doesn't have a public example to justify the grouping? phosphor-misc is more towards usable code, and skeleton is ancient Python that is becoming obsolete.

This is possibly a way to show and develop code so that natural groupings can be identified, cultivating common topics for consolidated repositories, which I believe this request process is intended to encourage.

mdmillerii commented 1 month ago

Stated differently, code update requires

1. Obtaining the image
2. Validating the image, e.g. signatures, purpose, allowed targets
3. Schedule image for transfer to the device
4. Preparation for transferring the image to the device
   - locking out conflicting operations
   - preconditioning to establish access (power, reset hold or release, gpio or other mux select)
   - connection of physical interface / binding kernel driver to device
5. Formatting commands to effect the transfer (SPI, i2c, JTAG, PLDM message, etc.)
   - verify transfer / read back?
6. Post conditioning (undo part of 4 above)
7. Request image be activated (become the default)
8. Releasing locks / undo remaining 4
9. Make active (reboot? grouped with other images? Part of 8?)
10. End maintenance

The proposal does one step; the code update refactor does the first ~~two~~ three and has hooks to invoke step 7. Simple update of a single image may cause some steps to be implicitly combined or become no-ops.
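
The ordering above, with later steps undoing earlier preparation, can be sketched as a small step runner. This is only an illustration under assumed names (`run_update` and the step tuples are hypothetical), not how phosphor-bmc-code-mgmt is implemented:

```python
def run_update(steps):
    """Run (name, do, undo) steps in order.

    If a step raises, undo the steps that already completed, in reverse
    order, so locks and mux selections are not left behind.
    Returns (succeeded, log).
    """
    done = []
    log = []
    try:
        for name, do, undo in steps:
            do()
            log.append(name)
            done.append((name, undo))
    except Exception as exc:
        log.append(f"failed: {exc}")
        for name, undo in reversed(done):
            if undo is not None:
                undo()
                log.append(f"undo {name}")
        return False, log
    return True, log
```

A simple single-image update would pass fewer steps, or steps whose `do` is a no-op, which matches the remark that some steps may be combined or skipped.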

(The static MTD nor has code written and tested to improve availability and reporting but it's not enabled under Redfish and requires preconditioning before step 1 above although the traditional legacy fetch from resource could support it)

mdmillerii commented 1 month ago

Is PMBus update sufficiently common or should it just be interlocked with regulator support or power supply or ... (Does it need power supplied to reach standby? Just some interruptions power on? Load held in reset?). Even when the runtime interface is PMBus, what is common?

williamspatrick commented 1 month ago

Is PMBus update sufficiently common or should it just be interlocked with regulator support or power supply or ... (Does it need power supplied to reach standby? Just some interruptions power on? Load held in reset?). Even when the runtime interface is PMBus, what is common?

I don't know why any of this needs to be discussed here in order to determine if a repository should be made. These are all implementation level questions, aren't they? Some of this is "whatever the current hardware supported needs will be implemented" and if someone needs more, they write it. Isn't that how open source works? The original authors don't need to boil the ocean and we shouldn't be expecting them to just to get a repository created.

mdmillerii commented 1 month ago

https://github.com/openbmc/technical-oversight-forum/issues/37#issuecomment-2264062045

(Re cpld) I believe that any differences in devices or vendors can be accommodated through configuration settings, allowing us to maintain a common orchestrator logic

If the common part is the orchestration and progress feedback, why should NOR get a different repository, when it's just a fancy flashcp? Why not make it a fancy application that can call out to a vendor library to transfer a chunk of the image? If it handles multiple images to multiple destinations, it can handle updating b-side dual images too. Choosing UBI volume writing, eMMC storage, or something other than a block device, or MTD large block erase, is similar in complexity to calling a different CPLD vendor library to identify chunks and then transferring a limited number of them before a callback to report incremental progress.

To me, such a transfer-with-progress framework could be prototyped in the code update repository even if we keep the preconditioning and target selection separate, like the hooks today.
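
A minimal sketch of what such a transfer-with-progress loop might look like follows. The function name, the `write_chunk` backend callable, and the `report` callback are all hypothetical; a real backend could be an MTD write, a UBI volume write, or a vendor library call.

```python
def transfer_with_progress(image, write_chunk, report, chunk_size=4096):
    """Copy `image` in chunks, reporting percent complete after each chunk.

    `write_chunk(offset, data)` is a hypothetical backend-specific writer.
    `report(percent)` is a hypothetical progress callback.
    Returns the number of bytes transferred.
    """
    total = len(image)
    sent = 0
    while sent < total:
        chunk = image[sent:sent + chunk_size]
        write_chunk(sent, chunk)
        sent += len(chunk)
        report(sent * 100 // total)
    return sent
```

Only `write_chunk` differs per storage type or vendor, which is the commonality being argued for here.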

mdmillerii commented 1 month ago

why any of this needs to be discussed here in order to determine if a repository should be made

I'm just trying to get a useful scope of work to help get an appropriate name. As I said, I think the transfer to a SPI-attached NOR is the easy part and can be just an invocation of flashcp until more detailed progress is required. To me, the setup and teardown is more interesting and a significant portion of the unique work, and I don't have any ideas how that becomes grouped by the data being stored in a specific technology.

The original authors don't need to boil the ocean and we shouldn't be expecting them to just to get a repository created.

I 100% agree and wasn't suggesting that anything beyond their requirements be written. But I also think that the scope of a rewrite of flashcp is not sufficient or interesting, and I don't see why importing and maintaining a raw SPI-attached NOR library should be a goal when the kernel has a bit-bang driver to instantiate.

Maybe the setup and binding of the driver and device could be added to the scope, but do we incorporate the interlocks and conditioning? What is the improvement over the existing callout to start a systemd unit for each step (which today invokes Exec= to a common script with arguments for a specific action step)?

mdmillerii commented 1 month ago

Stated differently, a single rewrite of flashcp is not interesting by itself and that seems to be the identified scope of code-update-nor. Other scopes may be of interest, but is it premature to create this one?

A driver that performs the muxing (GPIOs? pinmux? disabling conflicting devices and installing others to allow invocation of an I2C mux path?) and then exposes hotplug events as a bus driver is created and exposes the update device is all kernel scope. (Not that I'm trying to propose an implementation; upstream acceptance is unlikely, from past proposal experience.)

Doing the same in user space requires that an application make a GPIO request, which selects as needed and then triggers device discovery (hotplug? binding? callout script/unit?), identify the discovered device, then invoke (a modified?) flashcp and follow up with the teardown.

The progressive steps are the interesting part. The progress-reporting copy is interesting (but how to report multiple images? What % to reserve for verification? Is it a full read? How much faster is it than the initial write?). In none of these does the storage mechanism appear to be unique, nor is the method of access relevant (maybe relative read/write speed), and in all a simple copy could suffice to start.

Then when the copy completes (how is this determination made? systemd doesn't like indefinite startup/oneshot. Is it required to signal or invoke a method? Or just invoke a completion unit or script?), the device needs to be unbound, possibly the driver too, and the mux released.
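
The acquire, copy, teardown sequence described above can be sketched with Python's `contextlib.ExitStack`, which releases resources in reverse order even when the copy fails. The `resource` helper and `update_via_mux` are hypothetical stand-ins; they only record what a real GPIO request or driver bind would do.

```python
from contextlib import ExitStack, contextmanager


@contextmanager
def resource(name, log):
    """Hypothetical stand-in for a mux select or a driver bind."""
    log.append(f"acquire {name}")
    try:
        yield name
    finally:
        log.append(f"release {name}")


def update_via_mux(image, copy, log):
    """Select the mux, bind the driver, copy, then tear down in reverse."""
    with ExitStack() as stack:
        stack.enter_context(resource("mux", log))
        stack.enter_context(resource("driver binding", log))
        copy(image)  # e.g. an invocation of flashcp in a real system
```

Nothing in this sequence depends on the storage being NOR; only the `copy` callable would.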

The acquisition of locks and release to prevent conflicting actions seems to be more related to function than where it's stored.

Other questions for the framework may include: Do we need to persist that hardware had an attempted but incomplete update applied? Should we force health on reboot to trigger intervention to attempt again (when the image to be applied is not staged in BMC-accessible storage)? Do we download and validate the signature again (limited times?)? Should we resume the attempt, assuming the fault/restart was not from the application process itself but unrelated? (These don't have to be answered or implemented initially; they are just brainstorming questions, all unrelated to the transfer of the bits of the image but generic across code update. Some answers are evident (e.g. dual image stored, device fails to initialize, but it's useful to remember that an update in progress failed, to improve diagnosis and suggest a recovery action). Others might be policy, especially if we initiate the Redfish concept of maintenance windows to schedule image application.)

williamspatrick commented 1 month ago

Stated differently, a single rewrite of flashcp is not interesting by itself and that seems to be the identified scope of code-update-nor.

I don't know where you got this from. There is zero interest on our part to rewrite flashcp.

Packaging, mux control and phosphor-state-management interaction (to ensure correct system state / dependencies prior to updating) are the primary facets here that need to be implemented.

It's hard to figure out what level of detail people are expecting to be documented before we start writing code. Can we stick to answering that question: What details does the community/TOF need before code starts? These long posts of all your ideas are not helpful unless you're actually going to contribute to writing the associated code (and even still this isn't the best forum for it). Other engineers are more than capable of thinking through the problem domain they are trying to develop in. At this stage it is just derailing the primary purpose of this issue.

mdmillerii commented 1 month ago

Speaking as a community member, when selecting a name for a new repository, the expected scope is useful.

These long posts of all your ideas are not helpful unless you're actually going to contribute to writing the associated code (and even still this isn't the best forum for it)

Is there a gerrit review for a design open? A discord thread?

Packaging,

Do you mean image packaging and identification?

mux control and phosphor-state-management interaction (to ensure correct system state / dependencies prior to updating) are the primary facets here that need to be implemented.

Great. What makes storage in SPI NOR significant for any of these? That it has to locate the SPI MTD when plugged? Some other current?

On the other hand, a framework to process images, select, and initiate transfer is in phosphor bmc code update. It calls out to systemd units to perform image application.
I outlined what I thought were the significant steps in https://github.com/openbmc/technical-oversight-forum/issues/37#issuecomment-2303571650 and my understanding is that you are requesting 4-7, with 5 being a call out to an existing program. 9 would be part of the binary. If removing the requirement to package in a tar file is a barrier, I would not object to the target slot being offered the image for it to verify appropriately as needed.

I can try to help refactor to a more generic framework but I'm hardware limited and would be reliant on qemu and CI. (I dealt with update scripts and units at my former employment and left the interaction to others but have studied the code since then.)

williamspatrick commented 1 month ago

Is there a gerrit review for a design open? A discord thread?

There is an overall updated design that has been implemented in phosphor-bmc-code-mgmt for updating the BMC. The primary purpose of this design was to be able to handle updates of "other things" in a more robust way and aligned with Redfish.

https://github.com/openbmc/docs/blob/master/designs/code-update.md

This issue was to create the repositories for the "other things". Are we expecting a low-level design for each of the requested "other things"? I don't know but that seems a little excessive to me.

Packaging,

Do you mean image packaging and identification?

You need some way to identify that cpld.blob is applicable to a "Yosemite4 baseboard with an Altera CPLD" and not a "Minerva compute card with a Xilinx CPLD". For phosphor-bmc-code-mgmt this is done with the tarball manifest file. I think we'd (Meta) prefer to move towards PLDM packaging, but support for both can be handled if necessary.
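As a rough sketch of that applicability check: the `compatible` key and the `com.example...` strings below are made up for illustration and are not the actual manifest or PLDM descriptor format.

```python
# Hypothetical sketch: match an image's declared target against the
# hardware actually present. Field names are illustrative only.

def parse_manifest(text):
    """Parse simple key=value lines into a dict, skipping blanks and comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def is_applicable(manifest, device_compatible):
    """True only if the image declares exactly this device as its target."""
    return manifest.get("compatible") == device_compatible


manifest = parse_manifest("""
# hypothetical manifest shipped alongside cpld.blob
version=1.2.3
compatible=com.example.yosemite4.baseboard.cpld.altera
""")
```

With this, `is_applicable(manifest, "com.example.minerva.compute.cpld.xilinx")` is false, so the blob would be rejected before any transfer starts.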

mux control and phosphor-state-management interaction (to ensure correct system state / dependencies prior to updating) are the primary facets here that need to be implemented.

Great. What makes storage in SPI NOR significant for any of these? That it has to locate the SPI MTD when plugged? Some other current?

Well, updating NOR devices can be delegated to flashcp and kernel MTD drivers, so that one is probably simpler. CPLDs typically have some proprietary i2c or JTAG communication protocols. The dbus design is intended to be such that each daemon primarily handles "how" on the update, which is different from target-to-target. Some of them will be simpler than others.
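A minimal sketch of that split, assuming a shared entry point that delegates only the "how" to a per-device backend. The class names and the `vendor-protocol` placeholder are hypothetical; only the `flashcp <file> <device>` argument order is real.

```python
from abc import ABC, abstractmethod


class UpdateBackend(ABC):
    """The device-specific "how" of applying an image."""

    @abstractmethod
    def apply(self, image_path, target):
        """Return a description of the action taken (sketch only)."""


class NorBackend(UpdateBackend):
    def apply(self, image_path, target):
        # A real daemon would execute this command; here we only build it.
        return ["flashcp", image_path, target]


class CpldBackend(UpdateBackend):
    def apply(self, image_path, target):
        # Placeholder for a vendor-specific I2C or JTAG sequence.
        return ["vendor-protocol", image_path, target]


BACKENDS = {"nor": NorBackend(), "cpld": CpldBackend()}


def apply_update(device_type, image_path, target):
    """Shared entry point: look up the backend and delegate the 'how'."""
    try:
        backend = BACKENDS[device_type]
    except KeyError:
        raise ValueError(f"no update backend for {device_type!r}") from None
    return backend.apply(image_path, target)
```

Whether the backends live in one process or in separate daemons is a deployment choice; the interface stays the same either way.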

williamspatrick commented 1 month ago

On the other hand, a framework to process images, select, and initiate transfer is in phosphor bmc code update. It calls out to systemd units to perform image application.

I don't think the systemd unit approach is especially robust. Errors and progress are not appropriately reported and it is difficult to "unwind" appropriately on error.
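To illustrate what in-process handling could give us, here is a minimal sketch, not the actual design. It assumes each update step comes with an undo action and that `report` forwards messages to wherever progress is published. Both `run_update` and the step tuple layout are hypothetical.

```python
def run_update(steps, report):
    """Run (name, do, undo) steps in order.

    On failure, report the error, undo the completed steps in reverse
    order, and return False. Return True if every step succeeds.
    """
    done = []
    for index, (name, do, undo) in enumerate(steps, start=1):
        try:
            do()
        except Exception as exc:
            report(f"error in {name}: {exc}")
            for undone_name, undo_fn in reversed(done):
                undo_fn()
                report(f"rolled back {undone_name}")
            return False
        done.append((name, undo))
        report(f"progress {index * 100 // len(steps)}%")
    return True
```

A chain of systemd units gives us roughly "unit succeeded" or "unit failed"; here the caller sees which step failed, how far it got, and that the earlier steps were unwound.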

Maybe this is the bit that @jagpalgill needs to elaborate on in the "alternatives" section of the referenced design doc: could we simply write new "flashcp-like" applications and make phosphor-bmc-code-mgmt more generic?

jagpalgill commented 1 month ago

I don't think the systemd unit approach is especially robust. Errors and progress are not appropriately reported and it is difficult to "unwind" appropriately on error.

Maybe this is the bit that @jagpalgill needs to elaborate on in the "alternatives" section of the referenced design doc: could we simply write new "flashcp-like" applications and make phosphor-bmc-code-mgmt more generic?

Sure, I can add a sub-section to the Alternatives section of the design for phosphor-bmc-code-mgmt enumerating some pros and cons.

williamspatrick commented 1 month ago

@jagpalgill Based on discussions in the @openbmc/technical-oversight-forum channel, it seems like the overall preference is to start writing the code in the phosphor-bmc-code-mgmt repository to get going. If we decide the code does not have much affinity for that repository, we can move it elsewhere.

cc: @anoo1

jagpalgill commented 1 month ago

Sure, thanks. I will use the phosphor-bmc-code-mgmt repository for new applications.

williamspatrick commented 1 week ago

Closing for now due to recent discussion (using phosphor-bmc-code-mgmt).

openbmc / technical-oversight-forum

New Repositories for Firmware Update #37