Request for dynamic distinction between different boards' versions (i.e. "single binary" for many boards)

lmajewski commented 3 years ago

Problem: Support for many variants of the same board. For example, a board (foo) is added once to the Zephyr project. Its HW is described in foo.dts. After some time a second revision of this board (PCB v2) is introduced (with e.g. an extra LED and some extra devices attached to I2C), described in foo_v2.dts. Ideally, I would like to have a single binary to support both variants.

Solution: With the U-Boot bootloader I can bundle many compiled dtb files (which correspond to the dts files for the many variants of the board) into a FIT image and then write it to eMMC or SPI NOR. After power-on I can use data from an EEPROM to read the board revision and then use the proper DTB. When booting Linux afterwards, one can also use a FIT image or DTBOs (overlays).

Alternatives/Workaround: As noted in the doc [1], Zephyr parses foo.dts with a Python script and generates a header (zephyr/include/generated/devicetree_unfixed.h) with #defines describing nodes and properties (e.g. #define DT_N_S_soc_S_spi_4002d000_S_dsa_0_P_reset_gpios_IDX_0_VAL_pin 22). The DT_* macros are then used to get those values in drivers or user programs. As suggested on Slack, a workaround would be to combine foo.dts and foo_v2.dts into a single foo.dts that contains the HW description for both boards. Then in the user program (or in drivers if needed) I could read the EEPROM to get the board revision and use it to access the foo v2 HW. It seems to be a good idea to use DT_ALIAS(i2c_ic_v1/v2) to get the proper node.
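A minimal sketch of the "single dts, decide at runtime" workaround, written as plain C rather than against the Zephyr API. The struct, the function names and the I2C addresses are made up for illustration; in a real application the per-variant values would come from DT_* macros on the aliased nodes, and the revision byte would be read from the EEPROM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-revision hardware description. In Zephyr these values
 * would be generated from the merged foo.dts via DT_* macros. */
struct board_hw {
    uint8_t revision;     /* revision value stored in the EEPROM */
    uint8_t i2c_ic_addr;  /* address of the revision-specific I2C chip (made up) */
    int has_extra_led;    /* PCB v2 adds an LED */
};

/* Both variants are compiled in, as with a merged foo.dts. */
static const struct board_hw board_variants[] = {
    { .revision = 1, .i2c_ic_addr = 0x48, .has_extra_led = 0 },
    { .revision = 2, .i2c_ic_addr = 0x49, .has_extra_led = 1 },
};

/* Return the description matching the EEPROM revision, or NULL if unknown. */
const struct board_hw *board_hw_select(uint8_t eeprom_revision)
{
    size_t n = sizeof(board_variants) / sizeof(board_variants[0]);

    for (size_t i = 0; i < n; i++) {
        if (board_variants[i].revision == eeprom_revision) {
            return &board_variants[i];
        }
    }
    return NULL;
}

/* I2C address to use for the revision-specific chip on this board. */
uint8_t board_i2c_ic_addr(const struct board_hw *hw)
{
    assert(hw != NULL); /* caller must handle unknown revisions first */
    return hw->i2c_ic_addr;
}
```

The application would call board_hw_select() once at startup and pass the result around, so the rest of the code does not need to know which PCB revision it runs on.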

Question:

Is there a better way than the workaround described above (i.e. having a single dts)?
Are there any plans to provide support for "single binary" functionality?
Has anybody already tried to solve this issue? Any thoughts or ideas to share?

Links [1] - https://docs.zephyrproject.org/latest/guides/dts/api-usage.html#a-note-for-linux-developers

carlescufi commented 3 years ago

Re: "Single-binary" functionality, you'd probably need: https://github.com/zephyrproject-rtos/zephyr/issues/11918, which is under discussion. In fact, "Runtime pin configuration" has been discussed in the following two contexts:

Interpreter-style runtime assignment of pins to peripherals (i.e. the CircuitPython use case)
Runtime selection of peripherals based on runtime detection of a board variant (what you describe in this issue), see this comment for example

lmajewski commented 3 years ago

@carlescufi Thanks for sharing those links. It looks like the #29990 has the biggest chances to be pulled soon.

erwango commented 3 years ago

Before jumping into "how can we make this possible?", I want to be sure we're all on the same page here: @lmajewski A Zephyr binary is built against a device tree description. A new device tree means a new binary. 2 dts files imply 2 binaries. So, none of the proposed solutions or proposed alternatives would work there.

This being said, we can start looking for a solution. But is that a use case that Zephyr wants to target?

carlescufi commented 3 years ago

> @carlescufi Thanks for sharing those links. It looks like the #29990 has the biggest chances to be pulled soon.

Yes, but that won't solve your problem entirely I believe. Please do spend some time reviewing it however, see if it covers your use case, and to what extent.

pabigot commented 3 years ago

In a limited form something like this is necessary. It ties to #19448 and suggests a need for runtime probing of hardware to determine which disabled but built and linked devices need to be started. Certainly an old binary can't support new hardware unless that hardware is self-describing (e.g. QSPI flashes using JESD216), but a new binary should be able to figure out what's available and do its job.

carlescufi commented 3 years ago

> This being said, we can start looking for a solution. But is that a use case that Zephyr wants to target?

My understanding was that this was something we wanted to have limited support for, unlike the fully flexible CircuitPython-style functionality. But let's continue to discuss this here and reach a decision.

lmajewski commented 3 years ago

To provide some explanation (and please correct me if I'm wrong):

The #29990 would help with generating different binaries for different HW boards revisions (foo_v1, foo_v2, etc.)
Then it would be good to have a single binary - which would cover all variants with some runtime probing of underlying PCB revisions (with e.g. EEPROM).

Keeping in mind that Zephyr targets very constrained devices it seems feasible to:

Have per-board-revision images bundled together, either with a simple cat [*] and a header, or with some more sophisticated format (like a FIT image). However, the latter would require DTB parsing (libfdt) to be implemented.
Have some kind of bootloader program to distinguish between revisions and read (or better, execute in place - XIP) the proper image. It would contain only the code necessary for reading the board revision and then setting up XIP (start vector address remapping?) for the proper image.

[*] - image types: Data, Configuration, Others

However, this poses a threat - people will tend to add a lot of functionality to the bootloader and it will grow over time (as it did with U-Boot, SPL, TPL).
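The "cat plus header" bundle could look roughly like the sketch below. The header layout, the magic value and the function name are assumptions made for this example; they do not correspond to any existing Zephyr or MCUboot format. A small bootloader would read the board revision, look up the matching entry and then set up XIP for the image at that offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUNDLE_MAGIC       0x424E444Cu /* arbitrary marker chosen for this sketch */
#define BUNDLE_MAX_ENTRIES 4u

/* One entry per board revision image in the bundle. */
struct bundle_entry {
    uint32_t revision; /* board revision this image was built for */
    uint32_t offset;   /* byte offset of the image from the bundle start */
    uint32_t size;     /* image size in bytes */
};

/* Header placed in front of the concatenated images. */
struct bundle_header {
    uint32_t magic;
    uint32_t count; /* number of valid entries */
    struct bundle_entry entries[BUNDLE_MAX_ENTRIES];
};

/* Return the entry for the given revision, or NULL if the header is
 * invalid or no image matches. */
const struct bundle_entry *bundle_find(const struct bundle_header *hdr,
                                       uint32_t revision)
{
    assert(hdr != NULL);

    if (hdr->magic != BUNDLE_MAGIC || hdr->count > BUNDLE_MAX_ENTRIES) {
        return NULL;
    }
    for (uint32_t i = 0; i < hdr->count; i++) {
        if (hdr->entries[i].revision == revision) {
            return &hdr->entries[i];
        }
    }
    return NULL;
}
```

A fixed-capacity table keeps the lookup free of any parsing library, which is the main difference from a FIT-style format.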

gmarull commented 3 years ago

If two boards have distinct pinctrl for the same peripheral I see no other way other than having 2 images. Zephyr uses DT in a static way, and while it has many advantages in terms of optimized binaries (size and runtime) it has its downsides. When it comes to devices (e.g. a sensor), one could have all of the required ones defined but disabled and then leave the application to decide which devices are probed (not possible today afaik).

I think there is room to improve certain scenarios, but without significant changes to the Zephyr architecture, I don't think we can expect the flexibility one has e.g. with dtb or dynamic Kernel modules in Linux.

lmajewski commented 3 years ago

For a handful of boards the approach with having a single DTS with all versions defined and then deciding in run time (in user's application) which one to use seems like a feasible approach.

However, I do think that sooner rather than later there will be a use case where many board revisions need to be supported and the user application will not be able to handle it. Hence my question: has anybody considered such a use case? Was there any preparation for implementation?

pabigot commented 3 years ago

> Was there any preparation for implementation?

I don't think so. At some point we should get rid of the assumption that all devicetree data can be processed statically at build time, and just stick a DTB into the application so we can use devicetree properly. Personally I think we're rapidly approaching that point.

erwango commented 3 years ago

> just stick a DTB into the application so we can use devicetree properly

As a coexisting alternative to the current build-time support, you mean?

eanderlind commented 3 years ago

It would be useful to scope/bound the problem. Is this limited to boards that share the same SoC? (SoCs tend to differ in terms of FLASH/RAM, so a shared binary may be infeasible for most practical cases even if most of the dts is the same.)

TBD if determination of board should be done in a centralized fashion (probe some device, pin or dedicated non-updated persistent memory location to decide "version" and then (re)configure devices), or distributed (each driver fends for itself and binds to the correct hardware config).

Use case: For networked products, coordinating updates (e.g. to fix security vulnerabilities) to similar boards is error prone if one needs to manage different binaries for what look like identical products. In wireless multi-hop networks (Zigbee, Bluetooth Mesh, etc.), field distribution of image updates is also "expensive" in terms of networking bandwidth and administrative time. Over time a product manufacturer may produce and sell products with updated board revisions. There is a benefit in only having to push out a single binary that adapts to these variations. Examples are component revisions/EOL replacements (external memory, sensors, ...) or minor functional improvements to the product HW. Including multiple binaries in a meta-image bundle is undesirable for most wireless IoT networks, since it increases the size of the transported blob.

@lmajewski If this is taking the issue in a different direction than intended, I can raise a separate one.

MattCampbellST commented 2 years ago

I'm new to Zephyr, but running into the same issues as outlined here. As @eanderlind pointed out, managing multiple binaries for minor board differences can be costly and error prone. This is even more of an issue now that chip shortages have made redesigns and/or footprint-compatible parts that aren't software compatible a fact of daily life.

I thought I'd describe my use case as well as my workaround so it might help inform this discussion and help others implement their own workarounds.

I have a case where we found a footprint-compatible eeprom alternate that has a slightly different I2C interface that isn't software compatible with our primary source. Both chips can use the AT24 eeprom driver, but require different device tree settings. As described above, there is no canonical way using the Zephyr device driver model to compile in support for both and then select which one to use at runtime. I'm coming from a lot of experience with embedded Linux on ARM, but I get that Zephyr is different in that it aims to move the burden of devicetree to compile time to reduce memory and code footprint. That being said, it would be great if there was something in the spirit of devicetree overlays that could be applied at runtime after dynamically probing the hardware to determine the configuration (i.e. read some resistors, an eeprom value, or simply try to 'knock on the door' of a few chips to see if they're there).

As a workaround, I ended up putting both eeprom options into the device tree. This causes a warning for having two devices at the same address, but fortunately that can be ignored. I then wrote a new eeprom 'mux' driver that takes the phandles to these two devices. I can use some known differences between these chips to determine which one is present in the 'mux's init, and then hold onto that as the device to use. The rest of the driver simply implements a pass-through of the eeprom API to the detected device. This has both pros and cons.

Pros:

It works well enough in this situation
I am able to give the virtual mux driver a label and use that in my code. This way the application code can be ignorant of which of the two eeproms is present.

Cons:

Both instances of the AT24 driver's init functions are run without an option to defer or skip one. This ends up okay for the AT24 driver as it doesn't do much in its init, but I could imagine more complex use cases than an eeprom where this could have unwanted side effects.
Writing a custom mux driver every time this is needed feels a bit heavy handed.
The mux device takes up extra resources and also adds function call overhead (granted not too bad, and might just be the cost of abstraction :shrug:)
You end up having extra devices showing up at runtime that might be a bit confusing, especially if the matrix of supported hardware increases.
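The mux workaround described above can be sketched in plain C as follows. This is not the actual driver and does not use the Zephyr eeprom API; the struct names, the probe callback and the simulated chips are all hypothetical and only show the shape of "probe in init, then pass through".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal EEPROM interface, standing in for a driver API. */
struct eeprom_dev {
    const char *name;
    int (*probe)(const struct eeprom_dev *dev); /* 0 if the chip responds */
    int (*read)(const struct eeprom_dev *dev, uint32_t off,
                uint8_t *buf, size_t len);
};

/* The 'mux' holds both candidates and remembers the one it detected. */
struct eeprom_mux {
    const struct eeprom_dev *candidates[2];
    const struct eeprom_dev *active; /* set by eeprom_mux_init() */
};

/* Probe each candidate in order and keep the first one that answers. */
int eeprom_mux_init(struct eeprom_mux *mux)
{
    assert(mux != NULL);
    mux->active = NULL;
    for (size_t i = 0; i < 2; i++) {
        const struct eeprom_dev *c = mux->candidates[i];

        if (c != NULL && c->probe(c) == 0) {
            mux->active = c;
            return 0;
        }
    }
    return -1; /* neither chip was found */
}

/* Pass-through: forward the call to whichever chip was detected. */
int eeprom_mux_read(const struct eeprom_mux *mux, uint32_t off,
                    uint8_t *buf, size_t len)
{
    if (mux == NULL || mux->active == NULL) {
        return -1;
    }
    return mux->active->read(mux->active, off, buf, len);
}

/* Simulated chips for the sketch: chip A is absent, chip B is fitted. */
static int absent_probe(const struct eeprom_dev *dev) { (void)dev; return -1; }
static int present_probe(const struct eeprom_dev *dev) { (void)dev; return 0; }
static int fill_read(const struct eeprom_dev *dev, uint32_t off,
                     uint8_t *buf, size_t len)
{
    (void)dev;
    for (size_t i = 0; i < len; i++) {
        buf[i] = (uint8_t)(off + i); /* dummy data derived from the offset */
    }
    return 0;
}

const struct eeprom_dev eeprom_a = { "eeprom_a", absent_probe, fill_read };
const struct eeprom_dev eeprom_b = { "eeprom_b", present_probe, fill_read };
```

Application code only ever sees the mux, which is the "ignorant of which eeprom is present" property listed under the pros; the extra indirection on every call is the overhead listed under the cons.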

I could see a solution to this using #19448, although that alone doesn't solve everything. If you could have runtime determined devices disabled by default, but still compiled in, it would be possible to implement some kind of hooks where you could check which devices should be enabled, and then trigger their initialization if detected. It would be great then if you could also update aliases at run time so the application code can use those and be completely unaware of which device ended up getting selected. That might be a stretch though given how much of that is tied up in the C pre-processor at compile time. This isn't a fully baked thought, but thought I'd add it to the mix.
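A rough sketch of the hook idea from the paragraph above, again in plain C and not tied to any existing Zephyr mechanism. The struct, the detect/init hooks and alias_resolve() are hypothetical names: devices are compiled in but not started, a detect hook decides which ones get initialized, and a runtime "alias" lookup hands the application whichever device came up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record: built and linked, but not started at boot. */
struct deferred_dev {
    const char *name;
    bool (*detect)(void); /* hook: is the hardware actually present? */
    int (*init)(void);    /* normal driver init, run only after detection */
    bool ready;
};

/* Run the detect hook of every device and initialize the ones found.
 * Returns the number of devices that were started. */
size_t deferred_init_all(struct deferred_dev *devs, size_t n)
{
    size_t started = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        devs[i].ready = false;
        if (devs[i].detect != NULL && devs[i].detect() &&
            devs[i].init != NULL && devs[i].init() == 0) {
            devs[i].ready = true;
            started++;
        }
    }
    return started;
}

/* Runtime "alias": the first device that is ready, or NULL. */
const struct deferred_dev *alias_resolve(const struct deferred_dev *devs,
                                         size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].ready) {
            return &devs[i];
        }
    }
    return NULL;
}

/* Simulated hooks for the sketch. */
int init_calls;
bool detect_yes(void) { return true; }
bool detect_no(void) { return false; }
int init_ok(void) { init_calls++; return 0; }
```

Unlike the mux approach, the init function of the absent device is never run here, which addresses the first con listed above.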

As I said above, I'm new to Zephyr, so I might be off on some things, but wanted to drop my $0.02 in here. Happy to follow up more on this if anyone is interested.

zephyrbot commented 9 months ago

Hi @mbolivar-nordic, @erwango, @galak, @tejlmand,

This issue, marked as an Enhancement, was opened a while ago and did not get any traction. Please confirm the issue is correctly assigned and re-assign it otherwise.

Please take a moment to review if the issue is still relevant to the project. If it is, please provide feedback and direction on how to move forward. If it is not, has already been addressed, is a duplicate, or is no longer relevant, please close it with a short comment explaining the reason.

@lmajewski you are also encouraged to help moving this issue forward by providing additional information and confirming this request/issue is still relevant to you.

Thanks!

eanderlind commented 9 months ago

In case it is useful for decision making: we also proactively built a custom "virtual device" to handle a DAC that was available in pin-compatible variants with different I2C addresses. The solution is along the lines of https://github.com/zephyrproject-rtos/zephyr/pull/50067 but with some more persistence in probing addresses. During Covid we couldn't procure enough components of a single variant, so we wound up with two tapes with the different part variants used for the same production batch. Constraining the solution to one where the user can provide an I2C "probe_function" that is run during or as part of init would still be useful. In our case the virtual device retried reading an on-chip hw version register with a short sleep period in between, so that it would not fail if the DAC took a little longer to start up. I have also used a separate approach of having a FLASH address where we store a sw-defined board revision to handle differences, but it requires much more effort in updating a factory programming station, and if one makes a coordination mistake there is no way to recover in the field.
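The retrying probe described above might look like the following sketch. It is plain C with the bus access hidden behind a user-supplied callback; the function names, the callback signatures and the addresses used with them are assumptions for illustration, not the code from the linked pull request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-supplied probe: returns 0 if a device at 'addr'
 * answered (e.g. its hw version register could be read). */
typedef int (*probe_fn)(uint8_t addr, void *ctx);
/* Hypothetical sleep hook; may be NULL when no delay is wanted. */
typedef void (*sleep_fn)(unsigned int ms);

/* Try every candidate address, repeating the whole round up to 'retries'
 * extra times with a delay in between. Returns the address that answered,
 * or -1 if none did. */
int probe_addresses(const uint8_t *addrs, size_t n_addrs,
                    unsigned int retries, unsigned int delay_ms,
                    probe_fn probe, sleep_fn sleep_ms, void *ctx)
{
    assert(addrs != NULL && probe != NULL);

    for (unsigned int attempt = 0; attempt <= retries; attempt++) {
        for (size_t i = 0; i < n_addrs; i++) {
            if (probe(addrs[i], ctx) == 0) {
                return addrs[i];
            }
        }
        if (attempt < retries && sleep_ms != NULL) {
            sleep_ms(delay_ms); /* give a slow part time to start up */
        }
    }
    return -1;
}

/* Simulated DAC that ignores the first few bus accesses after power-up. */
struct slow_dac {
    uint8_t addr;                   /* address the fitted variant uses */
    unsigned int calls_until_ready; /* accesses that fail before it answers */
};

int slow_dac_probe(uint8_t addr, void *ctx)
{
    struct slow_dac *dac = ctx;

    if (dac->calls_until_ready > 0) {
        dac->calls_until_ready--;
        return -1;
    }
    return (addr == dac->addr) ? 0 : -1;
}
```

Keeping the probe as a callback matches the suggestion of a user-provided "probe_function": the retry policy stays generic while the chip-specific check lives with the application.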

zephyrproject-rtos / zephyr

Request for dynamic distinction between different boards' versions (i.e. "single binary" for many boards) #30692