New software configuration and provisioning mechanism

zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.

https://docs.zephyrproject.org

Apache License 2.0


New software configuration and provisioning mechanism #77638

Open carlescufi opened 2 months ago

carlescufi commented 2 months ago

Introduction

This issue deals with a topic that has been the center of multiple previous discussions and existing issues and PRs. This is an attempt at briefly summarizing the problem in order to solve it in a way that is agreeable to all potential users and developers alike.

IMPORTANT: Please keep the discussion around this particular feature in this GitHub issue, so that we do not have to collect feedback from multiple sources.

Problem description

Zephyr currently lacks a mechanism to configure multi-instance software components at compile-time well above the hardware. Instead, as described multiple times, we have:

Devicetree: Multi-instance, but currently restricted to represent (mostly) actual hardware
Kconfig: Single-instance, mostly restricted to software enablement and configuration

A third mechanism is therefore required. This issue describes the use cases and requirements in order to end up concluding with a final proposal.

Zephyr also lacks a provisioning mechanism that could generate a pre-compiled "image" of the data that certain subsystems need in order to execute.

Use cases

The following use cases have so far been identified:

Network stack interface description: https://github.com/zephyrproject-rtos/zephyr/pull/68127
Configuration of init priorities: https://github.com/zephyrproject-rtos/zephyr/pull/73836
Multi-instance software (non-driver) configuration (e.g. buffer sizes in CHAT/CMUX)

We foresee additional users of this mechanism in the future, but the use cases above exist today and require a solution.

Requirements (WIP)

Requirements document

Proposed diagrams

@nashif:

@fgrandel:

bjarki-andreasen commented 2 months ago

An additional use case is buffer sizes for instances of CHAT (AT command handling), CMUX (UART multiplexing), UARTs, etc. A CMUX instance can have multiple channels, where only one needs a huge buffer (for network traffic, for example) and the others require small buffers, typically for AT commands.
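The per-channel buffer use case can be sketched as a small data model that is rendered into compile-time constants. This is only an illustration of the requirement, not a proposed tool: the class names, the `emit_macros` helper, the macro naming scheme and the buffer sizes are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    name: str
    rx_buf_size: int  # bytes (illustrative values only)


@dataclass
class CmuxInstance:
    name: str
    channels: list = field(default_factory=list)


def emit_macros(inst: CmuxInstance) -> list:
    """Render one #define per channel of the given instance."""
    lines = []
    for ch in inst.channels:
        # Hypothetical naming scheme: <INSTANCE>_<CHANNEL>_RX_BUF_SIZE
        macro = f"{inst.name}_{ch.name}_RX_BUF_SIZE".upper()
        lines.append(f"#define {macro} {ch.rx_buf_size}")
    return lines


# One instance with a large buffer for network traffic and a small one for AT commands.
cmux0 = CmuxInstance("cmux0", [Channel("ppp", 4096), Channel("at", 128)])
macros = emit_macros(cmux0)
print("\n".join(macros))
```

The point of the sketch is that each channel of each instance carries its own value, which a single-instance Kconfig symbol cannot express directly.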

ghost commented 2 months ago

Initial Use Cases:

Provisioning a PSA secure storage target (https://github.com/zephyrproject-rtos/zephyr/issues/75275). This needs to be sourced from a secure container at prod time. If we don't want to complicate things at dev and test time then we should integrate this with a simple dev-time config approach for testing.
I also suggest looking at the use cases I collected from community contributions. Ignoring those would mean ignoring those contributions IMO (none of those comes from me, they've all been collected from previous discussions or asking stakeholders on Discord). Few need to be supported initially but any proposal should be aware of them and include a credible migration path.
It was mentioned in the arch WG that the sensing subsystem has an immediate stake in the config subsys.

Sources:

Several people have asked for support of alternative source formats (not now but via a migration path) as input which they generate from internal systems. The requirement here is not that we need multi source at once but it should be clear how we want to migrate there.
UX for first-time users, this ideally includes a single source for single- and multi-instance configuration.
A way to cross-reference DT based on a common data model as the entities are mostly the same (required by both, init and net conf)
A well-defined distinction of Kconfig/DT/sw config that can be reviewed and enforced based on easy-to-observe criteria.
We need a method to map properties in Kconfig, DT and the new config approach for backwards compatibility as many properties will have to be migrated from both DT and Kconfig over time (as in net config).
Overlays (required by init right away and probably net conf, too)

Targets:

type bindings (obvious - just mentioning it for completeness), ideally kept close to usage sites and (as many have suggested) split up based on frequency of updates (maintainability)
Several people have asked for integration with the settings subsystem (and other NVS). Not required right away, but for net config this is on the migration path.
Specific configuration properties should be updatable at runtime (net config is an example).
zero footprint by default (core Zephyr standard for all kinds of config systems)

Documentation:

Several people have asked for self-documenting bindings, ideally integrated with the doc build system.

decsny commented 2 months ago

my comment copy pasted from the Arch WG chat as requested:

"can we be clear that provisioning requirement is obvious when talking about networking IP address but most multi instance configuration needed to develop software is probably not about provisioning, ie buffer sizes and stack sizes of drivers or properties of how parts of the system will act that doesnt change between serial numbers"

and

"i agree with brix about separating the provisioning problem space from the static multi instance configuration problem space"

nordicjm commented 2 months ago

For network configuration, without having run-time configuration, the whole thing can be done in Kconfig, e.g.

# subsys/Kconfig.network_interfaces file:
# total_interfaces would need to come from some sort of DTS processing or macro for counting number of "okay" devices
if "$(interface)" <= "$(total_interfaces)"

config NETWORK_$(interface)_IP_ADDRESS
    string "IP address of network interface $(interface)"

endif

# Kconfig file in test:
interface = 0
source "subsys/Kconfig.network_interfaces"
interface = 1
source "subsys/Kconfig.network_interfaces"
interface = 2
source "subsys/Kconfig.network_interfaces"
interface = 3
source "subsys/Kconfig.network_interfaces"
interface = 4
source "subsys/Kconfig.network_interfaces"

Where total_interfaces would need to be set in dts directly or using macros to count the number of devices of a certain type
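The repeated `interface = N` / `source` pairs in the example above are mechanical, so they could in principle be generated. The following is a minimal sketch under that assumption; `kconfig_sources` is a hypothetical helper, and only the file name is taken from the comment.

```python
def kconfig_sources(count: int, template: str = "subsys/Kconfig.network_interfaces") -> str:
    """Return the Kconfig stanza that sources `template` once per interface index."""
    lines = []
    for index in range(count):
        lines.append(f"interface = {index}")
        lines.append(f'source "{template}"')
    return "\n".join(lines)


# Reproduces the five source statements from the example (indices 0 to 4).
stanza = kconfig_sources(5)
print(stanza)
```

Whether such a stanza is written by hand or generated, the number of repetitions still has to be known before Kconfig is evaluated, which is the dependency on Devicetree that the comment points out.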

carlescufi commented 2 months ago

Arch WG:

Devicetree vs this new mechanism:
- @henrikbrixandersen mentions that there's a tie to a struct device when the property is in Devicetree
- @fgrandel explains how Devicetree is currently being used in many instances to configure "software"
@nashif states that we should be able to serialize this in a format that can be used at runtime, independent of the image (i.e. re-provisioning without recompiling)
@nordicjm states that multi-instance configuration could be done with Kconfig, no need for an additional mechanism to configure this
@jfischer-no suggests using CBOR as a serialization language
@henrikbrixandersen states that we may be mixing up different concepts with this: provisioning (serialized data that can be used across multiple instances of a single image) and build-time configuration that is different in every build. @henrikbrixandersen uses the settings subsystem downstream in order to handle this
@fgrandel reminds us of the provisioning use case for Bluetooth Mesh, which is a very common one. Differentiating between provisioning and multi-instance configuration in terms of language and infrastructure may complicate things, so we may want to have the ability to generate either serialized data or just macros that can be used by the code directly.
@nashif talks about the developer experience: language, syntax, tooling can all be shared.
@jfischer-no mentions that anything that resembles a hardware device belongs in Devicetree, and does not require a new mechanism
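The idea of generating "either serialized data or just macros" from the same description can be illustrated with a short sketch. Everything here is an assumption for the sake of the example: the node and property names, the `CFG_` macro prefix, and the use of JSON, which stands in for CBOR only because it is available in the Python standard library.

```python
import json

# Hypothetical configuration model: two network interface instances.
config = {
    "net_if0": {"ipv6_addr": "2001:db8::1", "mtu": 1280},
    "net_if1": {"ipv6_addr": "2001:db8::2", "mtu": 1500},
}


def to_macros(model: dict) -> str:
    """Build-time target: one C macro per property."""
    lines = []
    for node, props in sorted(model.items()):
        for key, value in sorted(props.items()):
            name = f"CFG_{node}_{key}".upper()
            # Strings become quoted C literals, integers are emitted as-is.
            rendered = json.dumps(value) if isinstance(value, str) else str(value)
            lines.append(f"#define {name} {rendered}")
    return "\n".join(lines)


def to_blob(model: dict) -> bytes:
    """Provisioning target: a compact serialized form of the same model."""
    return json.dumps(model, sort_keys=True, separators=(",", ":")).encode("utf-8")


macros = to_macros(config)
blob = to_blob(config)
print(macros)
print(len(blob), "bytes")
```

Both outputs derive from one model, so the language, syntax and validation can be shared while the choice of target stays with the application.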

de-nordic commented 2 months ago

Note on images built for MCUboot: when planning the provisioning mechanism, note that it may be hard to generate a hex file and add something directly to the image (as part of the binary), because the output of building an image for MCUboot may already be signed and encrypted. So if a blob is inserted as a binary section, where you could directly address constants, that blob would need to be signed together with the image.

Note on using an external, out-of-image blob: such a blob will require careful checking, at the same level as any user input, as it may be used as an attack vector on the firmware. And what should happen when it has been tampered with? What to revert to? Note that something that looks like tampering may be just a problem that should not brick the device but rather, with an annoying effect for the user, default to factory settings: the factory settings that provisioning provides.

I also have a question: how do we avoid using RAM for things that will always be constant for a device? For example, you may not change the embedded MAC or serial number of a device, yet when accessing these from what was written during provisioning, the device will have to load them into RAM, at least temporarily. This also brings the cost of enabling the Flash API and other required subsystems just for the device to get the factory-provisioned info, even if every unique instance of a device only reads the info and never attempts to store updates, or chooses another storage subsystem (like FS).
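The "check like user input, fall back to factory settings" behaviour can be sketched as follows. This is a host-side illustration only, assuming an HMAC-SHA256 tag prepended to a JSON payload; the blob layout, the key handling and the names `seal`, `load` and `FACTORY_DEFAULTS` are all hypothetical and say nothing about how MCUboot or Zephyr would do it.

```python
import hashlib
import hmac
import json

FACTORY_DEFAULTS = {"ip_addr": "192.0.2.1"}  # illustrative fallback values
TAG_LEN = hashlib.sha256().digest_size       # 32 bytes
DEMO_KEY = b"demo-key-not-for-production"    # assumption: key provisioning is out of scope


def seal(settings: dict, key: bytes) -> bytes:
    """Serialize settings and prepend an authentication tag."""
    payload = json.dumps(settings, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def load(blob: bytes, key: bytes) -> dict:
    """Return the settings from blob, or factory defaults if any check fails."""
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if len(tag) != TAG_LEN or not hmac.compare_digest(tag, expected):
        return dict(FACTORY_DEFAULTS)  # tampered or truncated: do not brick, revert
    try:
        settings = json.loads(payload)
    except ValueError:
        return dict(FACTORY_DEFAULTS)  # authentic but malformed payload
    if not isinstance(settings, dict):
        return dict(FACTORY_DEFAULTS)
    return settings


good = seal({"ip_addr": "198.51.100.7"}, DEMO_KEY)
tampered = good[:-1] + b"X"
print(load(good, DEMO_KEY))
print(load(tampered, DEMO_KEY))
```

A failed check and a corrupted blob are deliberately handled the same way here, which matches the observation that apparent tampering may just be a fault that should not brick the device.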

ghost commented 2 months ago

Thanks @carlescufi for summarizing our discussion. A few corrections to avoid misunderstandings:

@fgrandel explains how Devicetree is currently being used in many instances to configure "software"

Not exactly. I don't use the software/hardware distinction myself. I was rather saying that almost all properties that we introduce change at the same frequency as driver software and not at the same frequency as vendor hardware. From a well-established encapsulation perspective this indicates a requirement to keep driver-specific properties apart from vendor-specific properties and de-facto standardized stable Linux properties (as different files, not as different formats).

I was citing others who had laid out alternative heuristics to define the sw/hw line:

"Everything that ties to a struct device is 'hardware'" as mentioned by @henrikbrixandersen and others. This heuristic breaks down in both directions: 1. By normalization rules, this definition would include almost all of the network interface configuration which is 1-to-1 to drivers. 2. We have a considerable number of DT properties that do not tie to struct device instances.
"DT should contain everything that changes at the frequency of hardware", as @decsny (and I) have been told by several community members (though neither he nor I find it practicable): This heuristic also breaks down in practice. As pointed out above, by that definition many Zephyr-specific properties in DT would be "software".

I pointed out that as long as we do not have an intersubjectively observable heuristic that defines the line between sw and hw, each maintainer will define their own personal criteria as we observe in practice. I disagreed with @nashif that we have only few properties that are contended. I'd rather say that almost all newly introduced properties are contended as @decsny has confirmed based on practical experience over the last few months.

@jfischer-no suggests using CBOR as a serialization language

I mentioned that this is part of the solution space. Others have voted for protobuf or Thrift.

@henrikbrixandersen states that we may be mixing up different concepts with this: provisioning (serialized data that can be used across multiple instances of a single image) and build-time configuration that is different in every build.

I responded, that we cannot cleanly define per property whether they are "provisioning only" or "build only". On a per-app basis this line will change considerably in practice.

Maybe I misunderstood what was proposed, but if it means that we'd introduce another "a priori" (ontological) distinction then I'd strongly oppose such a strategy. Here are a few examples others mentioned in earlier discussions:

build-time "hw" property (e.g. gain of ADC channels)
provisioning-time "hw" property (e.g. CDC ECM MAC address)
build-time "sw" property (e.g. buffer sizes)
provisioning-time "sw" property (e.g. IPv6 address)

I could immediately think of products (apps) or use cases (tests, debugging) where those properties would not fit those categories.

I therefore argued that any attempt to bind properties to a specific configuration source or target is conceptually flawed and will not work in practice. The fix is: Define a common abstract data model for everything closely tied by encapsulation rules to software programming models (which includes drivers) based on well-established normalization rules (i.e. shared entities and properties) independently from any specific representation or usage context. This can be done w/o any ontological assumptions with mathematical precision and has therefore been well-established best practice in application configuration for decades. This model then needs to be serializable from/to multiple files on a per use-case and encapsulation basis and/or overlaid at any level.
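Overlaying at any level over a common model can be sketched as a recursive merge in which later sources override earlier ones property by property. The node names, property names and values below are assumptions chosen to echo the examples above, and `apply_overlay` is a hypothetical helper, not existing Zephyr tooling.

```python
def apply_overlay(base: dict, overlay: dict) -> dict:
    """Return a new model where overlay values override base values, node by node."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides describe the same node: merge its properties recursively.
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = value
    return merged


# Three sources for the same abstract model, independent of file format.
defaults = {
    "adc0": {"gain": 1},
    "net_if0": {"ipv6_addr": "2001:db8::1", "rx_buf_size": 512},
}
test_overlay = {"net_if0": {"rx_buf_size": 2048}}       # e.g. a build-time override
provisioned = {"net_if0": {"ipv6_addr": "2001:db8::42"}}  # e.g. a per-device value

effective = apply_overlay(apply_overlay(defaults, test_overlay), provisioned)
print(effective)
```

In this sketch the same property can come from a build-time or a provisioning-time source without the model itself recording which one it was.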

The settings subsystem is part of the solution space but was introduced to the discussion. That's why I voted against having to depend on the settings subsystem for all configuration as it severely limits extensibility of the config subsystem and introduces unnecessary complexity and resource usage to certain use cases like tests or simple custom applications that otherwise won't need it. I vote for technology that allows us to flexibly merge/split serialization from/to any sources/targets depending on specific application requirements. Several technologies with excellent UX exist for all proposed serialization formats to implement such a merge/split from/to a common intermediate representation.

I mentioned that as a bonus, this will make it easier to switch between alternative source and target representations at dev-time (e.g. for easy reproduction of bugs and debugging with minimal dependencies).

@fgrandel reminds us of the provisioning use case for Bluetooth Mesh

Not only B. Mesh, my comment was including BLE proper as both seem to support the settings subsystem as config target according to @jhedberg. This becomes even more interesting when adding secure storage or large-scale provisioning systems as possible config sources. I also mentioned several other immediately relevant use cases introduced to the discussion by others and that are not yet part of the use case list.

@nashif talks about the developer experience: language, syntax, tooling can all be shared.

I added to this that IMO tooling will be 90% of the maintenance effort and will be mostly shared between all use cases discussed so far (including init and net config). Assuming a common intermediate representation, sourcing from different formats and serializing into different targets can be added flexibly at very low cost. A common intermediate representation will help to debug the configuration space (similarly to .config and merged DTS) no matter whether done in YAML, JSON or DT.

I therefore vote for extracting existing DT tooling (not format) and re-using it for whatever configuration system and hopefully common intermediate format we decide upon. Any addition to the DT and/or config world will then equally profit both sides, i.e. we can expect that DT tooling will benefit from improvements in the config area, too (e.g. UX improvements, validations, bindings, documentation, etc.). This also has obvious advantages from the user perspective as a lot of target-side and tooling knowledge could be re-used, independently of context.

UPDATE: To clarify as I was probably misunderstood here - I didn't argue for any specific tooling solution here although I admit that I can be read that way. I was just arguing to share tooling among different configuration requirements which is in the problem space, not in the implementation space as I'm not saying anything specific about how such shared tooling should look in the end. OTOH let's be realistic: Any shared tooling will need to build on top of what we have but I admit that such a statement is in the solution space and not required right now. So sorry if I was misread in that sense.

As long as those use cases and basic requirements (including those mentioned in my first comment) are not explicitly made part of the RFC I'll find it very hard to agree to any configuration system. They are all immediately relevant to the initial use cases mentioned by the OP and are therefore part of a "minimal set" as required by this issue. And please note that none of those requirements favors DT over YAML (or whatever other solution space technology) in any way.

UPDATE: Edited to ensure feedback given further down has been included in the argument.

nashif commented 2 months ago

I disagreed with @nashif that we have only few properties that are contended. I'd rather say that almost all newly introduced properties are contended as @decsny has confirmed based on practical experience over the last few months.

you need to provide some context and details here, I have no idea what @decsny said and where he said it and in what context.

nashif commented 2 months ago

I therefore vote for extracting existing DT tooling (not format) and re-using it for whatever configuration system and hopefully common intermediate format we decide upon.

why would you use a tooling for a format if you are not using the format itself? Also, we established that runtime is important, most of the DT tooling is build time and based on macros.

ghost commented 2 months ago

@nordicjm @de-nordic I think your comments are really relevant, but they both belong to rather specific solution spaces unless we extract abstract requirements from those examples. I find it hard myself to cleanly distinguish between problem and solution w/o losing sight of feasibility. But it would be nice if we tried to keep very specific implementation problems always linked to more abstract requirements, as I understood @carlescufi's approach.

@nordicjm In more abstract terms you're making the case - if I understand correctly - that almost any source format will be able to represent an abstract data model (including Kconfig) and that we should not introduce additional formats unless required for a good reason (i.e. simplicity and UX as a requirement).

@de-nordic If I understand you correctly you're requiring at a more abstract level that configuration needs not only to integrate with the settings subsystem alone but also with other subsystems (e.g. MCUboot signing in your case). From a requirements perspective this confirms that we cannot tie the configuration subsystem target to a single subsystem alone. It must be sufficiently open to integrate with whatever target and process we already have or invent in the future.

ghost commented 2 months ago

@nashif

you need to provide some context and details here, I have no idea what @decsny said and where he said it and in what context.

Sure: https://github.com/zephyrproject-rtos/zephyr/issues/76902#issuecomment-2307710931

nordicjm commented 2 months ago

@nordicjm In more abstract terms you're making the case - if I understand correctly - that almost any source format will be able to represent an abstract data model (including Kconfig) and that we should not introduce additional formats unless required for a good reason (i.e. simplicity and UX as a requirement).

No, I'm stating that without runtime configurability (i.e. provisioning and able to change it) the entire network configuration PR is moot because no tooling is required, static configuration can be wholly done in Kconfig, thus the prime feature that this must support is provisioning

jfischer-no commented 2 months ago

@jfischer-no suggests using CBOR as a serialization language

I mentioned that this is part of the solution space. Others have voted for protobuf or Thrift.

Not introducing new dependencies or minimizing dependencies should be a requirement. IIRC it was mentioned somewhere, but I cannot find it here. (Side note, we already use CBOR implementation and IMO it fits better to use it as a DTB alternative).

ghost commented 2 months ago

@nashif

why would you use a tooling for a format if you are not using the format itself? Also, we established that runtime is important, most of the DT tooling is build time and based on macros.

Considerable parts of DT tooling are not specific to DT as a source format (which as such I'm not questioning). Re-using those parts not only means less maintenance cost but also considerable re-use of existing community knowledge.

Just a few examples off of the top of my head w/o trying to be complete:

Bindings are in YAML, not in DT. The binding format, all validation and generator infrastructure around that is re-usable. This tooling could benefit from a common approach as the existing binding approach has several modeling and usability flaws that could at the same time be fixed for everyone. I even suspect that the fact that many seem to find DT "awkward" is at least partly due to bindings rather than DT itself. This alone is a very important investment if we have to re-invent the wheel and maintain it on both sides.
Many low-level generic DT macros are not tied to a specific source format. They only tie to an abstract hierarchical node (aka entity) vs property (aka attribute) distinction that we'll most probably need in the config area, too.
In the config area we'll need a zero-footprint (probably macro) default target, too. Code that generates macros from typed entities and nodes (bindings plus intermediate representation) can be re-used among DT and config.
Nordic and others have built visualization and maintenance tooling around DT that from a structural viewpoint could be (partly) re-used for any entity-relationship graph including config which would make it easier to learn and maintain tooling.
We need to represent and validate links between DT and config. This is much easier in a common intermediate infrastructure.
I forgot to mention overlay infrastructure and rules. We already have two different overlay systems, do we really need a third one? We could certainly share logic between those systems to make them all easier to maintain and to enforce common rules re what overrides what. I remember that when the hardware model was changed recently there already was quite some redundancy in the change set due to DTS vs. Kconfig overlays.
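The claim that binding-driven validation is not tied to DT as a source format can be illustrated with a small sketch. The binding is written here as a Python dict instead of YAML only to keep the example free of third-party dependencies; the property names, the reduced set of types and the `validate` helper are assumptions and do not reproduce the actual Zephyr binding schema.

```python
# Hypothetical binding: property name -> type and whether it is required.
BINDING = {
    "rx-buf-size": {"type": "int", "required": True},
    "label": {"type": "string", "required": False},
}

# Mapping from binding type names to Python types (simplified for the sketch).
PY_TYPES = {"int": int, "string": str}


def validate(node: dict, binding: dict) -> list:
    """Return a list of human-readable problems; an empty list means the node is valid."""
    errors = []
    for name, spec in binding.items():
        if name not in node:
            if spec["required"]:
                errors.append(f"missing required property '{name}'")
            continue
        if not isinstance(node[name], PY_TYPES[spec["type"]]):
            errors.append(f"property '{name}' must be of type {spec['type']}")
    for name in node:
        if name not in binding:
            errors.append(f"unknown property '{name}'")
    return errors


print(validate({"rx-buf-size": 128}, BINDING))
print(validate({"label": 5, "extra": 1}, BINDING))
```

Nothing in `validate` depends on where the node came from, which is the sense in which such infrastructure could be shared between DT and a software configuration source.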

I'm sure I could find more examples if looking at the code. But I think this is enough to make the point.

nashif commented 2 months ago

@nashif

you need to provide some context and details here, I have no idea what @decsny said and where he said it and in what context.

Sure: #76902 (comment)

@decsny : "From being on the (recently removed, by me) DT binding maintainer area, I saw many things being added to DT that aroused this type of argument over HW/SW distinction."

This is very vague and a general observation which might be valid and to the point, but this needs to be supported with examples that show whether this is going from bad to worse or something manageable that can be tolerated depending on the use case and context. If someone thinks this is a serious problem that needs attention, someone needs to open an issue with examples and bring this to the attention of everyone.

decsny commented 2 months ago

I disagreed with @nashif that we have only few properties that are contended. I'd rather say that almost all newly introduced properties are contended as @decsny has confirmed based on practical experience over the last few months.

you need to provide some context and details here, I have no idea what @decsny said and where he said it and in what context.

I think florian is referring to my engagement with him on #76902. He mentioned that I was taking a definition of a software and hardware distinction for granted; I just responded to him that since I have been on the DT reviewer list for a while, I actually did see that it was quite a gray area and everybody seems to have different opinions.

As far as the point that florian is trying to make, I have not created a compiled list of detailed examples to support his argument, which IMO sounds a bit zealous but I think his spirit is going in the right place regardless, it's an important issue (as everybody interested in this thread would agree I think). And to be clear, I would not say that almost all newly introduced properties are contended, that is a bit of an exaggeration, but there are multiple cases:

First of all, yes, I don't think it's uncommon to see disagreements and change requests on PRs from somebody who thinks something being added to devicetree doesn't belong there or vice versa. I think everybody who has been involved in the zephyr upstream has observed this plenty of times, and it's even discussed almost regularly in the working groups and discord forums as well as github. I do not think this claim needs to be proved to anybody who is a regular here.
On the other hand, I couldn't tell you an exact proportion, but I think a good amount of DT changes and additions are not controversial and rather routine.
- As far as how much we are adhering to Linux I'm not sure and don't know a good way to measure it, but I think a lot of MCU stuff being added to Zephyr is not found in Linux and a lot of bindings being added for platforms added to zephyr are new territory as far as relevant inter-project DT most of the time.
- As far as how much we are respecting the various flavors of hardware vs software distinction, like I said above there can be disagreements, but for the most part, I think most changes and additions to DT are due to drivers being expanded or added to account for more features or aspects of the hardware and enable awareness of them through a more thorough DT description. I think there is a high frequency of changes, as florian pointed out, but this is not necessarily because DT is being mapped 1:1 to software constructs and is being forced to change with the same frequency, but I think most of the cases are just because there is a rapid amount of new hardware enablement development work going on in Zephyr due to it being a growing project with a lot of interest from silicon vendors who are shifting resources to support it. There might be some cases in DT where there is a problem with strong software mapping, and we should avoid it, but I don't think it's as dire as I am being quoted as saying.
Finally, there is the case where there is something being changed in DT that is in my opinion questionable, but nobody questions it or contends it. And to be clear, this case is not about my personal interpretation of the hardware config vs software config debate, but I think it's a different issue, where a lot of people (including myself) just don't want to put in the energy to fight against the tidal wave of interest from whoever is adding the thing when there is no alternative to suggest to them anyways since we don't have a good multi instance static software configuration scheme (which is the point of this issue, I suppose).

nashif commented 2 months ago

just another note here, can we please STOP talking about implementation and formats and DTS please? thought we had a good discussion yesterday while remaining at a high level and now we are going down the path again talking tooling and formats, this is very disruptive.

decsny commented 2 months ago

just another note here, can we please STOP talking about implementation and formats and DTS please? thought we had a good discussion yesterday while remaining at a high level and now we are going down the path again talking tooling and formats, this is very disruptive.

you should quote reply whoever you're talking to with what specifically you think is not relevant.

nashif commented 2 months ago

you should quote reply whoever you're talking to with what specifically you think is not relevant.

too much to quote reply. hint: I am not referring to you, just seen your message :)

decsny commented 2 months ago

you should quote reply whoever you're talking to with what specifically you think is not relevant.

too much to quote reply. hint: I am not referring to you, just seen your message :)

I do think it's not productive to make vague statements about people's discussion being irrelevant, it doesn't help not to be specific, you want to avoid discouraging the participation you actually do want by being too broad with your meta-criticism of the discussion

nashif commented 2 months ago

I do think it's not productive to make vague statements about people's discussion being irrelevant, it doesn't help not to be specific, you want to avoid discouraging the participation you actually do want by being too broad with your meta-criticism of the discussion

where am I being vague?

can we please STOP talking about implementation and formats and DTS please?

I am saying we need to first collect requirements before we go into implementation details and the discussion is shifting toward what was already discussed in the issues that were abruptly closed. Why at this stage are people "voting" for implementations and formats and tooling without first agreeing on the requirements? Why is this discussion shifting toward how DTS is being used or mis-used?

decsny commented 2 months ago

I do think it's not productive to make vague statements about people's discussion being irrelevant, it doesn't help not to be specific, you want to avoid discouraging the participation you actually do want by being too broad with your meta-criticism of the discussion

where am I being vague?

can we please STOP talking about implementation and formats and DTS please?

I am saying we need to first collect requirements before we go into implementation details and the discussion is shifting toward what was already discussed in the issues that were abruptly closed. Why at this stage are people "voting" for implementations and formats and tooling without first agreeing on the requirements? Why is this discussion shifting toward how DTS is being used or mis-used?

I'm not disagreeing with you that we need to talk about the requirements before proposing myriads of solutions, but a lot of these requirements are going to come from observations of weaknesses about the current implementation we have, so saying not to talk about it at all is too vague. Clearly there is some direction of the discussion above that you didn't like and I think it would be more useful to be more specific about what direction it went that you didn't like and by who. On the other hand, if by your quoted request you really flatly think that any mention of DT or implementation (especially the current) is irrelevant, then I tend to disagree for the reason I just gave.

nashif commented 2 months ago

I'm not disagreeing with you that we need to talk about the requirements before proposing myriads of solutions, but a lot of these requirements are going to come from observations of weaknesses about the current implementation we have, so saying not to talk about it at all is too vague.

and how does a "vote" for an implementation in a discussion about requirements and, as you say, weaknesses in such implementations help here? We will get there, sooner or later, but when this forum goes into the same rabbit hole about the same topic and the same arguments, most of us will just give up and lose interest.

Some DTS topics being touched on here deserve their own issues and tracks.

ghost commented 2 months ago

I would not say that almost all newly introduced properties are contended, that is a bit of an exaggeration, but there are multiple cases.

Yes, I admit this was an exaggeration. Sorry for that. :-( It's enough that there are many instances to make my argument.

but I think most of the cases are just because there is a rapid amount of new hardware enablement development work going on in Zephyr due to it being a growing project

IMO a sufficiently large part of changes are made due to driver-specific features for existing hardware to support the argument that those properties should be packaged with drivers rather than hardware. This is established encapsulation practice, why deviate? Does not invalidate your argument, though.

I acknowledge that we try hard to make our configuration models independent of specific vendors and Zephyr driver programming models. But for custom out-of-tree drivers or custom hardware this model often won't match project requirements.

just another note here, can we please STOP talking about implementation and formats and DTS please?

I said the same thing above - so why re-iterate? There was nothing deviating being brought to the discussion in the meantime except for your own questions re DT and tooling, right?

I therefore have to say that it's at least partly your fault if the discussion has been digressing again from the problem space. You just got what you asked for, IMO.

Arguing about the software/hardware distinction is in the problem space, focussing on DT as a format is not. Also feasibility cannot be completely excluded from our discussion.

thought we had a good discussion yesterday

I don't agree. Yesterday's discussion brought no new arguments, only repetitions of what had been said earlier (except for the truly original proposal to forget about config altogether and do everything in Kconfig). It's a pity that so many contributions were ignored by most participants. We need to do our homework before we attend arch WG meetings.

There are many who have participated in the meta-discussion but have not provided many substantial contributions of their own or comments on arguments that are already on the table.

too much to quote reply

IMO this amounts to weaseling. Everybody might implicitly feel indirectly accused unless you make this specific.

This is very vague and a general observation which might be valid and to the point, but this needs to be supported with examples that show whether this is going from bad to worse

I personally find this comment a bit unfair. Many others (including myself) have made this point over and over including specific examples. @decsny's comment was embedded in an RFC that pointed to specific examples and only confirmed the observations made there. And his new comment is equally balanced and well argued.

But here some specific examples then as you asked for those:

I pointed to the USB maintenance area yesterday to make the point.
I can also point to my own maintenance area. IEEE 802.15.4 configuration as I found it established when I arrived has always made a very liberal interpretation of the hw/sw split, too. We even got a distinction between ieee802154 and ieee802154g accepted upstream some years ago without anyone mentioning its ambiguity.
You'll hopefully not deny that in the past there have been many PRs where @gmarull repeatedly mentioned questionable use of the hw/sw distinction - and I totally agree with him on all accounts if one takes DTSpec's OS independency requirements seriously. I'm sure @gmarull will easily be able to name a few of those examples right away if challenged.
It is documented practice in Zephyr to mix multi instance software configuration into the hardware container.
Update: I can provide many more examples on demand after reviewing existing bindings.

If someone thinks this is a serious problem that needs attention, someone needs to open an issue with examples and bring this to the attention of everyone.

There are at least two PRs and one RFC where this argument has been brought forward in the context of this topic alone not counting all the unrelated PRs where the hardware/software line has been questioned by commenters in other contexts.

@nashif It would also be nice to get your personal requirements list or at least some confirmation of existing requirements that others have mentioned. This would add tangible content to the discussions rather than just focussing on the meta-discussion alone (which I otherwise find worthwhile, too btw, but not exclusively so).

Edit: Updated to the latest state of the discussion to include arguments given by others.

decsny commented 2 months ago

I'm not disagreeing with you that we need to talk about the requirements before proposing myriads of solutions, but a lot of these requirements are going to come from observations of weaknesses about the current implementation we have, so saying not to talk about it at all is too vague.

and how does a "vote" for an implementation in a discussion about requirements and, as you say, weaknesses in such implementations help here? We will get there, sooner or later, but when this forum goes into the same rabbit hole about the same topic and the same arguments, most of us will just give up and lose interest.

Some DTS topics being touched on here deserve their own issues and tracks.

I think we are having a misunderstanding here, I am not saying that I think we should be voting on implementations or even proposing them, what I'm saying is that observation of the current implementation is going to be an important source of deriving requirements since most of the motivation for talking about this issue seems like it stems from the limitations people are discovering with the current state of things. That's why I asked you to clarify what you were talking about, since your meta-criticism sounded too broad to me since I personally think discussion of the current implementation is relevant to requirement gathering, but your brief request to filter discussion if taken completely as worded would exclude discussion of that. As for what anybody else in this thread is talking about, I'm not defending the direction anybody is taking, I honestly only skimmed what's going on because I got pinged like 5 times, and it sounded like I needed to clarify my quotation.

edit: late proofread reword

nashif commented 2 months ago

I personally find this comment a bit unfair - at least targeting @decsny specifically.

@decsny do you really feel targeted here? I emphasized that all of this might be true and valid, but we need specific examples and we need to discuss this on a different track if it is really that bad.

@nashif You haven't given any specific examples either to support your own argument, right?

right, and I will not provide examples because I am not making any claims or arguments, you are the one making them.

ghost commented 2 months ago

most of us will just give up

Before most of us give up they'll have to make their first substantial contribution or at least refer explicitly to what others have brought to the table. ;-) We have a lack of people listening to each others' argument and a lack of original content.

If you think others are contributing the wrong things to the discussion why don't you simply say the right things rather than making such a fuss here.

decsny commented 2 months ago

@decsny do you really feel targeted here? I emphasized that all of this might be true and valid, but we need specific examples and we need to discuss this on a different track if it is really that bad.

no, i don't feel targeted, the comment he's referring to was from before I even came to this thread, and you're not wrong that specific examples are needed. Which is why in my original response to all the pings I specifically clarified, in response to the claim that I knew about all this, that I do not have a compiled list of examples; that would take time and effort which I have not had the luxury to spend. And I want to be clear also, based on some comments in this thread, that I personally don't desire to be the vehicle for an appeal to authority about what's being discussed here and don't completely see what value it adds anyways, since another point of what I was trying to say above is that the experiences I've seen a lot as a reviewer should, I would expect, also have been commonly seen by everybody else interested in this issue.

ghost commented 2 months ago

Now that we all seem to converge on a common problem statement (pending a few more examples I trust will be easy to find) we can hopefully see what the essence of this back and forth would be wrt the concern of this issue.

Requirements:

We have to make any conceptual distinction between different "a priori" (=application/use case independent) config containers (Kconfig vs. DT vs. X) so precise that they could ideally be enforced mostly automatically to avoid further conflict and confusion.
The more such "a priori" containers we propose the more it is important to make those distinctions objectively observable, otherwise users will find it very hard to implement and understand those concepts in practice. Imprecise distinctions and overlap in the config area have cost us as a community and even more our users too much energy already - as was again exemplified by all those comments here re sw vs hw and Kconfig vs DT.

What I mean by precise has been explained in this comment, see the definition of "axiomatic" vs. "ontological": Our configuration space must be accompanied by easy-to-observe (axiomatic) heuristics that refer to intrinsic properties of our software rather than relying on extrinsic "real world" concepts that cannot be mapped to code artifacts without contextual interpretations that we will by definition (and backed by a lot of research) never be able to agree upon objectively.

I've shown that the distinctions and concepts I'm proposing (feature selection vs. configuration, normalization, encapsulation) can be mostly automated or at the very least be defined with mathematical precision. The same can not be said about distinctions like hw vs sw or provisioning vs build time properties. At least not until someone has come up with a heuristic that really represents what we actually do. Note that I've shown that heuristics that have been proposed so far are quite obviously not good enough.

I'm of course open to being proved wrong or convinced otherwise. But certainly not by "hand wavy" concepts that no one has been able to define precisely so far.

ghost commented 2 months ago

~~I personally don't desire to be the vehicle for an appeal to authority~~

~~It's funny that you're saying this because you've made it credible that you are just one of those with authority. Citing you once more:~~

~~Actually, I would say as the only active person on the DT collaborator list for the last couple months, it's probably more apparent to me than most people that the gray area is an issue.~~

Update: Perceived as out-of-context citation (which was of course not my intent - really sorry for that). Therefore retreated.

But I agree with you that no one should hide behind anyone else. I hope this is quite obvious from the way I'm defending my argument against strong opposition although I would certainly have many, many reasons by now to give up. Much more than others who didn't even join the party yet.

In any case I find your argument constructive, balanced and differentiated and fully agree to it (including where I stand corrected). That's why I back it.

decsny commented 2 months ago

I personally don't desire to be the vehicle for an appeal to authority

It's funny that you're saying this because you've made it credible that you are just one of those with authority. Citing you once more:

Actually, I would say as the only active person on the DT collaborator list for the last couple months, it's probably more apparent to me than most people that the gray area is an issue.

But I agree with you that no one should hide behind anyone else. I hope this is quite obvious from the way I'm defending my argument against strong opposition although I would certainly have many, many reasons by now to give up. Much more than others who didn't even join the party yet.

The context of what you're quoting clearly was just to say that I in particular have been privy to a lot of instances where people had different opinions and it's clear to me that the definitions are gray, which you were in that thread claiming that I was not cognizant of when I asked you some questions. I don't see how my clarification that I am on the DT collaborator list and do in fact recognize its current usage is not completely ideal (which I think at this point is not a revolutionary position for any community member), is grounds for people to use my name as a fire blanket to throw over any chaos in every other discussion regarding DT that they get themselves into as somehow a major point in some bombastic blather of bafflegab.

benediktibk commented 2 months ago

I would like to throw in some constructive facts, and less meta discussion, to get this thing forward. As I see it, there are several possible settings which one might set to a certain value, either during build- or runtime:

clock divider
alternate function
GPIO direction
flash mapping into address space
stack size for a driver thread
buffer size for a driver
ethernet MAC address
IP setup (DHCP vs static)
purely application specific

I have ordered them already, according to my experience, from lower to higher probability that they have to be set during runtime. From this I can make a few observations:

It is hard to define the exact line where hardware ends and software starts.
Some of them are single instance, some are multi instance settings.
For Zephyr, as an application framework to be useful, it is necessary to provide a way to configure all of them during runtime.
It is hard to make a distinction where configuration ends and provisioning starts. So far I have explicitly not used these terms, I've only used settings.

So if we want to stay in the status quo (which is IMHO important to the discussion, otherwise we wouldn't have one) we will have to make a precise definition where we draw the line between hardware and software, as well as between configuration and provisioning. Which I think will be hard, and actually not very useful for the endusers, contributors and maintainers.

Or we deviate from these distinctions and move forward to an approach which doesn't require them at all. This could be for example a description of all these settings which are defined during build time and are then used during runtime as the defaults during startup and initialization. If there is a need to change them during runtime we will have to provide functions to do so, but I do not see the hard necessity to find a tool which solves both of these problems at once. We could for the provisioning part implement a framework for what @henrikbrixandersen uses downstream, based upon the settings subsystem. But for this specific issue, the configuration of multiple instances, I would leave this out of scope.
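To make the "build-time defaults, runtime changes through functions" idea above concrete, here is a minimal host-side sketch. It is not Zephyr code: the struct, the table and every `demo_cfg_*` name are hypothetical and only illustrate one possible shape, assuming that each instance gets one read-only entry generated at build time and a mutable copy seeded from it during initialization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-instance settings; field names are illustrative only. */
struct demo_if_cfg {
    uint8_t mac[6];
    int use_dhcp;       /* 1 = DHCP, 0 = static IP setup */
    size_t rx_buf_size; /* bytes */
};

/* Build-time description: one read-only entry per instance. */
static const struct demo_if_cfg build_defaults[] = {
    { .mac = {0x02, 0, 0, 0, 0, 0x01}, .use_dhcp = 1, .rx_buf_size = 1500 },
    { .mac = {0x02, 0, 0, 0, 0, 0x02}, .use_dhcp = 0, .rx_buf_size = 256 },
};

#define DEMO_NUM_INSTANCES (sizeof(build_defaults) / sizeof(build_defaults[0]))

/* Runtime state, seeded from the build-time defaults during init. */
static struct demo_if_cfg runtime_cfg[DEMO_NUM_INSTANCES];

void demo_cfg_init(void)
{
    assert(sizeof(runtime_cfg) == sizeof(build_defaults));
    memcpy(runtime_cfg, build_defaults, sizeof(build_defaults));
}

/* Read access to the current (possibly modified) settings of one instance. */
const struct demo_if_cfg *demo_cfg_get(size_t instance)
{
    return instance < DEMO_NUM_INSTANCES ? &runtime_cfg[instance] : NULL;
}

/* Runtime change goes through an explicit function, not the build-time table. */
int demo_cfg_set_dhcp(size_t instance, int use_dhcp)
{
    if (instance >= DEMO_NUM_INSTANCES) {
        return -1;
    }
    runtime_cfg[instance].use_dhcp = use_dhcp ? 1 : 0;
    return 0;
}
```

In this sketch the build-time table never changes; calling `demo_cfg_init()` again restores the defaults, which is one way to keep configuration and any later provisioning mechanism decoupled.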

Therefore I tend to vote for a single format (not necessarily file) which configures multiple instances of software and hardware, as this will make it easier to understand what is going on. I still think there will be a place for Kconfig, but basically reduced to its original intent, the feature selection. I do not see IP addresses and stack sizes as good candidates for Kconfig options, they are rather configurations.

jfischer-no commented 2 months ago

thought we had a good discussion yesterday

I don't agree. Yesterday's discussion was almost entirely redundant to what had been discussed online before (except for the truly original proposal to forget about config altogether and do everything in Kconfig). This was only necessary because a large part of the audience was not aware of arguments made offline (as we can again see in this discussion). None of what was said here had not been said previously elsewhere. It's a pity that so many comments are ignored. Plus we had to intervene twice to re-focus yesterday's discussion on the problem space.

I overlooked the fact that this is only about requirements, sorry. But it is hard for me to separate them here. Mostly here and in other discussions it is about DT vs. X. Count how often DT (devicetree) and Kconfig appear in your comments. After reading most of the comments here, I think we should go back to a more precise description of where devicetree can be used, including examples of acceptable gray areas. (Also, some guidelines on why the Kconfig option should not be used, e.g. in parts instantiated by C macros or devicetree.) Sampling requirements for the new configuration system without knowing exactly where others may be used is a step too far.

I also have to say: This whole discussion thread would be almost 100% unnecessary if requirements mentioned in other contexts had been collected in the RFC in the first place before opening it for discussion. I'm forced to repeat everything here that I had already stated elsewhere w/o exception, as are others.

There are many comments from you that lack conciseness and are bloated. This takes more time to read and analyze. Please keep your requirements comments concise, and try not to take other people's statements out of context. I am confident that they will get more attention.

decsny commented 2 months ago

yes I also agree @fgrandel to be clear I think you can help the project as a valuable force for changing things in the right direction with your energy and systematic thinking but please take into consideration like johann said that the communication is the oil that lets the gears turn in the project

I hope this is quite obvious from the way I'm defending my argument against strong opposition although I would certainly have many, many reasons by now to give up. Much more than others who didn't even join the party yet.

I don't think there is as much opposition to you as you are perceiving, it's just you are a lot of steps ahead of everybody all the time and there's a lot of people that are involved in this discussion, we all have to be patient. This has been an issue for years and I have never seen so much community momentum about solving it as there is now, so this is the right time for you to involve yourself in the discussion about it, you just can't expect it to be solved immediately

ghost commented 2 months ago

There are many comments from you that lack conciseness and are bloated.

Thanks @jfischer-no and @decsny for contributing to the meta-discussion. I'm happy that we take the time to lead it, because I was the first requesting to focus on the problem space and also questioning our debating culture. And of course I'm open to feedback here, too.

We agree on one point: Our debating culture is often overly emotional, therefore often unnecessarily tiring and even if not so emotional then often very, very inefficient. This discussion is a rather representative example I'd say.

My problem is: I cannot disagree more about your definition of "conciseness", no matter how many thumbs appear under that comment. I'm aware that my communication style differs considerably from that of most in this community but it differs because I want it to differ. I fiercely resist the increasingly acceptable tendency to ignore any text that is longer than three lines because we've unlearned to concentrate on a line of thought for longer than five minutes. I also fiercely resist the tendency to speak before listening or thinking.

If anyone's concentration span has already been exhausted by these two paragraphs then, yes, we'll not be able to communicate because I am writing for an audience that I expect to routinely read specifications, white papers or datasheets with a few hundred or even thousands of pages patiently and not consider them "bloated". So I'm going to write provocatively long again (from your perspective) because it's my personal style and I expect respect for my style. Adaptation is required from both sides, so you can adapt to my style - as I usually do to yours, too.

Most comments here are short but redundant, contribute nothing original but shallow assumptions and emotions and clearly show that the person commenting has not taken the time to read respectfully and refer to what others had contributed before. I can open whatever Zephyr discussion thread and there'll be a high probability that I'll observe the same there, too. This is what is really detrimental to our efficiency of communication.

My comments usually take several hours to write so, yes, I do expect you to take a few minutes to follow along and digest, too.

If you all had taken the time to read and truly understand these two dense and concise comments, maybe asking back a few times to clarify your understanding, I swear you'd then not have heard me speak up a single more time because everything I was forced to say since was 100% redundant (including this statement itself).

For more background and a more thorough explanation you can still read my RFC if you really care, just leave out the whole (now obsolete) DT/CT part and concentrate on the requirements sections. I know of at least three community members who understood it immediately so it is doable. But it certainly takes more than five minutes and rightfully so, because it took me over a week to interview everybody, think hard about what they have said and then come up with and think through a compromise that takes into account all requirements I had heard so far. It is all but perfect but it is certainly much better quality, more dense and less bloated than this discussion.

I also suspect that some seem to confuse "bloated" with "he seems to be talking about things I don't understand so, TLDR". I'm aware that most in this community do probably not have a background in epistemology, have probably not much practice in applying Conway's law or have probably not done much event storming to transform ontologies into shared language. Trust me I know exactly what I'm talking about. If you think this is bloated then you probably didn't understand.

To be honest: If I find a comment with words in it I don't understand I'd Google them, ask ChatGPT or ask the person commenting. I would certainly be patiently explaining in detail if someone cared to ask. If you think I'm arrogant I can assure you that I had to read up on concepts I learnt from this community for many months non-stop - so certainly this means giving and taking. The reason why I might talk in a different language is because I have a strong background in humanities and application development. This is called diversity. And I love diversity because it is really healthy to a community. Unfortunately I don't find this community to be very diverse at times. And the way I'm getting feedback here confirms that a bit more practice in accepting diversity could do no harm to this community either.

So as long as I can see that most here have not really understood what I'm talking about I'd say: Be patient with yourself rather than asking for patience from me. Take your time to read up on concepts and others' arguments, think thoroughly about what you have read before you write again - even if it takes a few hours or days. And then feel free to ask if something is still not clear. In the best case I'll see immediately that I misunderstood something or was not precise and learn something.

Here are my desiderata for a more efficient debating culture (also cited from a prior contribution):

All comments must either strive to establish community requirements or refer to requirements brought forward by others before. (About half of the comments in this discussion would be filtered by this rule - but I can truly see improvement here.)
All suggested design improvements shall refer to well-established state-of-the-art design criteria including keywords that empower everyone to read up on them if they are not aware of them. (At least three fourths of the comments do not comply and so far I cannot see many questions being asked either, so I suppose this is due to a lack of common awareness of sota design methods from the application development space?)
Comments must show that the author is fully aware of the RFC to be decided upon and all arguments that were brought forward previously in the same context and explicitly refer to those if they contradict own proposals. (This kills almost every contribution made in this discussion.)

If we'd just followed these three simple rules then this discussion would not have to exist at all - not even in a more concise form. I'd probably win the bet that my initial RFC used fewer words and contains double the content of this whole discussion.

I ask everyone to do what I've done all the time: Make your own list of what others have said, think about it until you can make a proposal that represents a good compromise to your very best knowledge of all stakes that have been brought forward and only then comment (paying attention to explain your specific choice of compromise to those whose stake you'll not be able to fully represent). Then this goes through a few rounds of clarifications and cleaning up of misunderstandings and then we have a result. You'll see that then you'll be exactly as many steps ahead as I'm now because this is all I did.

If you all had done the same then not only would it have been much easier for you to follow along but also you would not have been such a big waste of my time. Sorry to say it so clearly but this is what it looks like from my side.

try not to take other people's statements out of context

As everyone I sometimes misunderstand people, and I'm sorry for it - and I've always said so when it happened. I strongly disagree with the assumption being made here that I consciously cited people out of context. AFAICS I was the only one to admit and apologize for misunderstandings so far because I truly care.

When I cite a lot of people then this is only because I've listened and I want to make sure that everyone's stake is being heard - even when they are not present. Those who were present in the last arch WG will have noticed that I didn't say a single word about my own requirements but I only ensured that requirements by those community members who were not present in person would be heard. And the same is true for this whole discussion - don't forget, I'm still just helping out Jukka to get a solution to his problem as he's the only other one who's really invested a lot into this topic so far. I have no personal stake at all but my promise to Jukka that I'd find a more acceptable solution once I put a red flag on his PR (which is still there for a reason). I call this responsibility.

Once you think as much about my concerns as I've thought about yours, you can come back and cite me, too. I'll make sure to correct you patiently if I'm misunderstood as I've done with Carles' summary above who certainly did not cite me wrong consciously either, I'm 100% sure.

(Overall measured reading time of this bloated contribution: 10 minutes of your lifetime - took me 3 hrs of mine to write it.)

Laczen commented 2 months ago

IMHO the separation between configuring struct devices and other would be a good separation between devicetree configuration and other (yaml based) configuration.

I would say there are 2 reasons to configure struct devices:

For multi-instance support,
To allow reuse of driver code where only a limited set of parameters are changed,

These 2 reasons have not always been followed, but it is possible to correct this. We should however try to prevent anyone from using devicetree for configuration by making whatever needs configuration into a struct device. Some cases will remain border cases (e.g. reserving some hardware region to store a network MAC), but these should be accepted.
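The two reasons listed above (multi-instance support, and reuse of driver code with only a few parameters changed) can be sketched in plain C as follows. This is only an illustration of the general pattern, not Zephyr's device model: the `demo_*` types, the instance table and the register addresses are all made up, and in Zephyr the instance list would come from devicetree rather than from a hand-written macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-instance, read-only configuration. */
struct demo_uart_config {
    uintptr_t base; /* made-up register base address */
    uint32_t baud;
};

/* Hypothetical stand-in for a device object. */
struct demo_device {
    const char *name;
    const struct demo_uart_config *config;
};

/* Instance table: index, base address, baud rate (all values made up). */
#define DEMO_UART_INSTANCES(X) \
    X(0, 0x40001000u, 115200u) \
    X(1, 0x40002000u, 9600u)

/* One config object and one device object per table row. */
#define DEMO_UART_DEFINE(idx, base_addr, baud_rate)                \
    static const struct demo_uart_config demo_uart_cfg_##idx = {   \
        .base = (base_addr), .baud = (baud_rate) };                \
    static const struct demo_device demo_uart_dev_##idx = {        \
        .name = "uart" #idx, .config = &demo_uart_cfg_##idx };

DEMO_UART_INSTANCES(DEMO_UART_DEFINE)

#define DEMO_UART_REF(idx, base_addr, baud_rate) &demo_uart_dev_##idx,
const struct demo_device *const demo_devices[] = {
    DEMO_UART_INSTANCES(DEMO_UART_REF)
};

/* Shared driver code: behaviour differs only through dev->config. */
uint32_t demo_uart_divisor(const struct demo_device *dev, uint32_t clock_hz)
{
    assert(dev != NULL && dev->config->baud != 0);
    return clock_hz / (16u * dev->config->baud);
}
```

The single `demo_uart_divisor()` function serves every instance; adding a third UART in this sketch means adding one row to the table, not touching the driver code.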

The proposed yaml "configuration" can also be considered as provisioning instead of configuration, and we already have a system for provisioning: the settings subsystem. Now is this yaml configuration incompatible with the settings subsystem? Not at all. The yaml configuration can be used to generate an in-image provisioning solution, and using it as a settings based configuration can be done as a later step in the network config. There might be some misconceptions about the settings system:

It requires persistent storage: this is not true. Whatever data is available can be injected into a subsystem; it can come from external "storage" (as cbor, nanopb, thrift, ...), from internal persistent storage, but also from in-image data. The routine settings_runtime_set(key, data) can be used to inject data into a subsystem, and this doesn't need any persistent storage.
The settings system is a key-value storage: this is not true. The key that is used is only to help settings handlers to set/get the correct data of a subsystem.

The settings subsystem can be configured to use multiple sources of settings that can be added and removed as needed. There can be only one destination to store the settings, but this destination can also be changed when required. The settings subsystem can also take some (sometimes hard to handle) work off your hands, e.g. disabling a subsystem while configuration changes are being applied and re-enabling it when all changes are done (disable when set is called, enable when commit is called).
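The set/commit behaviour and the "no persistent storage needed" point can be illustrated with a small host-side model. To be clear, this does not call the Zephyr settings API: `demo_runtime_set()` is only loosely modelled on the idea behind `settings_runtime_set()`, and its explicit length argument, the handler table, the `net/ip` key and the in-image defaults table are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subsystem state owned by a "net" handler. */
static struct {
    int enabled;
    char ip[16];
} demo_net = { .enabled = 1 };

/* set: disable the subsystem while changes are applied, then store the value. */
static int demo_net_set(const char *key, const void *data, size_t len)
{
    assert(key != NULL && data != NULL);
    if (strcmp(key, "ip") != 0 || len >= sizeof(demo_net.ip)) {
        return -1;
    }
    demo_net.enabled = 0;
    memcpy(demo_net.ip, data, len);
    demo_net.ip[len] = '\0';
    return 0;
}

/* commit: all changes are in, re-enable the subsystem. */
static int demo_net_commit(void)
{
    demo_net.enabled = 1;
    return 0;
}

struct demo_handler {
    const char *prefix;
    int (*set)(const char *key, const void *data, size_t len);
    int (*commit)(void);
};

static const struct demo_handler demo_handlers[] = {
    { "net", demo_net_set, demo_net_commit },
};
#define DEMO_NUM_HANDLERS (sizeof(demo_handlers) / sizeof(demo_handlers[0]))

/* The key only routes the data to the right handler ("net/ip" -> "net"). */
int demo_runtime_set(const char *name, const void *data, size_t len)
{
    for (size_t i = 0; i < DEMO_NUM_HANDLERS; i++) {
        size_t plen = strlen(demo_handlers[i].prefix);
        if (strncmp(name, demo_handlers[i].prefix, plen) == 0 && name[plen] == '/') {
            return demo_handlers[i].set(name + plen + 1, data, len);
        }
    }
    return -1;
}

int demo_commit_all(void)
{
    for (size_t i = 0; i < DEMO_NUM_HANDLERS; i++) {
        if (demo_handlers[i].commit() != 0) {
            return -1;
        }
    }
    return 0;
}

/* In-image source: defaults compiled into the firmware, no storage backend. */
static const struct { const char *name; const char *value; } demo_image_defaults[] = {
    { "net/ip", "192.0.2.1" },
};

int demo_load_image_defaults(void)
{
    size_t n = sizeof(demo_image_defaults) / sizeof(demo_image_defaults[0]);
    for (size_t i = 0; i < n; i++) {
        const char *v = demo_image_defaults[i].value;
        if (demo_runtime_set(demo_image_defaults[i].name, v, strlen(v)) != 0) {
            return -1;
        }
    }
    return demo_commit_all();
}

int demo_net_enabled(void) { return demo_net.enabled; }
const char *demo_net_ip(void) { return demo_net.ip; }
```

In this model a table generated from a yaml file could take the place of `demo_image_defaults`, while a later source (for example a management protocol) could call `demo_runtime_set()` with the same keys.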

The proposed yaml configuration method can also be used as a base to generate a set of factory defaults that are stored in a separate flash region and even configure where this flash region is to be found/written.

The yaml configuration method as provided in #68127 is a good first step to allow easier configuration and it is a versatile solution that does not break any other configuration/provisioning methods that zephyr already has. It is an enabler for more items that are missing configuration or have been (wrongly) added to devicetree.

benediktibk commented 2 months ago

I like the approaches and thoughts which @Laczen put into how the settings subsystem can be used for combining the configuration and provision approach. But, I think we still have a big issue to solve:

We should however try to prevent anyone from using devicetree for configuration by making whatever needs configuration into a struct device. Some cases will remain border cases (e.g. reserving some hardware region to store a network MAC), but these should be accepted.

We still lack a clear definition of what is hardware and what is software. Exactly these "some cases" which you mention are very problematic, and not even that rare. Therefore, if we would like to keep hardware and software settings separated we will have to come up with a precise definition. If we are not able to find one we should avoid this separation, as it will be only the cause for countless other discussions and debates in the future. Which means time and effort, which we can invest into something more productive and useful.

ghost commented 2 months ago

@Laczen, @henrikbrixandersen It would be nice if you could position yourselves wrt the argument given above, as yours seems to contradict that earlier argument:

"Everything that ties to a struct device is 'hardware'" as mentioned by @henrikbrixandersen and others. This heuristic breaks down in both directions: 1. By normalization rules, this definition would include almost all of the network interface configuration which is 1-to-1 to drivers. 2. We have a considerable number of DT properties that do not tie to struct device instances.

Do you think this argument is false? If so: Why? What do you have to say to those community members who have brought forward the alternative approach:

"DT should contain everything that changes at the frequency of hardware", as @decsny (and myself) have been informed by several community members (though neither he nor I find it practicable): This heuristic also breaks down in practice. As pointed out above by that definition ~~almost all~~ many Zephyr-specific properties in DT would be "software" [because they change at the same rhythm as new features are being introduced to drivers, not when hardware is being added or updated].

How do you position yourself wrt the normalization argument brought forward by myself?

Do you believe that these arguments are irrelevant? If so: Why?

If we want to converge towards a solution that is acceptable for everyone, then we need to respect this rule:

Comments must show that the author is fully aware of the RFC to be decided upon and all arguments that were brought forward previously in the same context and explicitly refer to those if they contradict own proposals.

Can you please add to or update your comment to make it more useful? Otherwise we'll continue to block each other w/o converging towards a common solution - thereby wasting each others' time. Additionally: Referring to others' arguments if proposing something contradicting is a basic requirement of politeness and respect, too.

ghost commented 2 months ago

@henrikbrixandersen @Laczen Can you please also position yourselves on the arguments given above that properties cannot a priori be assigned to either the settings subsystem or build-time properties (that must be available even if the settings subsystem is not even present in a firmware):

Here are a few examples others mentioned in earlier discussions:

build-time "hw" property (e.g. gain of ADC channels)
provisioning-time "hw" property (e.g. CDC ECM MAC address)
build-time "sw" property (e.g. buffer sizes)
provisioning-time "sw" property (e.g. IPv6 address)

I could immediately think of products (apps) or use cases (tests, debugging) where those properties would not fit those categories.

I therefore argued that any attempt to bind properties to a specific configuration source or target is conceptually flawed and will not work in practice.

And then fleshed out with more examples here

I would like to throw in some constructive facts, and less meta discussion, to get this thing forward. As I see it, there are several possible settings which one might set to a certain value, either during build- or runtime:

clock divider
alternate function
GPIO direction
flash mapping into address space
stack size for a driver thread
buffer size for a driver
ethernet MAC address
IP setup (DHCP vs static)
purely application specific

I think no one seriously questioned that the settings subsystem needs to be one target of configuration data, so this is no news. A simple me-too would have been sufficient.

And again: Respect for others' arguments IMO means that you need to explain to others why their argument was maybe misconceived or wrong, so that they have a chance to update their thinking and agree with you. Otherwise we end up in an endless edit war.

Laczen commented 2 months ago

@benediktibk, @fgrandel,

In my comment there was no clear definition of what hardware is and what is not, because this definition is not needed. What I'm saying is that devicetree configuration is limited to the construction of struct devices. The struct device creates a kernel object and has a specified layout, these come with limitations (userspace, api, ...) and whatever is created like a struct device needs to adhere to these limitations. For the creation of struct devices dts can be used as a configuration tool.

When using devicetree for the configuration of struct devices also software items are allowed to be created and configured by devicetree. The discussion on whether or not something is added as a struct device is moved into a discussion on whether this representation as a struct device is efficient or not. And this is the correct discussion.

In the present situation there are already exceptions in devicetree. We should identify them and decide if they will remain as exceptions or they should be changed into a struct device definition. Both methods are acceptable, but a clear path for the near future is set. Are future exceptions allowed? My crystal ball is failing on this.

When we look further in the future the yaml configuration system could be used to move any configuration items from dts and limit dts to a pure hardware definition. Whether this is wanted/needed/... is not something I'm taking a position on.

@fgrandel, as should be clear from the above there is no need for me to take any position. Both configuration from dts and yaml can be accepted, but for dts they are limited to the construction of struct devices. If someone wants to configure something that is not a struct device yaml is the way to go.

benediktibk commented 2 months ago

The discussion on whether or not something is added as a struct device is moved into a discussion on whether this representation as a struct device is efficient or not. And this is the correct discussion.

Could you please clarify what you consider as efficient in this context? Are we talking flash footprint, runtime, maintainability, ... ?

benediktibk commented 2 months ago

@Laczen I will try to summarize what you are aiming for, just for the purpose of checking if I understood everything correctly. Please correct me, if I am wrong.

You intend to have two separate configuration systems/sources. One is the devicetree, from which the struct devices are constructed. The other one is the yaml, which will be used for software items.

From this I conclude that there will be a separation between hardware and software. Which brings me back to my original question: Where is the line? I'm sorry to be so nitpicky on this, but if we intend to use such a system we will also have to come up with a definition for this separation. And yes, this definition is urgently needed in such a system, as there will otherwise be lots of confusion and discussion about this topic. Therefore, as long as we cannot come up with a precise and practical definition for this separation, we won't be able to adopt the architecture which you have proposed.

tbursztyka commented 2 months ago

I agree with @benediktibk about having a properly defined limit in what belongs to hardware space versus what belongs to software space.

In a more general way: I think we need to step back and clearly define all the scopes of what is being addressed here. If not, we will end up arguing forever on technical details which - at this stage - do not belong here, imo.

Couple of questions:

What are the domains this will be targeting? I see 3 at least: device drivers, subsystems and user applications. We definitely want to solve the first 2 in a generic way, do we want to solve the 3rd as well?
Build-time vs Run-time? Either because of hw constraints and/or because of the targeted usage, we will need to evaluate both being supported and in a flexible way (switching from one type to another). #68127 has been done that way - at build time - to solve complex network configuration for test cases, obviously there are field cases which will need runtime network configuration.
What do we want to do with configuration options in Kconfig when these are about specific settings? (a subsystem thread priority, a subsystem thread stack size, ...). Do we want to leave them there or throw them away into the whatever-new-solution-we-will-have?

I do not think we can properly answer these without getting more use cases, listing their requirements and cross-checking them in order to have a better overview.

We could go stakeholder by stakeholder:

device drivers? answering the very first question above would already solve most of this issue there
networking? this stack is complex enough to have various aspects such as the stack itself, the network interfaces, the sub-domains such as WiFi, canbus, 15.4, etc...
Bluetooth?
storage?
...
any subsystem that would take advantage of such configuration mechanism.
...

ghost commented 2 months ago

@Laczen

In my comment there was no clear definition of what hardware is and what is not, because this definition is not needed.

I agree that what we're looking for is a good heuristic that precisely defines what goes where based intrinsically on our software, not on something external. I agree, that the heuristic you're proposing is such an intrinsic heuristic. Unfortunately this specific heuristic does not work as has been pointed out several times above.

Meta: So yes, you do contradict what was said before w/o explaining how you want to cover the weaknesses that your argument re-introduces. As you seem to have difficulties deriving this yourself from what has already been said, I'll try to re-explain, although this is 100% redundant and produces the kind of bloat that we can and should avoid.

What I'm saying is that devicetree configuration is limited to the construction of struct devices.

No, it is not - and that's exactly the problem. DT contains both more and less than the configuration of abstract struct devices and there is no way we're going to change this in practice:

More: Interrupts, CPUs, clocks, memory, flash, intermediate helper nodes, etc. (to name just a few). As Linux (and DTSpec) explicitly require those nodes inside DT we can not move them elsewhere.
Less: Almost all properties from the network configuration belong 1:1 to struct device (api and net_if have the same abstract 1:1 relational status wrt struct device). Please note that the config user API must not have access to internal implementation details (such as the somewhat strangely defined "reverse" rather than polymorphic net_if->dev relation). Based on the rough heuristic you propose all those iface properties would go inside DT, too. I assume that this is not what you intend?

Any precise user API heuristic that is orthogonal to normalized 1:1 relations (ie DT vs. X) needs to distinguish on software module (aka folder/file) level, not on entity level - this fundamentally excludes any struct whatever approach.

In the present situation there are already exceptions in devicetree. We should identify them and decide if they will remain as exceptions or they should be changed into a struct device definition.

This is a recursive definition: You introduce a definition that contains undefined exceptions which you want to define "later" based on what? As can be seen above, those "exceptions" are too important to ignore them now: They are the OP's very problem statement. So as long as you cannot say how precisely we are going to deal with them, your proposal simply reproduces the unsatisfying status quo and I therefore cannot see its merit.

I proposed alternative heuristics in my RFC, which do not have these weaknesses but which you seem not to be aware of - please read my next comment as an intro, then the full proposal, and position yourself with regard to it.

Meta: So yes, surely you should have read and understood (and if not understood: asked back) first to be able to position yourself accordingly w/o forcing others to waste their time and produce bloated redundancy that makes it hard for everyone to follow the discussion.

ghost commented 2 months ago

@tbursztyka

I do not think we can properly answer these without getting more use cases, listing their requirements and cross-checking them in order to have a better overview.

Agreed. Are you aware of the use cases in my RFC? I think they cover at least all use cases that have been brought forward by community members so far. Many more than the OP's initial list.

We could go stakeholder by stakeholder:

I agree that this would be a possible solution. But I still find it rather arbitrary and way too complicated. I'd very much prefer a single heuristic that:

can be applied w/o having to discuss and define separate scopes for every stake, use case and subsystem separately,
is way simpler to document, enforce automatically at build-time and understand (as no one will read the docs anyway),
can be applied to future subsystems w/o having to load the arch WG with a decision each time.

The heuristics based on feature selection (Kconfig vs. DT/X) and normalization/encapsulation (DT vs. X) combined with a common intermediate representation model are precise but simple rules. In my RFC I used two paragraphs to define both. They extend to all current and future software modules (including, but not limited to drivers and subsystems) while at the same time being 100% backwards compatible and providing us with a soft migration path. The definition of DT vs. X comes down to defining per top-level folder what is used where (tentatively arch, soc, boards + drivers = DT, all the rest X). Plus I already did the work of successfully validating it against all currently known stakeholder needs and use cases.
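As a rough illustration of the per-top-level-folder rule, the following Python sketch maps a source path to its configuration source. The folder set mirrors the tentative split given above (arch, soc, boards and drivers use DT, everything else uses X). The function name `config_source_for` and the string labels are hypothetical and not part of any existing Zephyr tooling.

```python
from pathlib import PurePosixPath

# Tentative split from the comment above: these top-level folders are
# configured through devicetree, everything else through "X".
DT_TOP_LEVEL_FOLDERS = {"arch", "soc", "boards", "drivers"}


def config_source_for(path):
    """Return "DT" or "X" for a path relative to the repository root."""
    parts = PurePosixPath(path).parts
    if not parts:
        raise ValueError("empty path")
    # Only the first path component decides; no per-entity rules are needed.
    return "DT" if parts[0] in DT_TOP_LEVEL_FOLDERS else "X"
```

With this rule, `config_source_for("drivers/ethernet/eth_foo.c")` yields `"DT"` and `config_source_for("subsys/net/ip/net_if.c")` yields `"X"`. The decision depends on the location of the software module only, which is what would make it checkable automatically at build time.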

In this context we should also be aware that @gmarull had brought forward a similar argument already:

IMHO, we should probably start solving domain-specific problems, and if some can be generalized later, just do it. In the end, on a Linux box you have to deal with many different config files, not always using the same syntax.

and I think part of my response is also worth mentioning in this context:

[Other than Linux, we are an integrated application development platform not a generic OS and therefore from a UX perspective] we have a clear requirement to design something that can be extended to other subsystems plus can be integrated with the settings subsys, used for provisioning and be serialized to other formats like [cbor, ] protobuf IDL or Thrift.

This comment is now one month old which shows how much our discussion is still turning in circles.

My answers to your questions then would be...

What are the domains this will be targeting? I see 3 at least: device drivers, subsystems and user applications. We definitely want to solve the first 2 in a generic way, do we want to solve the 3rd as well?

We have at least four source-side config domains (my proposal for how to cover them in brackets): feature selection (Kconfig), drivers and hardware description (DT), other software modules including subsystems (X) and applications (X). I don't see any reason why custom app config should be done separately. Three different approaches are already too many from a UX perspective and us using them can only be justified "historically" IMO.

Build-time vs Run-time? Either because of hw constraints and/or because of the targeted usage, we will need to evaluate both being supported and in a flexible way.

Please note that we have more distinct target-side domains beyond build-time vs. provisioning vs. runtime (see earlier comments).

Apart from that: 100% agreed. This is why I require a common intermediate normalized representation (the exact format of which is irrelevant to the problem space and can be decided upon independently). This decouples source and target serializations and allows us to validate cross-references between DT and X at build time. A normalized (source independent) target split could be defined on this intermediate representation by default and overlaid per application by an appropriate path language, e.g. XPath, JsonPath - also applicable to YAML - or the "lopper" path language. I'm only mentioning those solution space technologies to demonstrate feasibility, not implying any specific preference of mine. Such an architecture starts very small but scales easily to all currently known use cases later plus it provides us with a migration path and protects a maximum of existing investment (ie lowest short and long term maintenance cost).
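A minimal sketch of such an architecture, in Python and under loud assumptions: the dictionary layout, the `dt:`/`x:` path prefixes, the `ref` key, the node names and the function names are all invented here for illustration. Shell-style patterns from the standard `fnmatch` module stand in for a real path language such as XPath, JsonPath or lopper's.

```python
from fnmatch import fnmatchcase

# Hypothetical intermediate representation: one flat mapping of
# normalized paths to entries, regardless of which source they came from.
MODEL = {
    "dt:/soc/ethernet@40028000": {"source": "DT", "target": "build-time"},
    "x:/net/iface0": {
        "source": "X",
        "target": "build-time",
        "ref": "dt:/soc/ethernet@40028000",  # cross-reference into DT
    },
    "x:/net/iface0/ipv6-address": {"source": "X", "target": "provisioning"},
}


def dangling_references(model):
    """Return paths whose "ref" points at an entry that does not exist."""
    return sorted(
        path for path, entry in model.items()
        if "ref" in entry and entry["ref"] not in model
    )


def apply_target_overlay(model, overlay):
    """Return a copy of the model with per-application target overrides.

    `overlay` is a list of (pattern, target) pairs; later pairs win.
    """
    result = {path: dict(entry) for path, entry in model.items()}
    for pattern, target in overlay:
        for path, entry in result.items():
            if fnmatchcase(path, pattern):
                entry["target"] = target
    return result
```

A test application could pass `[("x:/net/*/ipv6-address", "build-time")]` as its overlay to bake the address into the image, while the default split in `MODEL` stays untouched for every other application. Because both sources end up in the same mapping, the cross-reference check does not need to know whether an entry originated in DT or in X.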

I don't agree that #68127 implements this minimal requirement. And this is only one of several minimal requirements mentioned in this thread that are not met by that proposal although it will not be difficult to evolve it until it does.

What do we want to do with configuration options in Kconfig when these are about specific settings?

Initially we can leave them untouched. Whenever we feel that we need to migrate one, we map it to the intermediate data model. On the target side we only need to find & replace one macro by another. This is what I mean by a "soft migration path". The same applies to migrating DT properties to X - especially if X provides a similar-enough macro infrastructure as DT (ideally the same - just with updated prefixes). The migration can be fully automated in both cases then. The mapping ensures long-term backwards compat even if users still rely on older Kconfig/DT properties in their apps. If X overrides Kconfig and DT by default then we get the best of all worlds IMO.
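The override order suggested above could be sketched as follows. This is a hedged Python illustration only: the `resolve` function, the three input dictionaries and all property and option names are hypothetical, and the `LEGACY_NAMES` table stands in for whatever mapping the intermediate data model would provide.

```python
# Hypothetical mapping from a normalized property to its legacy names.
LEGACY_NAMES = {
    "modem.cmux.buffer-size": {"kconfig": "CONFIG_EXAMPLE_CMUX_BUFFER_SIZE"},
    "adc0.channel0.gain": {"dt": "/adc@0/channel@0:example,gain"},
}


def resolve(prop, x_values, kconfig_values, dt_values):
    """Look up a property, letting X override Kconfig and DT.

    Returns a (value, origin) pair so callers can see which source won.
    """
    if prop in x_values:
        return x_values[prop], "X"
    legacy = LEGACY_NAMES.get(prop, {})
    if legacy.get("kconfig") in kconfig_values:
        return kconfig_values[legacy["kconfig"]], "Kconfig"
    if legacy.get("dt") in dt_values:
        return dt_values[legacy["dt"]], "DT"
    raise KeyError(prop)
```

An application that still sets only the old Kconfig option keeps working, because the lookup falls back to it. Once the same property is set in X, that value takes precedence without the old option having to be removed first.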

Laczen commented 2 months ago

@benediktibk, @tbursztyka, I am carefully avoiding any use of software/hardware because this can lead to differences in understanding. There are drivers that are a pure software representation of some hardware (e.g. all simulator devices), is this software (space), is this hardware (space)? Is an fpga with uploadable image software/hardware? Is a driver that emulates flash over an IPC channel software/hardware?

Stating "devicetree is used to configure struct devices" does not imply anything (software/hardware) on what will be struct devices and what not. Trying to define what can be a struct device (driver vs subsystem) is something that I'm not trying to do, it is not needed (although it would help to avoid discussions). There will always be a grey area where something could fit into both and there is no wrong selection in that case.

Stating "devicetree is used to configure struct devices" does not imply that anything that has a structure that resembles a struct device should be configured through devicetree.

benediktibk commented 2 months ago

@Laczen If I understood it correctly you are saying it is not necessary to separate between hardware and software, it is only necessary to define what should be in devicetree or X (possibly yaml)?

If that is the case you won't have to define the difference between hardware and software. Which is hard, as you mentioned yourself. But this only shifts the problem: How do you then define what should be in devicetree or in X?

We should definitely avoid grey areas and try to make it as clear as possible to avoid useless lengthy discussion in the future.

tbursztyka commented 2 months ago

@benediktibk, @tbursztyka, I am carefully avoiding any use of software/hardware because this can lead to differences in understanding. There are drivers that are a pure software representation of some hardware

Ok I see your point, perhaps we need to stick to "devices" instead of hardware, in order to be semantically right. Emulated/simulated/virtualized devices are seen as actual devices from an OS point of view, even if they have no physical counterpart. Let's take ivshmem for instance, found in Qemu and ACRN: it will never be physical hardware, but it is a device on a bus with registers, memory and features.

Stating "devicetree is used to configure struct devices"

I do not like this extreme short-cut so let's not use it. Devicetree is used to describe the devices present in the target. These devices may or may not correspond to an actual physical device, as seen above.

When it comes to defining what should go into DT and what should not, perhaps we could relate to hard vs soft settings instead.

For instance, anything that relates to the internals of a device (registers, memory, internal feature enablement) would be interpreted as a hard setting. On the other side, anything that relates to the device driver internals (thread, stack, mapping, priorities, API features etc.) could be seen as a soft setting. So far, I believe this has been managed in the right way: soft settings have generally landed in Kconfig (at the cost of losing the ability to manage these settings per-instance). I have very few examples of a setting landing in DT where it should not have, like "zephyr,deferred-init" for instance. But all in all, this differentiation has been done pretty well.

henrikbrixandersen commented 2 months ago

Devicetree is used to describe the devices present in the target. These devices may or may not correspond to an actual physical device, as seen above.

When it comes to defining what should go into DT and what should not, perhaps we could relate to hard vs soft settings instead.

For instance, anything that relates to the internals of a device (registers, memory, internal feature enablement) would be interpreted as a hard setting. On the other side, anything that relates to the device driver internals (thread, stack, mapping, priorities, API features etc.) could be seen as a soft setting. So far, I believe this has been managed in the right way: soft settings have generally landed in Kconfig (at the cost of losing the ability to manage these settings per-instance). I have very few examples of a setting landing in DT where it should not have, like "zephyr,deferred-init" for instance. But all in all, this differentiation has been done pretty well.

I like this method of differentiating between the two.

pdgendt commented 2 months ago

I'd like to add the following sidenote about hard vs soft settings:

Currently an IRQ priority is a hard setting defined in devicetree (per instance), but thread priorities are a soft setting. This looks evident at first, but if a device driver has a thread for each instance, there is currently no way of prioritizing them individually with Kconfig.
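To make the asymmetry concrete, here is a small Python model of what a two-instance driver ends up with today. It is only a sketch: the instance names, the priority values and the `instance_settings` helper are hypothetical, and the single module-level constant stands in for one Kconfig option shared by the whole driver.

```python
# Hypothetical inputs for a driver with two instances, each with its own
# interrupt and its own worker thread.
DT_IRQ_PRIORITY = {"uart0": 1, "uart1": 3}  # per instance (devicetree)
KCONFIG_THREAD_PRIORITY = 5                 # one value for the whole driver


def instance_settings(instance):
    """Return the (irq_priority, thread_priority) an instance ends up with."""
    return DT_IRQ_PRIORITY[instance], KCONFIG_THREAD_PRIORITY
```

Both `instance_settings("uart0")` and `instance_settings("uart1")` carry the same thread priority, while their IRQ priorities differ. Any per-instance thread priority would need a per-instance source of configuration.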

If we keep this hard/soft setting distinction, and add some soft-setting language/framework on top, this will be confusing IMO.