Open nashif opened 2 years ago
I think part of the question here is what is considered part of the Zephyr "interface" beyond APIs? Is the build system, Kconfig, devicetree/devicetree bindings? Each of these areas could break an 'out of tree' user due to a change.
IMO, all of the above.
If we decide for whatever reason to drop, let's say, a cmake macro (for example, zephyr_library_sources_ifdef), out of tree code using this macro will break. Same with kconfig and devicetree. The severity of the breakage might vary, and we will agree that not everything we support needs deprecation; however, we need to be aware and cautious about changes in general. The fact that a PR removing an interface passes CI is not a green light that it can be removed without any consequences.
We probably need to introduce some categories of changes that need more attention than others.
Various thoughts/comments:
So for devicetree this means that any change to an existing binding would possibly be a breaking API change.
How do we track the stability of cmake, Kconfig, devicetree bindings?
Process WG:
- include/zephyr should contain only public, user-facing APIs; move internal code out of there (new use of the treewide process!) -- definition of API tbd
- Could __DEPRECATED_MACRO's behavior change with a Kconfig option to not emit a warning? west build ... -- -DCONFIG_WARNINGS_ARE_INFORMATIONAL=y is an alternative
@carlescufi mentioned this, but does API mean application facing or broader?
broader, for example the arch_ interface which is not application facing needs to be in scope as well, this is being used by OOT architectures for example. There are a few other interfaces beside that to consider.
My five cents: I'd limit this to stable public APIs. Other changes should just be written in the release notes. In any case, if a deprecation is cheap, just do it, your users will appreciate it.
Working out of tree is convenient, but it comes with a maintenance cost.
Good read: https://www.kernel.org/doc/html/latest/process/stable-api-nonsense.html#what-to-do
@mmahadevan108 so cmake warnings don't break CI? @carlescufi pretty sure no, because we build all the time with asserts enabled
FWIW I saw these two warnings every day for about a year and none ever stopped anything:
warning: the int symbol CORE_COUNT (defined at src/platform/Kconfig:299) has a
non-int default MP_NUM_CPUS (undefined)
CMake Warning (dev) at CMakeLists.txt:12 (zephyr_library_include_directories):
uninitialized variable 'sof_module'
This warning is for project developers. Use -Wno-dev to suppress it.
broader, for example the arch_ interface which is not application facing needs to be in scope as well, this is being used by OOT architectures for example. There are a few other interfaces beside that to consider.
Agreed. Downstream users may need to add their own module with architectures, SOC definitions, drivers, boards, subsystems, tests, samples, etc. It may even need to contain alternate implementations of existing subsystems. Perhaps the collective set of "public APIs" needs to include whatever could be seen or replaced across modules.
So, what does it mean for something to be "application facing", especially if the product-unique source is only an empty main(){}?
Working out of tree is convenient, but it comes with a maintenance cost.
Working out of tree is strategic and supported. It is also essential for some of us and for an extensible platform.
Process WG:
- include/zephyr -- what is the point of the zephyr name?
- what would be the alternative to zephyr?
- #include <zephyr/foo...> is something we'd like to reserve for APIs defined by the zephyr project
- module-name/. We can't enforce this.
My feedback offline as I was not able to join the major part of the meeting
- @gregshue what wasn't clear is why the name zephyr was chosen
- @fabiobaltieri what would be the alternative to zephyr?
- @gregshue mcuboot has a native zephyr port; you're expecting the headers it defines in the include/zephyr namespace?
- @mbolivar-nordic no; mcuboot's library interface (bootutil) is not a zephyr library, and its zephyr port is an application, not a library
- @mbolivar-nordic I think we're saying that #include <zephyr/foo...> is something we'd like to reserve for APIs defined by the zephyr project
- @mbolivar-nordic are we talking about module/module collisions or zephyr/module collisions?
- @gregshue both
@gregshue It is really frustrating to see this type of discussion always go into modules/mcuboot and things that you might be passionate about, derailing the discussion from the actual topic.
- @MaureenHelm that's an odd indirection -- why not do that at the API level? We could do something at the doxygen tagging level. The declaration of whether an API is stable or not belongs with the API. We should be able to scrape the tree and get this table. The golden source of truth should be with the API itself.
+1
- @mbolivar-nordic what are we looking for out of semantic versions of APIs? Don't think we can sell semantic versioning of APIs project wide
I think this would be a very good replacement for how we manage API stability right now, which relies on recording when an API was introduced or modified. Having a versioning scheme in place will help with making changes to stable APIs and marking those changes as non-breaking using semantic versioning. By just looking at the version it will be possible to see if you are still compatible without having to look at git logs or implementation changes in drivers. Tests would also need to continue working. IMO it is worth looking into, bringing up as a proposal, and getting more feedback on.
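To illustrate what "just looking at the version" could mean, here is a minimal sketch. It assumes each API carries a MAJOR.MINOR.PATCH version and that the usual semantic versioning rules apply (same major, and the provided minor/patch is at least what the caller was written against). Nothing like this exists in Zephyr today; parse_version and is_compatible are hypothetical names.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(required, provided):
    """Return True if code written against `required` can use `provided`.

    Assumed rule (semantic versioning): a different major version is a
    breaking change; within one major version, newer minor/patch
    releases only add functionality or fix bugs.
    """
    req = parse_version(required)
    prov = parse_version(provided)
    if req[0] != prov[0]:
        return False
    # Tuple comparison orders by minor first, then patch.
    return prov[1:] >= req[1:]
```

For example, a module written against version 1.2.0 of an API would accept 1.3.1 but reject both 2.0.0 and 1.1.9.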
- @fabiobaltieri I'm not really sure the 'version introduced' and 'version modified' columns in the api overview (docs.zephyrproject.org/latest/develop/api/overview.html) are really useful. Maybe we want something else.
yes, maybe drop this in favor of some versioning scheme maintained within the API using doxygen like @MaureenHelm suggested.
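A rough sketch of what "scrape the tree and get this table" might look like, assuming the stability level is recorded in the same doxygen comment as the group definition. The @defgroup command is real doxygen; the @stability tag is a made-up placeholder for whatever tag the project would actually choose, and scrape_stability is a hypothetical helper.

```python
import re

DEFGROUP_RE = re.compile(r"@defgroup\s+(\w+)")
# "@stability" is a hypothetical tag used only for this illustration.
STABILITY_RE = re.compile(r"@stability\s+(\w+)")
DOC_BLOCK_RE = re.compile(r"/\*\*.*?\*/", re.DOTALL)


def scrape_stability(header_text):
    """Return {group_name: stability_level} for each documented group."""
    table = {}
    for block in DOC_BLOCK_RE.findall(header_text):
        group = DEFGROUP_RE.search(block)
        if group is None:
            continue  # not a group definition
        level = STABILITY_RE.search(block)
        table[group.group(1)] = level.group(1) if level else "untagged"
    return table


SAMPLE_HEADER = """\
/**
 * @defgroup gpio_interface GPIO Driver APIs
 * @stability stable
 */
/** @defgroup foo_interface Foo APIs */
"""
STABILITY_TABLE = scrape_stability(SAMPLE_HEADER)
```

Running this over every header would give the overview table directly from the source, so the API itself stays the source of truth.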
- @gregshue I like the question of 'scope'. I want to be able to write a module that works on multiple zephyr forks.
the only thing of significance here is Zephyr project and its code base, any forks of zephyr are completely irrelevant to this discussion.
- @gregshue if Zephyr is going to say we're not going to support the following use case, we should be explicit: As a downstream module owner, I need to be able to have one version of my module that builds and integrates with zephyrproject-rtos/zephyr's version of the ecosystem, nrfconnect's, and possibly other versions.
This is implicit and obvious and does not require any statements. We as the zephyr project are not responsible for content maintained in forks of Zephyr.
- @gregshue is the zephyr project going to put patterns in place to support this use case? I want to be able to identify what the fork is, what the version of the fork is, and get clues in each API about what the semantic version is.
again, I am not sure why we are talking about forks. This is a distraction from the actual topic. Zephyr has 1000s of forks, why do we want this?
It is really frustrating to see this type of discussion always go into modules/mcuboot
@nashif It is also really frustrating to see the Zephyr Project not actually support the needs of users trying to comply with a development model it claims to support. module.yml provides an extension of the build system. As an integrator of multiple modules where I cannot control the consolidation or separation of the modules, I need everything done at a higher layer to be independent of which module the source exists in, unless it is scoped to apply to a specific directory subtree (e.g., .clang-format).
derailing the discussion from the actual topic.
As an "out-of-tree" user I am trying to rescope this discussion to meet my needs related to breakage. I'm sorry you think of it as derailing. Perhaps you need to clarify which users you are not trying to address.
Zephyr has 1000s of forks, why do we want this?
I'll assume most of the forks are tracking Zephyr main. I know some long term forks are not, and are introducing incompatibilities. Not having a common way to identify the latter is the problem faced by end users. This is the same type of problem that led protocol specs to identify a field (or value range) for vendor-extension commands. If Zephyr Project defines a common mechanism for forks to be identified then end users can avoid conflicting identification solutions invented by each of the fork maintainers.
I believe this effort is trying to help people identify API and other incompatible changes between say upstream Zephyr version 42 and upstream Zephyr 45. If you spot one specific place/tool or identification technique that does not help with zephyr forks version 43-gregshue and 44-marc-hb, then offer a more flexible alternative there when discussing implementation details. If you can find such an alternative, chances are it will be better for upstream Zephyr branches too (cause branching and forking are the same thing). If you cannot find such an alternative, then the problem couldn't be solved anyway and no one wasted any time in abstract discussions.
All this without mentioning forks once! Magic :-)
We reject: kings, presidents, and voting. We believe in: rough consensus and running code.
All this without mentioning forks once! Magic :-)
Almost ... "that does not help with zephyr forks" ... "cause branching and forking"
This is a present user need, not an abstract discussion. Other users independent of me have already indicated on Discord they are building one set of source reusable on both zephyrproject-rtos and nrfconnect ecosystems.
I am not concerned with the forks tracking Zephyr main. I am concerned about identifying interface changes introduced in the nrfConnect fork of Zephyr ecosystem. It would be less of a concern if nrfConnect maintained backwards compatibility at the SHA level, but it didn't.
One solution is to mix an identifier into the API semantic numbering indicating the organization defining the interface.
We reject: kings, presidents, and voting. We believe in: rough consensus and running code.
I hate to tell ya, but that ain't gonna cut it for certifiable code. ;-)
This is a present user need, not an abstract discussion
I was referring to the (lack of) solutions, not to the problem.
I am concerned about identifying interface changes introduced in the nrfConnect fork of Zephyr ecosystem.
What makes you think some (good and useful) API change(s) in nrfConnect won't be found in some future upstream Zephyr version? Remember: forking and branching are the same thing.
I hate to tell ya, but that ain't gonna cut it for certifiable code. ;-)
Off-topic again?
What makes you think some (good and useful) API change(s) in nrfConnect won't be found in some future upstream Zephyr version?
I never thought that. Rather, I thought that an API change upstreamed into Zephyr would now be managed by Zephyr rather than nrfConnect. I hope nRFConnect would then deprecate/remove their implementation and align with the upstream (just like I do with my local patches to Zephyr when I integrate with a fixed version of Zephyr).
I also know that nrfConnect has rewritten Git history, so this fork isn't really the same as a branch.
Off-topic again?
Not really, and definitely on the topic of not breaking "out of tree" users. API definitions are specifications that will need to be traced back to requirements for certifiable executables. They are not runnable code. (An inline implementation is distinct from the specification.)
I never thought that.
OK then why would API changes in nrfConnect not be manageable using the same processes and tools as API changes across upstream Zephyr branches? Considering these processes and tools don't exist yet, it sounds like you're complaining about a problem that does not exist yet.
I also know that nrfConnect has rewritten Git history, so this fork isn't really the same as a branch.
Pretty sure doxygen does not care about git history. If some other solution or tool ever relies on git history then it will be time to highlight this and discuss pros and cons.
The long story short is that Zephyr has a virtually infinite number of forks so a blanket and super vague request to "support forks" cannot possibly make sense. Only specific requests make sense; for instance: "Can this solution/tool be made compatible with rewritten git histories, pretty please?"
PS: making 1000s of random Zephyr forks "certifiable" sounds... fun! Whatever that means.
OK then why would API changes in nrfConnect not be manageable using the same processes and tools as API changes across upstream Zephyr branches?
nrfConnect could use the same processes as API changes across upstream Zephyr branches - but it cannot assign different meanings to the same version identifiers unless some other mechanism exists to tell them apart. I look to the Zephyr Project to specify one mechanism for all forks to use.
In order for downstream developers to create a module that works with either upstream Zephyr or an incompatible nrfConnect interface, they must be able to know at build time which interface definition to call.
it sounds like you're complaining about a problem that does not exist yet.
It exists already. I've just had bigger issues to tackle.
Hi everyone, I'm also thinking about solutions to the problems you describe (but don't worry, I won't be interfering much in your exchange of opinions as I don't have much knowledge about processes in Zephyr). We should think about solutions for all the tools/interfaces Zephyr has, one by one. I'll start with Kconfig because it seems an easier problem than the other ones.
The problem:
Solution: Using Kconfig aliases and obsolete warnings generation using https://www.kernel.org/doc/html/latest/kbuild/kconfig-macro-language.html#built-in-functions $(warning-if,condition,text) function. Example:
$(warning-if,$(EXAMPLE_ZEPHYR_DRIVER),Kconfig option EXAMPLE_ZEPHYR_DRIVER is obsolete, please use DAI_EXAMPLE_DRIVER)
config EXAMPLE_ZEPHYR_DRIVER
default n
select DAI_EXAMPLE_DRIVER
Result:
An out of tree application will receive a nice Kconfig warning that the config value it set is obsolete...
The only painful thing is monitoring on the Zephyr side when to remove each of those obsolete variables (here we need a robust process solution).
The same goes for Devicetree: there is an aliases feature (however, I do not see any option for printing obsolescence messages here)...
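As a complement to the in-Kconfig warning, the same rename table could also be checked from a script, for instance in a downstream CI. The following is only a sketch: the DEPRECATED table and the find_deprecated helper are hypothetical, and the single entry reuses the example symbols from above.

```python
# Hypothetical table of obsolete Kconfig symbols and their replacements.
DEPRECATED = {"EXAMPLE_ZEPHYR_DRIVER": "DAI_EXAMPLE_DRIVER"}


def find_deprecated(conf_text, table=DEPRECATED):
    """Return one warning string per obsolete symbol assigned in a .conf fragment."""
    warnings = []
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        line = line.strip()
        # Only "CONFIG_NAME=value" assignments are of interest;
        # comments and blank lines are skipped.
        if not line.startswith("CONFIG_") or "=" not in line:
            continue
        name = line.split("=", 1)[0][len("CONFIG_"):]
        if name in table:
            warnings.append(
                f"line {lineno}: CONFIG_{name} is obsolete, "
                f"please use CONFIG_{table[name]}"
            )
    return warnings


SAMPLE_CONF = "CONFIG_GPIO=y\n# a comment\nCONFIG_EXAMPLE_ZEPHYR_DRIVER=y\n"
SAMPLE_WARNINGS = find_deprecated(SAMPLE_CONF)
```

Keeping the renames in one table would also give the project a single place to review when deciding which obsolete symbols are due for removal.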
but it cannot assign different meanings to the same version identifiers unless some other mechanism exists to tell them apart
Right, different forks and branches must use different identifiers to signal that they are indeed different versions. Not breaking new ground.
I look to the Zephyr Project to specify one mechanism for all forks to use.
An upstream project cannot anticipate all the potentially crazy ways it will be forked and create a versioning scheme that will be compatible with everything and anything. You can offer and recommend a "fork-friendly" versioning scheme; that seems reasonable. Can't wait to see your research and proposition.
@nashif It is also really frustrating to see the Zephyr Project not actually support the needs of users trying to comply with a development model it claims to support.
Funny, I wonder what this issue is about and why it was created in the first place. And what are those development models you are referring to exactly? Please be specific
module.yml provides an extension of the build system. As an integrator of multiple modules where I cannot control the consolidation or separation of the modules, I need everything done at a higher layer to be independent of which module the source exists in, unless it is scoped to apply to a specific directory subtree (e.g., .clang-format).
This is the most vague description of a problem I have seen in a while. I am not sure what you are asking for.
If your module has drivers, boards and anything that is supported out of tree in zephyr and you are interested in keeping those working with zephyr, then this issue is for you. If I understand the above correctly and you are asking us to make your module work with upstream zephyr and other forks the same way, then you are in the wrong place. This is not something we have ever promised, not something that we are interested in, and to be honest a very strange request/expectation.
As an "out-of-tree" user I am trying to rescope this discussion to meet my needs related to breakage. I'm sorry you think of it as derailing. Perhaps you need to clarify which users you are not trying to address.
See above. If that is not clear, then I am not sure how else I would be able to clarify it.
That's it from me. I have spent way too much time on this already.
And what are those development models you are referring to exactly? Please be specific
In the T2: Star topology, application is the manifest repository, I am strategically reusing the repositories from zephyrproject-rtos, extending forks of other OSS projects to also be Zephyr modules, and putting licensed source into separate modules from my proprietary "applications"/boards/drivers/subsystems/tests/etc. All of my extended/proprietary repositories have the Zephyr glue in the module itself, kept in a Zephyr directory structure in the module-level zephyr/ subdirectory (next to module.yml). This relocation is necessary on some repositories due to name collisions with preexisting subdirectories. Other than the location of the Zephyr directory structure this follows the pattern in Zephyr Project's example-application module.
IIRC, the Zephyr documentation does not indicate any difference in support between the topologies or with developing as a Zephyr repository application, so I expect all the capabilities/tools/etc that work for a Zephyr repository application to also work with a Zephyr workspace application module. Maintaining this support requires all issues related to the Zephyr repository explicitly consider how the issue also may apply to any code in modules. Many (most?) issues will be independent of the module degree-of-freedom. Many issues will be impacted by the module degree-of-freedom.
It is really frustrating to see this type of discussion always go into modules/mcuboot
I am not sure what you are asking for.
Fundamentally, I am asking the TSC Chair in particular, and voting members/maintainers/collaborators in general, to internalize that:
I shouldn't have to repeatedly raise the unavoidable question about the impact of modules. But apparently I do because raising the question itself is causing frustrations rather than being accepted as necessary consideration.
Technical discussions MUST provide a solution that also applies to Zephyr-specific source in them
Then participate in these discussions and provide very specific, technical solutions that address your problems if/when possible. This is just getting started.
But apparently I do because raising the question itself is causing frustrations rather than being accepted as necessary consideration.
I suspect the frustration does not come from any particular topic but from the extent, vagueness and verbosity of the requests combined with the expectation level, e.g. "MUST" support forks/modules without any technical detail. A fork and a module can be literally anything. Be specific: describe something that does not work and how it can be fixed. Except you can't yet because there's no solution yet.
Be specific: describe something that does not work and how it can be fixed.
Maintaining support for the topologies cannot depend on any single person being a watchdog on all issues. This is not my job full time. I have been engaged as I have time. (See Global Namespace Management? and Replacing zephyr driver/subsys implementations and Support module.yml in zephyr repo.)
If I understand the above correctly and you are asking us to make your module work with upstream zephyr and other forks the same way
I am not asking for you to make my module work with upstream Zephyr and other forks the same way.
Solution: Using Kconfig aliases and obsolete warnings generation using https://www.kernel.org/doc/html/latest/kbuild/kconfig-macro-language.html#built-in-functions $(warning-if,condition,text) function.
This works reasonably well for flagging deprecated symbols when users go from one release to the next. We also need a solution that works for users that go from one LTS to the next.
As Google members identified, I think a bigger need is to tell out-of-tree developers how to transform their code (and verify the transformation) when a rebase is attempted (which may be from one LTS to the next).
Be specific: describe something that does not work and how it can be fixed.
Maintaining support for the topologies cannot depend on any single person being a watchdog on all issues. This is not my job full time.
This is "only" an open-source project: as long as you're the only one who cares about some feature or request getting done then it is your job full time. Explaining and convincing others may help (exceptional communication skills required).
Process WG: defer until next week since @nashif could not attend today
I shouldn't have to repeatedly raise the unavoidable question about the impact of modules. But apparently I do because raising the question itself is causing frustrations rather than being accepted as necessary consideration.
@gregshue my opinion is that you are missing the point here.
Basically all of us work on out of tree modules and we do care about them; please accept that.
I think what people are trying to tell you is that your attempts to rescope this issue are unwelcome distractions from what we are trying to do first. As has previously been stated we are trying to tackle one thing at a time, and that's not modules or mcuboot. I'm going to try to refocus this discussion in the next meeting in half an hour.
Provide some guarantees, guidelines and a process keeping out of tree users operational
Perhaps we need to clearly describe the range of out of tree users we are trying to keep operational. I assumed it included end users doing freestanding and workspace application modules as well as downstream module developers. Are they not part of the scope trying to be addressed?
Are they not part of the scope trying to be addressed?
I think it depends what code they write. Yes for users who follow the guidelines TO BE DEFINED (that's the entire purpose of this issue, read its description again). Others, probably not.
Types of users will be defined by which rules they follow (the rules do not exist yet)
Perhaps we need to clearly describe the range of out of tree users we are trying to keep operational.
A formal definition of this range will be the conclusion of this work, not the starting point.
So no one can tell yet where your personal, current use case(s) will be. Terrifying? That's what this issue wants to address in the future.
Process WG:
- z_ prefix for private interfaces / internals?
- include/zephyr; we should have the subsystem itself define this or otherwise not expose it. There are so many inconsistencies in the include directory.
- arch_. This API is defined as an interface between the kernel and the architecture. Applications shouldn't use this. We need to spend some time on this. Let's do that and come back to this
Process WG will address this next week with a discussion from @galak on documenting changes to things that are not stable.
Process WG:
As a tester and a person responsible for internal CIs I have another observation, which I believe was not raised in this topic.
Background: We (Nordic) have an sdk which expands zephyr. To allow for proper integration, we have a fork of zephyr, where some extra patches have to be added on top of upstream code. Several times per year we synchronize this zephyr fork with the current upstream. It is a rather demanding process due to the amount of changes.
Issue: During almost every such synchronization we had to promptly fix issues related to twister, which were blocking the whole testing. Most times the command we use in CI was not working any more. Most of these issues were caused by rather minor changes (change in names of twister arguments, some argument becoming default, etc.). Since we have multiple independent CI plans involved in this process, many of them using twister, even such minor issues escalate quickly. IMO the reason is that twister development is very tightly connected with zephyr. Of course, there are reasons behind it: twister is a tool created to support zephyr's development. However, twister is a very useful tool not only in the scope of pure upstream zephyr. Other projects based on zephyr can benefit from using it as well (as we do).
Idea: Make twister development decoupled from zephyr, having its own versioning. E.g. by moving it to its own repo. IMO this could help projects like ours, where we could delay an update to a newer version of twister if there are issues there instead of promptly fixing/reverting commits/finding workarounds during the demanding process of the whole zephyr synchronization. It could also result in speeding the update, if some feature is needed, without waiting for the whole zephyr sync. I think it could also be beneficial for the upstream. E.g. we could think of some "staging" environment: twister updates could first be added to a "staging" branch. Some cross-checks with the "main" version could be running there. If everything is fine, "staging" will be merged to "main".
@wbober - Summary of issues discussed at F2F in Prague
@nashif - Only APIs have a life cycle. Devicetree and kconfig don't yet have a defined life cycle. But this is also needed so deprecation policies can be consistent.
@nashif - architecture features are defined by API (irq enable/disable, sys IO). Architecture APIs are internal, but need a life cycle/stability process. What's the policy for extending arch interfaces?
@nashif - need to mark internal APIs clearly so users know not to use them
@gregshue - namespace management is an issue. Include paths, names of boards, names of HW blocks used in samples. What about downstream users that have an out of tree SoC? Documentation tags are another area.
@wbober - first pass is to keep scope limited to the core issues only. And then do another iteration to broaden to more public interfaces.
@gregshue - header file include path.
@gregshue - need a policy how to reference files in modules
@fabiobaltieri - not all Kconfig and devicetree bindings are public. Not practical to make all public
@galak - Doesn't agree step is "amend policy". Enforcement should be first. Need to figure out how it will be managed
@nashif - Need to prioritize this list (APIs first, and namespaces)
@wbober - Nordic can allocate resources to help with tooling
@wbober - as of today Kconfig, devicetree and CMake are considered unstable. APIs have a life cycle.
@wbober - there is a need to define a policy for the missing area. But we need enforcement to actually make progress
@wbober - Need to fill in gaps. Define which Kconfig symbols are public for example. Same for DT bindings.
@wbober - once public APIs are defined, Nordic will start creating tooling
@nashif - premature to define tooling without knowing the scope of the Kconfig and DT public areas.
@wbober - agrees - the policy needs to be defined along with the items to watch
@nashif - first step is define the guidelines and communicate this to maintainers. Make sure reviewers/maintainers raise issues on PRs that change a public interface.
@nashif - treewide policy provides some guidance that can be leveraged
@gregshue - a policy that isn't enforceable isn't really a policy. So need to consider "enforceability" of specific policies
@wbober - enough manual labor can make anything enforceable, but not desired. We can start with manual (code reviews) and then add tooling later.
@wbober - proposed life cycle: Experimental, Unstable, Stable.
@gregshue - when integrating Zephyr with other repositories, has run into namespace conflicts in the Kconfig space. This namespace is flat. Zephyr's Kconfig namespace isn't well managed right now. Minimizing risk of Kconfig conflicts.
@nashif - completely avoiding Kconfig conflicts is separate from the stable public interface issues.
@nashif - internal/external for Kconfig. helper symbols by definition are internal (symbol can only be set by another symbol).
@gregshue - has considered helper symbols for the internal/external, but even internal symbols can generate conflicts with out of tree symbols
@nashif, @keith-zephyr - Agree that namespace issues are important issue, but out of scope for defining public interface lifecycle
@keith-zephyr - helper symbols can in some cases be considered public, or at least part of the architecture API interface
@wbober - need to define rules to partition internal/external for header files, Kconfig, and DT bindings to start
@nashif, @keith-zephyr - agree with this prioritization
@nashif - namespace will need to be dealt with later. CI (twister tooling) also needs to be handled at later steps or as a separate issue.
@dleach02 - suggest to @gregshue to enumerate specific risks to his downstream project
@gregshue - downstream CI isn't as much of a problem. But risk is integrating multiple projects that generate Kconfig conflicts. Suggests creating a prefix to Zephyr Kconfigs.
@nashif - that might be too broad a change
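The internal/external split for Kconfig discussed in these notes (helper symbols, i.e. symbols that can only be set by another symbol, being internal) could be approximated by checking whether a symbol has a prompt. The sketch below is a deliberately simplified, line-based scan and not a real Kconfig parser; classify_symbols and the "public"/"internal" labels are hypothetical, and as noted above some promptless symbols may still need to be treated as public.

```python
import re

SYMBOL_RE = re.compile(r"^\s*(?:menu)?config\s+(\w+)\s*$")
# Entries other than config/menuconfig end the current symbol's scope.
RESET_RE = re.compile(
    r"^\s*(?:choice|endchoice|menu|endmenu|comment|if|endif|source|rsource|osource)\b"
)
# A prompt is either 'prompt "text"' or a type keyword followed by a string.
PROMPT_RE = re.compile(r'^\s*(?:(?:bool|tristate|string|hex|int)\s+"|prompt\s+")')


def classify_symbols(kconfig_text):
    """Return {symbol: 'public' | 'internal'} based on prompt presence."""
    visibility = {}
    current = None
    for line in kconfig_text.splitlines():
        match = SYMBOL_RE.match(line)
        if match:
            current = match.group(1)
            visibility.setdefault(current, "internal")
        elif RESET_RE.match(line):
            current = None
        elif current and PROMPT_RE.match(line):
            visibility[current] = "public"
    return visibility


SAMPLE_KCONFIG = (
    "config SENSOR_FOO\n"
    '\tbool "Enable the foo sensor"\n'
    "\n"
    "config SENSOR_FOO_HAS_FIFO\n"
    "\tbool\n"
    "\thelp\n"
    "\t  Selected by drivers.\n"
    "\n"
    "choice\n"
    '\tprompt "Mode"\n'
    "endchoice\n"
)
SAMPLE_VISIBILITY = classify_symbols(SAMPLE_KCONFIG)
```

A list produced this way would only be a starting point for review by maintainers, which matches the "start with manual, add tooling later" suggestion above.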
During today's PWG meeting @nashif asked which modules (in zephyrproject-rtos) were defining Kconfigs. A quick find of at SHA 74c4d1c52 (June 5, 2023) shows at least the following:
As a tester and a person responsible for internal CIs I have another observation, which I believe was not raised in this topic.
Because this is not an out-of-tree user case. CI and tooling used in CI is a different category and many of the policies and discussions here do not apply, i.e. currently there is no intention to maintain APIs backward compatible or define some deprecation for features etc. in twister and other tooling. What you are talking about, and correct me if I am wrong, are mostly bugs, not cases where someone intentionally changed some interface or API. Most major changes are usually discussed and reviewed when it comes to tooling, so unless I am missing something that is more serious than implementation bugs, please list those in a new issue ("how not to break downstream CI" maybe)
Where twister is maintained, in-tree or as a separate tool, will not solve the problem. Interfaces and documented features can be maintained anywhere. When a change goes into twister, usually it deals with some issue. Such fixes will need to be integrated with the main tree sooner rather than later; doing this from an out of tree twister will just make things more complicated, and given this is CI, every run will require pulling the external twister. There is also the need to run twister with old code, which is currently possible only because twister is in the tree.
During today's PWG meeting @nashif asked which modules (in zephyrproject-rtos) were defining Kconfigs. A quick find of at SHA 74c4d1c52 (June 5, 2023) shows at least the following:
- modules/hal/silabs/zephyr/Kconfig (implicit reference)
- modules/hal/espressif/zephyr/Kconfig (referenced from espressif/zephyr/module.yml)
- modules/lib/picolibc/zephyr/Kconfig (referenced from picolibc/zephyr/module.yml)
- modules/lib/zscilib/Kconfig.zscilib (referenced from zscilib/zephyr/module.yml)
- modules/lib/chre/platform/zephyr/Kconfig (referenced from chre/zephyr/module.yml)
- modules/lib/gui/lvgl/zephyr/Kconfig (implicit reference)
- modules/audio/sof/zephyr/Kconfig (implicit reference)
- bootloader/mcuboot/boot/zephyr/Kconfig (referenced from mcuboot/zephyr/module.yml)
Nice list, but most of those are actually Zephyr Kconfigs, i.e. they are driven by Zephyr and are not part of how configuration of the standalone code of the module works. The only Kconfig user that might conflict is SOF AFAIK, but this is already constrained and we can keep the namespace sane given that most SOF developers work on Zephyr already.
AFAICT, we have been using "out-of-tree" to mean content defined/controlled outside the zephyr repository. This term doesn't seem to be in the Glossary of Terms. Is there an actual definition somewhere?
are not part of how configuration of the standalone code of the module works
Look again at zscilib, chre, lvgl, and mcuboot. Each has module build files or standalone code that is controlled by their locally defined Kconfigs.
Note that we cannot control the symbols encountered by an out-of-tree user. The best we can do is recommend a pattern that scales well and live within it ourselves.
OK, zscilib and mcuboot are both tightly coupled with zephyr. In the case of zscilib, the Kconfigs are already namespaced, and the Kconfig in there is primarily part of the integration with zephyr. Same thing for chre, which has all Kconfig usage in platform/zephyr, i.e. it was added there as part of the porting to zephyr, and it is also namespaced. If the ask here is about having modules use some prefix and namespace for the integration with zephyr, that is nice and we see that most already do that.
SOF, for example, is another class of user. It used Kconfig before it started using zephyr, and there we had some issues with configs, given that it implemented the same things we had in zephyr. This is all going away as more zephyr integration happens. Better namespacing would have made certain things easier, but this is an exception: we do not often have this type of usage, where two similar systems integrate with each other (and where CONFIG_LOG can become a contested string).
Having said all of that, I do not see how namespacing is going to solve the problem we are dealing with here. It is a different issue, an important one, but whether I call an API zephyr_blah() or just blah() does not really matter if I change the signature or change the behaviour and do not provide backward compatibility.
If the ask here is about having modules use some prefix and namespace for the integration with zephyr, that is nice and we see that most already do that.
That should be the recommendation from the Zephyr Project to all repositories being integrated with Zephyr. The Zephyr Project has some control over every repository within zephyrproject-rtos, so this should be required of all repositories under zephyrproject-rtos.
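To make the recommended pattern concrete, here is a minimal sketch of a check a module could run on its own Kconfig file to list symbols that lack the module prefix. The helper name, the MYMOD prefix and the sample Kconfig text are all made up for illustration; this is not an existing Zephyr or west tool, and it only looks at plain "config" and "menuconfig" definitions.

```python
import re

# Matches "config FOO" and "menuconfig FOO" definition lines in Kconfig text.
SYMBOL_DEF = re.compile(r"^[ \t]*(?:menu)?config[ \t]+(\w+)[ \t]*$", re.MULTILINE)


def unprefixed_symbols(kconfig_text, prefix):
    """Return the symbols defined in kconfig_text that do not start with prefix."""
    return [s for s in SYMBOL_DEF.findall(kconfig_text) if not s.startswith(prefix)]


# Hypothetical module Kconfig, invented for this example.
EXAMPLE = """\
menuconfig MYMOD
    bool "Enable mymod"

config MYMOD_BUFFER_SIZE
    int "Buffer size"
    default 64

config LOG
    bool "Module-local logging"
"""

# LOG is the only symbol outside the MYMOD namespace.
print(unprefixed_symbols(EXAMPLE, "MYMOD"))  # ['LOG']
```

A check like this only catches unprefixed names such as LOG in the sample; it says nothing about whether a prefixed symbol later changes meaning, which is the separate compatibility question raised above.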
I do not see how namespacing is going to solve the problem we are dealing with here.
Breaking changes will inevitably happen. The aggregate architecture must be able to evolve (e.g., pinctrl). We cannot eliminate it, so we must reduce:
Thoughts about Kconfig:
Because this is not an out-of-tree user case.
We have hundreds of tests and samples in our repo, which is not part of the zephyr tree. We are using tooling from the zephyr tree to execute them. Very often synchronization with zephyr is blocked due to changes in the tooling. Why doesn't this count as an out-of-tree user case?
currently there is no intention to maintain APIs backward compatible or define some deprecation for features etc in twister and other tooling.
Why? Is it set in stone and not up for discussion?
What you are talking about, and correct me if I am wrong, are mostly bugs, not because someone intentionally changed some interface or API.
Not really. Most of those are intentional changes, e.g. --testcase-root -> --testsuite-root, or the recent one where --board-root is loaded by default for out-of-tree modules. Some issues were introduced by myself as well: I tried to unify how test IDs are handled for in-tree and out-of-tree tests, which ended in changes being needed in downstream CIs twice, once when it was added and again when it was reverted after a while.
Most major changes are usually discussed and reviewed when it comes to tooling, so unless I am missing something that is more serious than implementation bugs, please list those in a new issue ("how not to break downstream CI" maybe)
Indeed. But as you pointed out, there is no intention for backward compatibility. And I think it is generally not evaluated how the changes can affect out-of-tree usage when reviewing new features/fixes. Definitely, we will think about what can be done on "how not to break downstream CI". I wanted to share my POV here, since I see this as falling within the broad topic of "How not to break "out of tree" users".
Where twister is maintained, in-tree or as a separate tool, will not solve the problem. Interfaces and documented features can be maintained anywhere. When a change goes into twister, it usually deals with some issue, and such fixes need to be integrated with the main tree sooner rather than later. Doing this from an out-of-tree twister will just make things more complicated and, given this is CI, every run will require pulling the external twister.
As already mentioned, I believe having a separate place for development, outside zephyr's main tree, can benefit the tooling. I know that we have internal teams that need a custom (patched) version of twister, e.g. to support testing of features which are not public yet. Since twister comes as part of a big package such as zephyr, it takes more effort to work on the tool itself. Using twister in an out-of-tree project requires synchronizing with the whole zephyr tree, or else cherry-picking certain commits. Sure, updating/fixing twister will require an extra step, e.g. changing the version in the manifest. However, not every change to twister requires an immediate update in zephyr.
There is also the need to run twister with old code, which is currently possible only because of twister being in the tree.
I don't follow this. I am not proposing developing twister as an independent python package installed e.g. with pip, as west is (although personally I think it could be beneficial). If the twister version is controlled through the west manifest, then running twister with old code is as easy as checking out an old zephyr and doing west update to get the twister which was used back then. What's more, if the twister repo is independent from zephyr's main tree, one can test old zephyr with new twister and vice versa, which is now not that easy. E.g. right now I cannot check whether proposed changes to twister will break our internal usage, since we are not using main zephyr. With twister as a separate repo it will be as easy as referencing a twister PR in the project's manifest. I am aware that some changes in twister are coupled with stuff happening in zephyr, and obviously not every version of twister will work with every version of zephyr. But I think the number of such couplings is limited, and during most of development a 1:1 coupling is not a must.
I know that my issue is not in line with the ongoing discussion, however, I don't find it off-topic. I agree we can move the discussion to a separate issue, as you proposed. Nevertheless, I wanted to share it with a broader audience, since this issue is becoming more and more present in our development as more and more of our teams start to use twister in their verification plans and updating zephyr literally breaks their work.
We have hundreds of tests and samples in our repo, which is not part of the zephyr tree. We are using tooling from the zephyr tree to execute them. Very often synchronization with zephyr is blocked due to changes in the tooling. Why doesn't this count as an out-of-tree user case?
This issue is about end users and API compatibility and IMO we should keep it this way. This also includes tests and samples. This is why we deprecated ztest, for example, and will only remove the old ztest once the deprecation period has passed.
I agree there needs to be some level of control, some assurance, and attention paid to how our tooling moves forward to avoid breaking downstream CI, but this will need to be discussed in a completely different context. CI environments, test environments and the approach to testing and CI in general vary from one organisation to the next, and none of that should impact the upstream CI activities.
currently there is no intention to maintain APIs backward compatible or define some deprecation for features etc in twister and other tooling.
Why? Is it set in stone and not up for discussion?
No, it is not set in stone. But from your initial comment it is not clear exactly what the problem is and how severe it is. We have been trying as much as we can to keep old options working and backward compatible, but things get missed.
Indeed. But as you pointed out, there is no intention for backward compatibility.
I am talking about backward compatibility on a different layer. I think the external interface (command line options) should be backward compatible and we should not drop options randomly. However, we can't keep, for example, the history of how we generate results or reports, or how we deal with tests in general, backward compatible. If we decide that some tests should be marked differently at some point, that does not mean we will have to maintain the old behavior while we implement the new one.
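As a rough illustration of what keeping a renamed command line option working could look like, here is a minimal argparse sketch that accepts both --testsuite-root and the older --testcase-root spelling and warns when the old one is used. This is not how twister actually implements its options; the class name, parser name and warning text are assumptions made up for the example.

```python
import argparse
import warnings


class DeprecatedAlias(argparse.Action):
    """Append the value, warning when the old spelling of the option is used."""

    def __call__(self, parser, namespace, values, option_string=None):
        if option_string == "--testcase-root":
            # FutureWarning is shown to end users by default.
            warnings.warn(
                "--testcase-root is deprecated, use --testsuite-root",
                FutureWarning,
            )
        # Copy first so the shared default list is never mutated.
        roots = list(getattr(namespace, self.dest) or [])
        roots.append(values)
        setattr(namespace, self.dest, roots)


def build_parser():
    parser = argparse.ArgumentParser(prog="runner-sketch")  # hypothetical tool name
    parser.add_argument(
        "--testsuite-root",
        "--testcase-root",  # old name kept as an alias during a grace period
        dest="testsuite_root",
        action=DeprecatedAlias,
        default=[],
        help="directory to search for test suites (may be repeated)",
    )
    return parser


# Both spellings end up in the same list; the old one also emits a warning.
args = build_parser().parse_args(
    ["--testcase-root", "tests/old", "--testsuite-root", "tests/new"]
)
print(args.testsuite_root)  # ['tests/old', 'tests/new']
```

The alias costs one extra option string and a warning, so a downstream CI script that still passes the old name keeps running while its owners get a visible hint to migrate.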
and I think it is generally not evaluated how the changes can affect out-of-tree usage when reviewing new features/fixes.
That is the thing: CI environments are different and there is no way for us to track the various ways of running CI environments and testing. The only defense you will have is to upstream your code, participate in review, and try to stay as close to upstream as possible.
I don't follow this. I am not proposing developing twister as an independent python package installed e.g. with pip, as west is (although personally I think it could be beneficial). If the twister version is controlled through the west manifest, then running twister with old code is as easy as checking out an old zephyr and doing west update to get the twister which was used back then. What's more, if the twister repo is independent from zephyr's main tree, one can test old zephyr with new twister and vice versa, which is now not that easy. E.g. right now I cannot check whether proposed changes to twister will break our internal usage, since we are not using main zephyr. With twister as a separate repo it will be as easy as referencing a twister PR in the project's manifest. I am aware that some changes in twister are coupled with stuff happening in zephyr, and obviously not every version of twister will work with every version of zephyr. But I think the number of such couplings is limited, and during most of development a 1:1 coupling is not a must.
I think all of this can also be resolved with in-tree twister. Before going any other way, I would like to see what the "interfaces" and assets are that we want to protect to avoid breakage, and address those with additional testing, documentation and some guidelines etc. Moving twister to a separate repo will not provide any immediate results if we do not define the interface and get more input etc.
I think all of this can also be resolved with in-tree twister. Before going any other way, I would like to see what the "interfaces" and assets are that we want to protect to avoid breakage, and address those with additional testing, documentation and some guidelines etc. Moving twister to a separate repo will not provide any immediate results if we do not define the interface and get more input etc.
In other words, let's not jump into solution space before we have evaluated the problem first.
This issue is about end users and API compatibility
Here are a few points to consider:
- Manufacturers of ETSI 303645 compliant devices are recommended to act on disclosed vulnerabilities in a timely manner (Provision 5.2-2). Conventionally the process is completed within 90 days for a software solution. In order for a manufacturer to roll that solution out it has to propagate through and be verified in intermediate forks/projects in a much shorter timeframe. Problems for Nordic become a problem for some of their customers (e.g., me).
¯\(ツ)/¯
No idea where all of this is going, so I am just going to take a break from this issue.
Hey, a few thoughts on my side now that I have had a bit of time to think about it:
The comment at https://github.com/zephyrproject-rtos/zephyr/issues/48887#issuecomment-1671552502 is asking to decouple twister. I'd say it's a big enough topic that it may deserve its own discussion.
About the general idea of extending the concept of API beyond the programming interface and into Kconfig, CMake etc.: that may make sense, but then how does it help with the idea of "not break out of tree users"? At the end of the day, if the developers think that an API has to be changed for the progress of the project, it will be changed, and out of tree code is going to break. I think if we add more process around changes, it has to be clear how that helps make out of tree breakages easier to manage. If the only thing it does is make our own developers' lives more miserable, then we would just be discouraging changes and project-wide rework and improvements, which could make the project stable in the short term but would certainly bite us back in the long term.
Here we have a good example of possible issues: https://github.com/zephyrproject-rtos/zephyr/issues/61413
z_arm_on_enter_cpuidle in https://github.com/nrfconnect/sdk-nrf/commit/bd581bca7fed90d457b25c892f53741815e22ee6 is one of those that should have its own namespace and should not be z_ prefixed. Letting it appear internal will make it easy for someone to just kill it, rename it etc. and thus break downstreams without any further notice. Additionally, in that source file ('subsys/debug/etb_trace/etb_trace_lp.c') it appears as an internal function, i.e. there is no indication that this is actually a hook defined and called somewhere else (by architecture code in Zephyr).
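One way a downstream could get early visibility into this kind of dependency is to list the z_-prefixed identifiers its own sources reference. The following is only a sketch under that assumption: the function name and the sample source text are invented, and a plain regex like this will also match names inside comments and strings.

```python
import re

# Identifiers that start with the "z_" prefix discussed above.
INTERNAL_SYMBOL = re.compile(r"\bz_[a-z0-9_]+\b")


def internal_symbols(source_text):
    """Return the sorted, de-duplicated z_-prefixed identifiers in source_text."""
    return sorted(set(INTERNAL_SYMBOL.findall(source_text)))


# Hypothetical out-of-tree source text, not copied from any repository.
EXAMPLE = """
z_arm_on_enter_cpuidle();
k_sleep(K_MSEC(10));
my_z_helper();
"""

# Only the identifier that begins with z_ is reported; my_z_helper is not.
print(internal_symbols(EXAMPLE))  # ['z_arm_on_enter_cpuidle']
```

The resulting list would give a downstream a concrete set of symbols to raise upstream before they are renamed or removed.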
Provide some guarantees, guidelines and a process keeping out of tree users operational while the zephyr project code advances with new technologies, code cleanups and other major code and API changes.
Out of tree users are not limited to only drivers, we have users with their own subsystems, architectures, toolchains, SoCs, boards, drivers, driver subsystems etc. Any change in zephyr might break such users if changes are not following a deprecation process, announcements and a grace period given (deprecation period in many cases) to those users to adapt to the new interfaces or upstream code.
The process should find the sweet spot which allows the project to advance with its agenda and roadmap while allowing users to adapt to change.