How not to break "out of tree" users

zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.

https://docs.zephyrproject.org

Apache License 2.0


How not to break "out of tree" users #48887

Open nashif opened 2 years ago

nashif commented 2 years ago

Provide some guarantees, guidelines and a process for keeping out-of-tree users operational while the Zephyr project code advances with new technologies, code cleanups and other major code and API changes.

Out-of-tree users are not limited to drivers; we have users with their own subsystems, architectures, toolchains, SoCs, boards, drivers, driver subsystems etc. Any change in Zephyr might break such users if it does not follow a deprecation process, with announcements and a grace period (in many cases a deprecation period) that gives those users time to adapt to the new interfaces or upstream code.

The process should find the sweet spot which allows the project to advance with its agenda and roadmap while allowing users to adapt to change.

galak commented 2 years ago

I think part of the question here is: what is considered part of the Zephyr "interface" beyond APIs? Are the build system, Kconfig, and devicetree/devicetree bindings? Each of these areas could break an 'out of tree' user due to a change.

nashif commented 2 years ago

I think part of the question here is: what is considered part of the Zephyr "interface" beyond APIs? Are the build system, Kconfig, and devicetree/devicetree bindings? Each of these areas could break an 'out of tree' user due to a change.

IMO, all of the above.

If we decide, for whatever reason, to drop, let's say, a CMake macro (for example, zephyr_library_sources_ifdef), out-of-tree code using this macro will break. The same goes for Kconfig and devicetree. The severity of the breakage might vary, and we will agree that not everything we support needs deprecation; however, we need to be aware of and cautious about changes in general. The fact that a PR removing an interface passes CI is not a green light that the interface can be removed without consequences. We probably need to introduce some categories of changes that need more attention than others.
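One way to act on the idea of change categories is a small check that maps the files touched by a PR to the kinds of interface raised in this thread (C APIs, build system, Kconfig, devicetree bindings). This is a hypothetical sketch: the path patterns and category names are assumptions for illustration, not an agreed Zephyr policy or an existing CI script.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns -> interface categories. They mirror the areas
# raised in this issue (C APIs, CMake, Kconfig, devicetree bindings) but are
# illustrative only, not an agreed Zephyr policy.
INTERFACE_CATEGORIES = [
    ("include/zephyr/*", "C API"),
    ("cmake/*", "build system (CMake)"),
    ("*Kconfig*", "Kconfig"),
    ("dts/bindings/*", "devicetree bindings"),
]


def categorize(changed_paths):
    """Map each changed path to the interface categories it may affect.

    Paths that match no pattern are left out of the result.
    """
    result = {}
    for path in changed_paths:
        # fnmatch's '*' also matches '/', so each pattern covers subdirectories.
        hits = [category for pattern, category in INTERFACE_CATEGORIES
                if fnmatchcase(path, pattern)]
        if hits:
            result[path] = hits
    return result
```

A PR whose paths map to any category could then be routed for extra review, while paths that match nothing are treated as internal.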

galak commented 2 years ago

Various thoughts/comments:

So for devicetree this means that any change to an existing binding would possibly be a breaking API change.
How do we track the stability of cmake, Kconfig, devicetree bindings?

mbolivar-nordic commented 2 years ago

Process WG:

Start the discussion by tackling APIs, with agreement that we need to consider other programming interfaces like Kconfig symbols and devicetree bindings, and keep them in view
Make include/zephyr contain only public, user-facing APIs, move internal code out of there (new use of the treewide process!) -- definition of API tbd
Make sure all APIs that remain have documentation and changes against them are checked, so no breaking changes to stable APIs, etc.
Continue discussion in the issue, revisit next week

@nashif this relates to treewide changes, but we need some definition of what an API is and how not to break it. There are lots of interfaces that can break out-of-tree users.
@nashif e.g. if we have a board that gets dropped from upstream, but someone else was using a board with the same SoC, how to decide whether to delete the .dtsi?
@nashif or what about APIs (header files, function calls, defines)?
@mbolivar-nordic I would like to see a specific directory that contains the official APIs, with CI checks that make sure everything is documented, and changes being checked against the API stability level
@nashif agreed, I want to start with APIs first too
@gregshue at HP, everything was versioned. We would only make breaking changes with new versions that coexisted for a time in the tree until the removal of the deprecated API. Same with board definitions, cmake, etc. Downstream users were notified. A problem with Zephyr is we are not describing how downstream users should adapt to changes.
@nashif we did that with TCP/IP and logging in zephyr. We have the same proposal from @jfischer-no for USB.
@mbolivar-nordic this discussion should only apply to stable APIs IMO
@carlescufi the way we define "API" right now is "do we have to change your application's code?"
@keith-zephyr want to ask about the process for deprecating an API. Current process of deprecating a macro generates build warnings. In our downstream code base, warnings are promoted to errors, so that breaks our downstream / forces us to disable warning promotion
@carlescufi could you disable that particular warning instead? We have the same problem in our downstream; we have disabled particular warnings in the past
@keith-zephyr is that available for devicetree also?
@carlescufi There are a few sources of warnings: 1. compiler warnings (see above), 2. cmake warnings: we don't treat these as errors; pretty sure Zephyr doesn't either. E.g. CMake warns on assert().
@keith-zephyr if you disable the warning, do you lose visibility that something was deprecated?
@carlescufi I don't know; don't remember
@mmahadevan108 so cmake warnings don't break CI?
@carlescufi pretty sure no, because we build all the time with asserts enabled
@mbolivar-nordic is there some process change you want, @keith-zephyr ?
@keith-zephyr ideally a way to make deprecations print informational notices, not warnings. Disabling warnings becomes a source of tech debt
@mbolivar-nordic is there a way to make __DEPRECATED_MACRO's behavior change with a Kconfig option so that it does not emit a warning?
@carlescufi need to ask @tejlmand
@keith-zephyr if it's possible to make it informational, changing to a warning or error later on would make sense, but starting out informational would be a nicer start
@carlescufi there are also now kconfig warnings, handled in python since https://github.com/zephyrproject-rtos/zephyr/pull/47835/
@nashif the pip style of printing a message that doesn't prevent you from doing your work when a new version is available is nice too
@nashif a Kconfig option is a bad idea since you'd have to change the source code
@carlescufi west build ... -- -DCONFIG_WARNINGS_ARE_INFORMATIONAL=y is an alternative
@nashif an environment variable would only have to be set once, not for every build
@carlescufi we should create an issue about consistency in deprecation and get @tejlmand to come up with a proposal
@keith-zephyr sounds great; I can open the issue
@gregshue want to make sure we handle the question of what defines an API
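The informational-notice idea from @keith-zephyr, combined with @nashif's preference for an environment variable over a Kconfig option, could look roughly like this in a Python build helper. The variable name ZEPHYR_DEPRECATION_MODE and the info/warning/error levels are hypothetical; Zephyr does not define them.

```python
import os
import sys

# Hypothetical environment variable and levels; Zephyr does not define these.
MODE_VAR = "ZEPHYR_DEPRECATION_MODE"
LEVELS = ("info", "warning", "error")


class DeprecationError(Exception):
    """Raised when deprecations are configured to fail the build."""


def report_deprecation(what, replacement, env=None, stream=None):
    """Report a deprecated item at the severity chosen in the environment.

    Defaults to "info" so builds that promote warnings to errors keep working.
    Returns the line that was printed.
    """
    env = os.environ if env is None else env
    stream = sys.stderr if stream is None else stream
    mode = env.get(MODE_VAR, "info").strip().lower()
    if mode not in LEVELS:
        mode = "warning"  # unknown value: be visible, but do not break the build
    message = f"{what} is deprecated; use {replacement} instead"
    if mode == "error":
        raise DeprecationError(message)
    line = f"{mode}: {message}"
    print(line, file=stream)
    return line
```

Defaulting to info keeps the notice visible to downstream projects that promote warnings to errors without breaking their builds; a project can opt into the stricter levels later, as @keith-zephyr suggested. Because the setting lives in the environment, it only has to be set once per shell rather than on every build command.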

galak commented 2 years ago

@carlescufi mentioned this, but does API mean application facing or broader?

nashif commented 2 years ago

Broader. For example, the arch_ interface, which is not application facing, needs to be in scope as well; it is used by OOT architectures. There are a few other interfaces besides that to consider.

gmarull commented 2 years ago

My five cents: I'd limit this to stable public APIs. Other changes should just be written in the release notes. In any case, if a deprecation is cheap, just do it, your users will appreciate it.

Working out of tree is convenient, but it comes with a maintenance cost.

Good read: https://www.kernel.org/doc/html/latest/process/stable-api-nonsense.html#what-to-do

marc-hb commented 2 years ago

@mmahadevan108 so cmake warnings don't break CI?
@carlescufi pretty sure no, because we build all the time with asserts enabled

FWIW I saw these two warnings every day for about a year and none ever stopped anything:

warning: the int symbol CORE_COUNT (defined at src/platform/Kconfig:299) has a
non-int default MP_NUM_CPUS (undefined)

CMake Warning (dev) at CMakeLists.txt:12 (zephyr_library_include_directories):
  uninitialized variable 'sof_module'
This warning is for project developers.  Use -Wno-dev to suppress it.

gregshue commented 2 years ago

Broader. For example, the arch_ interface, which is not application facing, needs to be in scope as well; it is used by OOT architectures. There are a few other interfaces besides that to consider.

Agreed. Downstream users may need to add their own module with architectures, SOC definitions, drivers, boards, subsystems, tests, samples, etc. It may even need to contain alternate implementations of existing subsystems. Perhaps the collective set of "public APIs" needs to include whatever could be seen or replaced across modules.

So, what does it mean for something to be "application facing", especially if the product-unique source is only an empty main(){}?

Working out of tree is convenient, but it comes with a maintenance cost.

Working out of tree is strategic and supported. It is also essential for some of us and for an extensible platform.

mbolivar-nordic commented 2 years ago

Process WG:

@gregshue include/zephyr -- what is the point of the zephyr name?
@dleach02 I think we're only talking about what is directly inside of zephyr
@stephanosio we aren't going to change all modules
@gregshue my question is what policy are we recommending to avoid conflicts?
@keith-zephyr all the zephyr headers were moved under that directory to avoid that problem
@gregshue what wasn't clear is why the name zephyr was chosen
@fabiobaltieri what would be the alternative to zephyr?
@gregshue mcuboot has a native zephyr port; you're expecting the headers it defines in the include/zephyr namespace?
@mbolivar-nordic no; mcuboot's library interface (bootutil) is not a zephyr library, and its zephyr port is an application, not a library
@mbolivar-nordic I think we're saying that #include <zephyr/foo...> is something we'd like to reserve for APIs defined by the zephyr project
@mbolivar-nordic are we talking about module/module collisions or zephyr/module collisions?
@gregshue both
@mbolivar-nordic to some extent I think the first one is not our problem
@stephanosio we should make sure modules in zephyr/west.yml don't conflict. We can at least recommend that module-specific public headers are prefixed with module-name/. We can't enforce this.
@carlescufi I don't object to adding that recommendation to the modules documentation
@gregshue part of these recommendations should be to fill in the module name in the module.yml file
AI @gregshue to send a module documentation PR with those recommendations
@mbolivar-nordic my question is how/whether we can make sure that changes to stable APIs aren't breaking changes? Can we automate this somehow?
@keith-zephyr tagging the unit test files?
@stephanosio we have a label for "stable API change", we just don't automatically label it. Maybe we can trigger a label automation adding it to certain files. Could potentially add the test cases to a 'stable api tests' area in the maintainers file.
@dleach02 or we could put an SPDX tag in the file
@stephanosio in that case we'd have to scan everything. If we want to maintain a "stable" API set, it would be nice to have a database. If we can have a single file -- like MAINTAINERS -- with all the stable APIs, we have the database.
@MaureenHelm that's an odd indirection -- why not do that at the API level? We could do something at the doxygen tagging level. The declaration of whether an API is stable or not belongs with the API. We should be able to scrape the tree and get this table. The golden source of truth should be with the API itself.
@mbolivar-nordic I really agree; this also would make it a lot easier to maintain the "version changed" table
@gregshue how do we detect when APIs are changed
@mbolivar-nordic if we have the metadata in doxygen, with 'git log' and a script
@gregshue having trouble understanding the phrase 'we are testing the API'? My question is about 'testing', not 'modifying'. E.g. for safety, there are 4 ways to verify requirements: test, inspection, analysis, and one other. Testing I2C is always done against an implementation, not the API itself.
@dleach02 test cases are at least a starting point to make sure that changes don't break something. To the point of trying to figure out at each PR if stable APIs are not being changed in an incompatible way, we're trying to move towards automation. Inspection would imply eyes on it, so automation highlighting a PR would make the inspection easier.
@stephanosio so marking APIs for inspection, when they change
@gregshue the other possibility is writing test cases for the API. You end up with a mock of the API and a test of the API that fully uses all the definitions, without having to depend on any implementation. I'm not sure it would make sure every definition gets exercised.
@stephanosio I think here we should establish the process for how we test changes, but the actual testing strategy should be discussed in the testing WG.
@dleach02 question being asked here is since we've defined a stable API, how do we ensure it's still stable?
@MaureenHelm I was thinking about all the discussions we've had about "is-isr-safe". We had custom infrastructure to parse that stuff in our documentation generation.
@dleach02 so step 1 either way would be adding this metadata for PRs to flag
@gregshue what about the case where a stable API does not have the same version in different LTS releases? Semantic versioning?
@mbolivar-nordic can we try to get a resolution on whether @MaureenHelm 's approach is viable by me presenting at testing WG?
[consensus on that]
@mbolivar-nordic what are we looking for out of semantic versions of APIs? Don't think we can sell semantic versioning of APIs project wide
@fabiobaltieri I'm not really sure the 'version introduced' and 'version modified' columns in the api overview (https://docs.zephyrproject.org/latest/develop/api/overview.html) are really useful. Maybe we want something else.
@gregshue I like the question of 'scope'. I want to be able to write a module that works on multiple zephyr forks.
@carlescufi I personally don't think we should go there.
@gregshue if Zephyr is going to say we're not going to support the following use case, we should be explicit: As a downstream module owner, I need to be able to have one version of my module that builds and integrates with zephyrproject-rtos/zephyr's version of the ecosystem, nrfconnect's, and possibly other versions.
@nashif there shouldn't be an expectation that any upstream is going to support you integrating with other downstreams
@gregshue is the zephyr project going to put patterns in place to support this use case? I want to be able to identify what the fork is, what the version of the fork is, and get clues in each API about what the semantic version is.
@carlescufi I agree with @fabiobaltieri these columns don't add much value and should be removed
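@stephanosio's recommendation that module-specific public headers live under a module-name/ prefix cannot be enforced, but collisions can at least be detected. A minimal sketch, assuming each module's exported header paths have already been collected relative to its public include directory; the module names and the data layout are hypothetical, and this is not an existing Zephyr check.

```python
from collections import defaultdict


def find_header_collisions(modules):
    """Return {include_path: [modules]} for paths exported by more than one module.

    `modules` maps a module name to the header paths it exports, written the
    way users include them (e.g. "mod_a/api.h" for #include <mod_a/api.h>).
    """
    owners = defaultdict(list)
    for module, headers in modules.items():
        for header in headers:
            owners[header].append(module)
    return {header: sorted(names) for header, names in owners.items()
            if len(names) > 1}


def unprefixed_headers(module, headers):
    """List headers that do not follow the recommended <module-name>/ prefix."""
    return [header for header in headers if not header.startswith(module + "/")]
```

A collision means that a line such as #include <config.h> resolves to whichever module's include directory comes first in the search path, which is the module/module conflict discussed above. Headers flagged by unprefixed_headers are the ones at risk.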

nashif commented 2 years ago

My feedback, offline, as I was not able to join most of the meeting:

@gregshue what wasn't clear is why the name zephyr was chosen

@fabiobaltieri what would be the alternative to zephyr?

@gregshue mcuboot has a native zephyr port; you're expecting the headers it defines in the include/zephyr namespace?

@mbolivar-nordic no; mcuboot's library interface (bootutil) is not a zephyr library, and its zephyr port is an application, not a library

@mbolivar-nordic I think we're saying that #include <zephyr/foo...> is something we'd like to reserve for APIs defined by the zephyr project

@mbolivar-nordic are we talking about module/module collisions or zephyr/module collisions?

@gregshue both

@gregshue It is really frustrating to see this type of discussion always go into modules/mcuboot and things that you might be passionate about, derailing the discussion from the actual topic.

@MaureenHelm that's an odd indirection -- why not do that at the API level? We could do something at the doxygen tagging level. The declaration of whether an API is stable or not belongs with the API. We should be able to scrape the tree and get this table. The golden source of truth should be with the API itself.

@mbolivar-nordic what are we looking for out of semantic versions of APIs? Don't think we can sell semantic versioning of APIs project wide

I think this would be a very good replacement for how we manage API stability right now, which is by recording when an API was introduced/modified. Having a versioning scheme in place will help with making changes to stable APIs and marking those changes as non-breaking using semantic versioning. Just by looking at the version it will be possible to see whether you are still compatible, without having to look at git logs or implementation changes in drivers. Tests would also need to continue working. IMO it is worth looking into, bringing it up as a proposal and getting more feedback.
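The compatibility check described here ("just by looking at the version") reduces to the usual semantic-versioning rule. This sketch assumes each stable API carries a MAJOR.MINOR.PATCH version; how Zephyr would attach such versions to its APIs is still an open question in this thread, and the names below are illustrative.

```python
from typing import NamedTuple


class ApiVersion(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text):
        """Parse "MAJOR.MINOR.PATCH", e.g. "1.2.0"."""
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)


def is_compatible(required, provided):
    """True if code written against `required` should still work with `provided`.

    Semantic-versioning rule: the major versions must match and the provided
    (minor, patch) must be at least the required one. Major version 0 is
    treated as unstable, so there the minor versions must match exactly.
    """
    if required.major != provided.major:
        return False
    if required.major == 0:
        return required.minor == provided.minor and provided.patch >= required.patch
    return (provided.minor, provided.patch) >= (required.minor, required.patch)
```

Treating 0.y.z as "minor must match" is a common convention, not something the SemVer specification mandates; the specification only says that anything may change during initial development.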

@fabiobaltieri I'm not really sure the 'version introduced' and 'version modified' columns in the api overview (docs.zephyrproject.org/latest/develop/api/overview.html) are really useful. Maybe we want something else.

yes, maybe drop this in favor of some versioning scheme maintained within the API using doxygen like @MaureenHelm suggested.
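If the version lived in the API's Doxygen comment, as @MaureenHelm suggested, a script could scrape the tree to rebuild the overview table and to spot version changes between two revisions. A sketch, assuming (hypothetically) that each API group's @defgroup comment also carries a @version line; @defgroup and @version are standard Doxygen commands, but pairing them this way is an assumption, not current Zephyr practice.

```python
import re

# Doxygen comment blocks (/** ... */) and the two commands this sketch reads.
COMMENT_RE = re.compile(r"/\*\*.*?\*/", re.DOTALL)
GROUP_RE = re.compile(r"@defgroup\s+(\w+)")
VERSION_RE = re.compile(r"@version\s+(\d+\.\d+\.\d+)")


def scrape_api_versions(header_text):
    """Map each @defgroup name to the @version found in the same comment block."""
    table = {}
    for comment in COMMENT_RE.findall(header_text):
        group = GROUP_RE.search(comment)
        version = VERSION_RE.search(comment)
        if group and version:
            table[group.group(1)] = version.group(1)
    return table


def version_changes(old, new):
    """Return {group: (old_version, new_version)} for groups added or re-versioned."""
    return {group: (old.get(group), version)
            for group, version in new.items() if old.get(group) != version}
```

Running scrape_api_versions over the headers of two checkouts and passing both tables to version_changes gives the list of versioned APIs a PR or rebase has touched, which is the metadata a "stable API change" label could be driven from.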

@gregshue I like the question of 'scope'. I want to be able to write a module that works on multiple zephyr forks.

The only thing of significance here is the Zephyr project and its code base; any forks of Zephyr are completely irrelevant to this discussion.

@gregshue if Zephyr is going to say we're not going to support the following use case, we should be explicit: As a downstream module owner, I need to be able to have one version of my module that builds and integrates with zephyrproject-rtos/zephyr's version of the ecosystem, nrfconnect's, and possibly other versions.

This is implicit and obvious and does not require any statements. We as the zephyr project are not responsible for content maintained in forks of Zephyr.

@gregshue is the zephyr project going to put patterns in place to support this use case? I want to be able to identify what the fork is, what the version of the fork is, and get clues in each API about what the semantic version is.

again, I am not sure why we are talking about forks. This is a distraction from the actual topic. Zephyr has 1000s of forks, why do we want this?

gregshue commented 2 years ago

It is really frustrating to see this type of discussion always go into modules/mcuboot

@nashif It is also really frustrating to see the Zephyr Project not actually support the needs of users trying to comply with the development models it claims to support. module.yml provides an extension of the build system. As an integrator of multiple modules where I cannot control the consolidation or separation of the modules, I need everything done at a higher layer to be independent of which module the source exists in, unless it is scoped to apply to a specific directory subtree (e.g., .clang-format).

derailing the discussion from the actual topic.

As an "out-of-tree" user I am trying to rescope this discussion to meet my needs related to breakage. I'm sorry you think of it as derailing. Perhaps you need to clarify which users you are not trying to address.

Zephyr has 1000s of forks, why do we want this?

I'll assume most of the forks are tracking Zephyr main. I know some long-term forks are not, and are introducing incompatibilities. Not having a common way to identify the latter is the problem faced by end users. This is the same type of problem that led protocol specs to identify a field (or value range) for vendor-extension commands. If Zephyr Project defines a common mechanism for forks to be identified then end users can avoid conflicting identification solutions invented by each of the fork maintainers.

marc-hb commented 2 years ago

I believe this effort is trying to help people identify API and other incompatible changes between say upstream Zephyr version 42 and upstream Zephyr 45. If you spot one specific place/tool or identification technique that does not help with zephyr forks version 43-gregshue and 44-marc-hb, then offer a more flexible alternative there when discussing implementation details. If you can find such an alternative, chances are it will be better for upstream Zephyr branches too (cause branching and forking are the same thing). If you cannot find such an alternative, then the problem couldn't be solved anyway and no one wasted any time in abstract discussions.

All this without mentioning forks once! Magic :-)

We reject: kings, presidents, and voting. We believe in: rough consensus and running code.

gregshue commented 2 years ago

All this without mentioning forks once! Magic :-)

Almost ... "that does not help with zephyr forks" ... "cause branching and forking"

This is a present user need, not an abstract discussion. Other users independent of me have already indicated on Discord they are building one set of source reusable on both zephyrproject-rtos and nrfconnect ecosystems.

I am not concerned with the forks tracking Zephyr main. I am concerned about identifying interface changes introduced in the nrfConnect fork of Zephyr ecosystem. It would be less of a concern if nrfConnect maintained backwards compatibility at the SHA level, but it didn't.

One solution is to mix an identifier into the API semantic numbering indicating the organization defining the interface.
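One way to read "mix an identifier into the API semantic numbering" is to qualify each version with the organization that defines it, so that equal numbers from different origins never compare as compatible. A hypothetical sketch: the origin:MAJOR.MINOR string format and the origin names are assumptions for illustration.

```python
from typing import NamedTuple


class QualifiedVersion(NamedTuple):
    origin: str  # organization that defines this revision of the interface
    major: int
    minor: int

    @classmethod
    def parse(cls, text):
        """Parse a hypothetical "origin:MAJOR.MINOR" string, e.g. "upstream:2.1"."""
        origin, sep, number = text.partition(":")
        if not sep or not origin:
            raise ValueError(f"expected 'origin:MAJOR.MINOR', got {text!r}")
        major, minor = (int(part) for part in number.split("."))
        return cls(origin, major, minor)


def satisfies(required, provided):
    """Compatible only within one origin: same major, at least the required minor."""
    return (required.origin == provided.origin
            and required.major == provided.major
            and provided.minor >= required.minor)
```

A separate origin field is used here rather than SemVer build metadata (the +suffix), because the SemVer specification says build metadata must be ignored when determining version precedence, so it cannot keep two organizations' versions apart.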

We reject: kings, presidents, and voting. We believe in: rough consensus and running code.

I hate to tell ya, but that ain't gonna cut it for certifiable code. ;-)

marc-hb commented 2 years ago

This is a present user need, not an abstract discussion

I was referring to the (lack of) solutions, not to the problem.

I am concerned about identifying interface changes introduced in the nrfConnect fork of Zephyr ecosystem.

What makes you think some (good and useful) API change(s) in nrfConnect won't be found in some future upstream Zephyr version? Remember: forking and branching are the same thing.

I hate to tell ya, but that ain't gonna cut it for certifiable code. ;-)

Off-topic again?

gregshue commented 2 years ago

What makes you think some (good and useful) API change(s) in nrfConnect won't be found in some future upstream Zephyr version?

I never thought that. Rather, I thought that an API change upstreamed into Zephyr would now be managed by Zephyr rather than nrfConnect. I hope nRFConnect would then deprecate/remove their implementation and align with the upstream (just like I do with my local patches to Zephyr when I integrate with a fixed version of Zephyr).

I also know that nrfConnect has rewritten Git history, so this fork isn't really the same as a branch.

Off-topic again?

Not really, and definitely on the topic of not breaking "out of tree" users. API definitions are specifications that will need to be traced back to requirements for certifiable executables. They are not runnable code. (An inline implementation is distinct from the specification.)

marc-hb commented 2 years ago

I never thought that.

OK then why would API changes in nrfConnect not be manageable using the same processes and tools as API changes across upstream Zephyr branches? Considering these processes and tools don't exist yet, it sounds like you're complaining about a problem that does not exist yet.

I also know that nrfConnect has rewritten Git history, so this fork isn't really the same as a branch.

Pretty sure doxygen does not care about git history. If some other solution or tool ever relies on git history then it will be time to highlight this and discuss pros and cons.

The long story short is that Zephyr has a virtually infinite number of forks so a blanket and super vague request to "support forks" cannot possibly make sense. Only specific requests make sense; for instance: "Can this solution/tool be made compatible with rewritten git histories, pretty please?"

PS: making 1000s of random Zephyr forks "certifiable" sounds... fun! Whatever that means.

gregshue commented 2 years ago

OK then why would API changes in nrfConnect not be manageable using the same processes and tools as API changes across upstream Zephyr branches?

nrfConnect could use the same processes as API changes across upstream Zephyr branches - but it cannot assign different meanings to the same version identifiers unless some other mechanism exists to tell them apart. I look to the Zephyr Project to specify one mechanism for all forks to use.

In order to create a module that works with either upstream Zephyr or an incompatible nrfConnect interface, downstream developers must be able to know at build time which interface definition to call.

it sounds like you're complaining about a problem that does not exist yet.

It exists already. I've just had bigger issues to tackle.

aborisovich commented 2 years ago

Hi everyone, I'm also thinking about solutions to the problems you describe (but don't worry, I won't interfere much in your exchange of opinions, as I don't have much knowledge about processes in Zephyr). We should think about solutions for each of Zephyr's tools/interfaces, one by one. I'll start with Kconfig because it seems an easier problem than the others.

Kconfig maintenance proposition

The problem:

An out-of-tree Zephyr application sets the Zephyr-defined option CONFIG_EXAMPLE_ZEPHYR_DRIVER=y.
The Zephyr project renames CONFIG_EXAMPLE_ZEPHYR_DRIVER to CONFIG_DAI_EXAMPLE_DRIVER.
The out-of-tree application can adjust itself when rebasing onto the next Zephyr revision, but we also want a way to test compatibility from the Zephyr side and roll changes out to end users smoothly.

Solution: use Kconfig aliases, and generate deprecation warnings with the $(warning-if,condition,text) built-in function (https://www.kernel.org/doc/html/latest/kbuild/kconfig-macro-language.html#built-in-functions). Example:

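# Note: macros expand while Kconfig files are parsed, so $(EXAMPLE_ZEPHYR_DRIVER)
# is a macro variable here, not the user-assigned value of the symbol.
$(warning-if,$(EXAMPLE_ZEPHYR_DRIVER),Kconfig option EXAMPLE_ZEPHYR_DRIVER is obsolete; please use DAI_EXAMPLE_DRIVER)
config EXAMPLE_ZEPHYR_DRIVER
    bool "Example Zephyr driver (obsolete, use DAI_EXAMPLE_DRIVER)"
    default n
    select DAI_EXAMPLE_DRIVER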

Result: the out-of-tree application receives a Kconfig warning that the option it sets is obsolete. The only painful part is tracking, on the Zephyr side, when each of those obsolete symbols should be removed (here we need a robust process). The same goes for devicetree: there is an aliases feature (however, I do not see any way to print deprecation messages there).
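The process problem raised here (knowing when each obsolete symbol can be removed) could be handled by keeping deprecated aliases in a registry that records the release in which each may be deleted. A sketch with a hypothetical registry format and made-up release numbers; it also scans a prj.conf-style file for deprecated assignments, independently of Kconfig itself.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DeprecatedSymbol:
    name: str          # obsolete symbol, without the CONFIG_ prefix
    replacement: str   # symbol to use instead, without the CONFIG_ prefix
    deprecated_in: tuple  # release that introduced the alias, e.g. (3, 2)
    remove_in: tuple      # first release in which the alias may be deleted


# Hypothetical registry; the release numbers are made up for illustration.
REGISTRY = [
    DeprecatedSymbol("EXAMPLE_ZEPHYR_DRIVER", "DAI_EXAMPLE_DRIVER", (3, 2), (3, 4)),
]

ASSIGNMENT_RE = re.compile(r"^\s*CONFIG_(\w+)\s*=")


def scan_conf(lines, registry=REGISTRY):
    """Yield (line_number, message) for each deprecated symbol assigned in a .conf file."""
    by_name = {entry.name: entry for entry in registry}
    for number, line in enumerate(lines, start=1):
        match = ASSIGNMENT_RE.match(line)
        if match and match.group(1) in by_name:
            entry = by_name[match.group(1)]
            yield number, (f"CONFIG_{entry.name} is obsolete, "
                           f"use CONFIG_{entry.replacement}")


def due_for_removal(current_release, registry=REGISTRY):
    """Names of aliases whose removal release has been reached."""
    return [entry.name for entry in registry if current_release >= entry.remove_in]
```

Running due_for_removal in CI at each release would turn "when do we delete this alias?" into a checklist, and scan_conf gives out-of-tree users a way to find obsolete assignments before the alias disappears.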

marc-hb commented 2 years ago

but it cannot assign different meanings to the same version identifiers unless some other mechanism exists to tell them apart

Right, different forks and branches must use different identifiers to signal that they are indeed different versions. Not breaking new ground.

I look to the Zephyr Project to specify one mechanism for all forks to use.

An upstream project cannot anticipate all the potentially crazy ways it will be forked and create a versioning scheme that will be compatible with everything and anything. You can offer and recommend a "fork-friendly" versioning scheme; that seems reasonable. Can't wait to see your research and proposition.

nashif commented 2 years ago

@nashif It is also really frustrating to see the Zephyr Project not actually support the needs of users trying to comply with the development models it claims to support.

Funny, I wonder what this issue is about and why it was created in the first place. And what are those development models you are referring to exactly? Please be specific.

module.yml provides an extension of the build system. As an integrator of multiple modules where I cannot control the consolidation or separation of the modules, I need everything done at a higher layer to be independent of which module the source exists in, unless it is scoped to apply to a specific directory subtree (e.g., .clang-format).

This is the most vague description of a problem I have seen in a while. I am not sure what are you asking for.

If your module has drivers, boards and anything that is supported out of tree in zephyr and you are interested in keeping those working with zephyr, then this issue is for you. If i understand the above correctly and you are asking us to make your module work with upstream zephyr and other forks the same way, then you are in the wrong place, this is not something we have ever promised, not something that we are interested in and to be honest a very strange request/expectation.

As an "out-of-tree" user I am trying to rescope this discussion to meet my needs related to breakage. I'm sorry you think of it as derailing. Perhaps you need to clarify which users you are not trying to address.

See above. If that is not clear, then I am not sure how else I would be able to clarify it.

That's it from me. I have spent way too much time on this already.

gregshue commented 2 years ago

And what are those development models you are referring to exactly? Please be specific

In the "T2: Star topology, application is the manifest repository" setup, I am strategically reusing the repositories from zephyrproject-rtos, extending forks of other OSS projects to also be Zephyr modules, and putting licensed source into separate modules from my proprietary "applications"/boards/drivers/subsystems/tests/etc. All of my extended/proprietary repositories keep the Zephyr glue in the module itself, in a Zephyr directory structure under the module-level zephyr/ subdirectory (next to module.yml). This relocation is necessary in some repositories due to name collisions with preexisting subdirectories. Other than the location of the Zephyr directory structure, this follows the pattern of the Zephyr Project's example-application module.

IIRC, the Zephyr documentation does not indicate any difference in support between the topologies, or between them and developing as a Zephyr repository application, so I expect all the capabilities/tools/etc. that work for a Zephyr repository application to also work with a Zephyr workspace application module. Maintaining this support requires that all issues related to the Zephyr repository explicitly consider how the issue may also apply to any code in modules. Many (most?) issues will be independent of the module degree of freedom; many others will be impacted by it.

It is really frustrating to see this type of discussion always go into modules/mcuboot

I am not sure what are you asking for.

Fundamentally, I am asking the TSC Chair in particular, and voting members/maintainers/collaborators in general, to internalize that:

Modules are a supported mechanism for organizing Zephyr workspaces. Technical discussions MUST provide a solution that also applies to Zephyr-specific source in them.
The Zephyr Project already has created and owns an application module downstream of the Zephyr repository that is frequently recommended on the Discord channels as a pattern for adding proprietary content into a workspace.
Not explicitly considering how an issue or PR is impacted by the module degree-of-freedom is inconsistent with the support Zephyr Project documents.

I shouldn't have to repeatedly raise the unavoidable question about the impact of modules. But apparently I do because raising the question itself is causing frustrations rather than being accepted as necessary consideration.

marc-hb commented 2 years ago

Technical discussions MUST provide a solution that also applies to Zephyr-specific source in them

Then participate in these discussions and provide very specific, technical solutions that address your problems if/when possible. This is just getting started.

But apparently I do because raising the question itself is causing frustrations rather than being accepted as necessary consideration.

I suspect the frustration does not come from any particular topic but from the extent, vagueness and verbosity of the requests combined with the level of expectation, e.g. "MUST" support forks/modules without any technical detail. A fork and a module can be literally anything. Be specific: describe something that does not work and how it can be fixed. Except you can't yet, because there's no solution yet.

gregshue commented 2 years ago

Be specific: describe something that does not work and how it can be fixed.

Maintaining support for the topologies cannot depend on any single person being a watchdog on all issues. This is not my job full time. I have been engaged as I have time. (See "Global Namespace Management?", "Replacing zephyr driver/subsys implementations" and "Support module.yml in zephyr repo".)

If I understand the above correctly and you are asking us to make your module work with upstream zephyr and other forks the same way

I am not asking for you to make my module work with upstream Zephyr and other forks the same way.

gregshue commented 2 years ago

Solution: Using Kconfig aliases and generating obsolete-symbol warnings with the $(warning-if,condition,text) built-in function (https://www.kernel.org/doc/html/latest/kbuild/kconfig-macro-language.html#built-in-functions).

This works reasonably well for flagging deprecated symbols when users go from one release to the next. We also need a solution that works for users that go from one LTS to the next.

As Google members identified, I think a bigger need is to tell out-of-tree developers how to transform their code (and verify the transformation) when a rebase is attempted (which may be from one LTS to the next).
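One way to make that transformation scriptable and verifiable is sketched below in Python. It assumes the project would publish a machine-readable map from deprecated Kconfig symbols to their replacements; the map and the `CONFIG_EXAMPLE_*` names are hypothetical, and the sketch only rewrites `.conf` fragments, not C sources that test `CONFIG_` macros.

```python
import re

# Hypothetical rename map (deprecated symbol -> replacement). A real map would
# have to be published by the project for each release or LTS-to-LTS jump.
RENAMES = {
    "CONFIG_EXAMPLE_OLD_NAME": "CONFIG_EXAMPLE_NEW_NAME",
    "CONFIG_EXAMPLE_LEGACY": "CONFIG_EXAMPLE_MODERN",
}

# Matches "CONFIG_X=..." assignments and "# CONFIG_X is not set" lines.
_SYMBOL_LINE = re.compile(r"^(\s*#?\s*)(CONFIG_[A-Za-z0-9_]+)")


def migrate_conf(text, renames):
    """Rewrite deprecated symbols in a Kconfig fragment (.conf file).

    Returns the new text plus a list of (line_no, old, new) tuples so the
    transformation can be reviewed.
    """
    out, changes = [], []
    for line_no, line in enumerate(text.splitlines(keepends=True), start=1):
        m = _SYMBOL_LINE.match(line)
        if m and m.group(2) in renames:
            old, new = m.group(2), renames[m.group(2)]
            line = line[:m.start(2)] + new + line[m.end(2):]
            changes.append((line_no, old, new))
        out.append(line)
    return "".join(out), changes


def remaining_deprecated(text, renames):
    """Verification step: deprecated names still present anywhere in text."""
    return sorted(s for s in renames if re.search(rf"\b{re.escape(s)}\b", text))
```

Running `remaining_deprecated()` on the migrated text is the verification step: an empty list means no deprecated name survived the rewrite, and the `changes` list gives reviewers an audit trail.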

marc-hb commented 2 years ago

Be specific: describe something that does not work and how it can be fixed.

Maintaining support for the topologies cannot depend on any single person being a watchdog on all issues. This is not my job full time.

This is "only" an open-source project: as long as you're the only one who cares about some feature or request getting done, then it is your job full time. Explaining and convincing others may help (exceptional communication skills required).

mbolivar-nordic commented 2 years ago

Process WG: defer until next week since @nashif could not attend today

mbolivar-nordic commented 2 years ago

I shouldn't have to repeatedly raise the unavoidable question about the impact of modules. But apparently I do because raising the question itself is causing frustrations rather than being accepted as necessary consideration.

@gregshue my opinion is that you are missing the point here.

Basically all of us work on out of tree modules and we do care about them; please accept that.

I think what people are trying to tell you is that your attempts to rescope this issue are unwelcome distractions from what we are trying to do first. As has previously been stated we are trying to tackle one thing at a time, and that's not modules or mcuboot. I'm going to try to refocus this discussion in the next meeting in half an hour.

gregshue commented 2 years ago

Provide some guarantees, guidelines and a process keeping out of tree users operational

Perhaps we need to clearly describe the range of out of tree users we are trying to keep operational. I assumed it included end users doing freestanding and workspace application modules as well as downstream module developers. Are they not part of the scope trying to be addressed?

marc-hb commented 2 years ago

Are they not part of the scope trying to be addressed?

I think it depends on what code they write. Yes for users who follow the guidelines TO BE DEFINED (that's the entire purpose of this issue, read its description again). Others, probably not.

Types of users will be defined by which rules they follow (the rules do not exist yet)

Perhaps we need to clearly describe the range of out of tree users we are trying to keep operational.

A formal definition of this range will be the conclusion of this work, not the starting point.

So no one can tell yet where your personal, current use case(s) will be. Terrifying? That's what this issue wants to address in the future.

mbolivar-nordic commented 2 years ago

Process WG:

@mbolivar-nordic we made progress on defining the treewide process changes; can move on to working on discussion in https://github.com/zephyrproject-rtos/zephyr/issues/48887#issuecomment-1225992073
@fabiobaltieri don't we have a z_ prefix for private interfaces / internals?
@nashif this is not consistently applied and those items shouldn't even appear in include/zephyr; we should have the subsystem itself define this or otherwise not expose it. There are so many inconsistencies in the include directory.
@fabiobaltieri we do have internal APIs but we want to share them between subsystems, though
@nashif there are 3 levels: 1. inner subsystem APIs, nobody should be using that, 2. subsystem interfaces: these need to be public APIs, 3.
@stephanosio 2. is different, we're drawing a distinction between applications and driver subsystems
@galak I think we need to get clearer definitions here. There are "application APIs" and "zephyr system APIs" for things like out of tree drivers and SoCs. Applications don't do that, but it is an API available to out of tree users. Need to be clear.
@nashif a good example is arch_. This API is defined as an interface between the kernel and the architecture. Applications shouldn't use this. We need to spend some time on this. Let's do that and come back to this
@fabiobaltieri there are multiple tiers here. It's not like Linux, where we have a userspace. As @galak said, we do support out-of-tree SoCs. Some things will be used externally, and we need to be clear on what the stability level is for out-of-tree SoCs.
@mbolivar-nordic that is a bit of an open topic at the "developer experience" pillar in the governing board --- increased stability for APIs needed by out of tree arches/socs/etc
@fabiobaltieri I think for Zephyr, those cases should have fewer stability guarantees
@nashif this is something we need to get right. I talk to many future potential members, and this is one of the first questions they always ask. Need to make sure we don't break people using stable APIs
@fabiobaltieri or assume other incorrect things about guarantees
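To make the tiers concrete, a reviewer-side check could flag out-of-tree code that calls into non-application tiers. The Python sketch below assumes a purely prefix-based convention (`z_` for internals, `arch_` for the kernel/architecture interface), which, as noted in the minutes, is not applied consistently today. The function names in the example are hypothetical, and a regex is not a C parser, so calls inside comments or strings are reported too.

```python
import re

# Hypothetical prefix convention for the tiers discussed above; the project
# had not agreed on one, and z_ is not applied consistently today.
TIER_BY_PREFIX = (
    ("z_", "internal"),         # subsystem/kernel internals
    ("arch_", "architecture"),  # kernel <-> architecture interface
)

# Identifier immediately followed by "(": a rough stand-in for a call site.
_CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def classify_calls(c_source):
    """Return {identifier: tier} for calls into non-application tiers."""
    found = {}
    for name in _CALL.findall(c_source):
        for prefix, tier in TIER_BY_PREFIX:
            if name.startswith(prefix):
                found[name] = tier
                break
    return found
```

A prefix check is cheap to run in CI, but it only works once the naming rule is applied consistently in `include/zephyr`, which is exactly the gap @nashif points out.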

mbolivar-nordic commented 2 years ago

Process WG will address this next week with a discussion from @galak on documenting changes to things that are not stable.

mbolivar-nordic commented 1 year ago

Process WG:

consensus among participants that we want changes announced for unstable APIs
consensus that we want to start having automated enforcement that when unstable (or above, including stable) APIs are changed, we require an update to the release notes within the same PR
issue goes back to 'on hold' until @stephanosio comes back with an RFC pull request implementing the enforcement
any PR that changes something in include/zephyr gets a label "possible API change" or so

@galak want to discuss documentation related to changes between API states. What requirements, if any, do we have for changing unstable or experimental APIs?
@dkalowsk what about when we change from experimental to unstable? E.g. if we move from there with changes from experimental, what do we need to do?
@mbolivar-nordic experimental and unstable API changes do not currently require change announcements (https://docs.zephyrproject.org/latest/develop/api/api_lifecycle.html)
@dleach02 I still think it's useful to throw documentation of all documentation changes into the release notes
@mbolivar-nordic in my experience managing releases, we have a lot of unmaintained areas and we aren't disciplined project wide about adding release notes
@dleach02 maybe we need some process to force people to update release notes when PRs come in that change APIs
@galak I would hope that the separate discussion in the architecture working group related to change management would also require release notes in the PR itself
@stephanosio can't we easily automate at least the changes to the versions in the APIs? e.g. diff between previous release and current
@mbolivar-nordic I'm not sure it could be done easily :)
@galak since that conversation is happening elsewhere, let's let it proceed on its own path. My first question for here is related to "changes will not be announced" appearing in the API lifecycle docs. I would propose that we do require announcement in the form of release notes when unstable APIs change.
@mbolivar-nordic I don't trust all the maintainers to get this right, so I think it's got to be a requirement for the release notes
@galak enforcement aside, everything we can't automate is best effort
@dleach02 considering project roles and responsibilities: you're saying that during a review, you are responsible for being aware that changes to the release notes are required.
@galak I think the burden is on the person making the API change. Reviewers should ask for release notes changes
@dkalowsk I hear you're dancing around versioned APIs
@galak I want to separate enforcement from "should we do it"
@mbolivar-nordic does anybody not want changes announced for unstable APIs?
no one spoke up; assume that's consensus
@galak I ask the same question for experimental
@mbolivar-nordic I think we shouldn't require that for experimental APIs
@stephanosio maybe we don't need emails, but including this in the release notes is enough
@dleach02 I could argue @mbolivar-nordic 's point, but I don't think it hurts to say we should put it in the release notes as well
@galak I worry about experimental, e.g. what if you completely rip it up and change it wholesale? maybe just a short statement that something changed, not a detailed description
@keith-zephyr I think encouraging iteration and updates to experimental APIs without the burden of updating the release notes is a better model
@galak I could buy that as well: if you're using something experimental, you should be following along
@dleach02 are we saying it's too much of a burden to put a note in the release notes?
@mbolivar-nordic since there's no consensus on experimental, I think we shouldn't propose this without an enforcement mechanism
@galak what's the enforcement mechanism for stable APIs?
@mbolivar-nordic stable APIs tend to have better maintainers
@stephanosio we can add a label automatically if there is a header change. But if there's a behavior change without introducing changes to the header, it's harder to detect.
@mbolivar-nordic so is the idea that we add a label that says release notes updates are required for changes to any header marked unstable (or above, including 'stable'). Is there a volunteer to make it happen in terms of the github workflows?
@stephanosio I'll take a look, not at a high priority
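The enforcement discussed above could start as a small check run from a GitHub workflow. The Python sketch below is hypothetical (it is not an existing script in the tree): it labels any PR touching a header under `include/zephyr/` and reports missing release notes when a touched header is unstable or stable. It assumes release notes live under `doc/releases/` and that a per-header stability map is available from somewhere.

```python
# Hypothetical policy constants; none of these are existing Zephyr settings.
API_ROOT = "include/zephyr/"
RELEASE_NOTES_DIR = "doc/releases/"
API_LABEL = "possible API change"
# Per the Process WG consensus: unstable and stable APIs need announcements;
# experimental APIs are not covered (no consensus was reached for them).
NEEDS_NOTES = {"unstable", "stable"}


def review_pr(changed_files, header_stability):
    """Return the labels to apply and whether release notes are missing.

    changed_files: repository-relative paths touched by the PR.
    header_stability: maps a header path to "experimental", "unstable" or
    "stable"; unmapped headers get the label but no release-notes requirement.
    """
    api_headers = [p for p in changed_files
                   if p.startswith(API_ROOT) and p.endswith(".h")]
    labels = [API_LABEL] if api_headers else []
    needs_notes = any(header_stability.get(h) in NEEDS_NOTES for h in api_headers)
    has_notes = any(p.startswith(RELEASE_NOTES_DIR) for p in changed_files)
    return {"labels": labels, "release_notes_missing": needs_notes and not has_notes}
```

As @stephanosio noted, a check keyed on header paths cannot see behaviour changes that leave the headers untouched; those still depend on reviewers.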

PerMac commented 1 year ago

As a tester and a person responsible for internal CIs I have another observation, which I believe was not raised in this topic.

Background: We (Nordic) have an SDK which extends Zephyr. To allow for proper integration, we have a fork of Zephyr where some extra patches are added on top of the upstream code. Several times per year, we synchronize this Zephyr fork with the current upstream. It is a rather demanding process due to the amount of changes.

Issue: During almost every such synchronization, we had to promptly fix issues related to twister that were blocking all testing. Most of the time, the command we use in CI no longer worked. Most such issues were caused by rather minor changes (renamed twister arguments, an argument becoming the default, etc.). Since we have multiple independent CI plans involved in this process, many of them using twister, even such minor issues escalate quickly. IMO the reason is that twister development is very tightly coupled with Zephyr. Of course, there are reasons for this: twister is a tool created to support Zephyr's development. However, twister is a very useful tool beyond the scope of pure upstream Zephyr. Other projects based on Zephyr can also benefit from using it (as we do).

Idea: Decouple twister development from Zephyr and give it its own versioning, e.g. by moving it to its own repo. IMO this could help projects like ours: we could delay an update to a newer version of twister if there are issues with it, instead of promptly fixing/reverting commits/finding workarounds during the demanding process of the whole Zephyr synchronization. It could also speed up an update when some feature is needed, without waiting for the whole Zephyr sync. I think it could also benefit upstream, e.g. we could think of a "staging" environment: twister updates would first be added to a "staging" branch, where cross-checks against the "main" version could run. If everything is fine, "staging" would be merged into "main".

keith-zephyr commented 1 year ago

@wbober - Summary of issues discussed at F2F in Prague
@nashif - Only APIs have a life cycle. Devicetree and Kconfig don't yet have a defined life cycle, but this is also needed so deprecation policies can be consistent.
@nashif - Architecture features are defined by APIs (irq enable/disable, sys IO). Architecture APIs are internal, but need a life cycle/stability process. What's the policy for extending arch interfaces?
@nashif - Need to mark internal APIs clearly so users know not to use them.
@gregshue - Namespace management is an issue: include paths, names of boards, names of HW blocks used in samples. What about downstream users that have an out-of-tree SoC? Documentation tags are another area.
@wbober - First pass is to keep the scope limited to the core issues only, then do another iteration to broaden it to more public interfaces.
@gregshue - Header file include paths.
@gregshue - Need a policy for how to reference files in modules.
@fabiobaltieri - Not all Kconfig and devicetree bindings are public; it is not practical to make all of them public.
@galak - Doesn't agree the first step is "amend policy". Enforcement should come first; need to figure out how it will be managed.
@nashif - Need to prioritize this list (APIs first, then namespaces).
@wbober - Nordic can allocate resources to help with tooling.
@wbober - As of today, Kconfig, devicetree and CMake are considered unstable. APIs have a life cycle.
@wbober - There is a need to define a policy for the missing areas, but we need enforcement to actually make progress.
@wbober - Need to fill in the gaps, e.g. define which Kconfig symbols are public. Same for DT bindings.
@wbober - Once public APIs are defined, Nordic will start creating tooling.
@nashif - Premature to define tooling without knowing the scope of the public Kconfig and DT areas.
@wbober - Agrees: the policy needs to be defined along with the items to watch.
@nashif - First step is to define the guidelines and communicate them to maintainers. Make sure reviewers/maintainers raise issues on PRs that change a public interface.
@nashif - The treewide policy provides some guidance that can be leveraged.
@gregshue - A policy that isn't enforceable isn't really a policy, so we need to consider the enforceability of specific policies.
@wbober - Enough manual labor can make anything enforceable, but that is not desired. We can start with manual enforcement (code reviews) and add tooling later.
@wbober - Proposed life cycle: Experimental, Unstable, Stable.
@gregshue - When integrating Zephyr with other repositories, has run into namespace conflicts in the Kconfig space. This namespace is flat, and Zephyr's Kconfig namespace isn't well managed right now. Minimizing the risk of Kconfig conflicts.
@nashif - Completely avoiding Kconfig conflicts is separate from the stable public interface issues.
@nashif - Internal/external for Kconfig: helper symbols are by definition internal (a symbol that can only be set by another symbol).
@gregshue - Has considered helper symbols for the internal/external split, but even internal symbols can conflict with out-of-tree symbols.
@nashif, @keith-zephyr - Agree that namespace issues are important, but out of scope for defining the public interface lifecycle.
@keith-zephyr - Helper symbols can in some cases be considered public, or at least part of the architecture API interface.
@wbober - Need to define rules to partition internal/external for header files, Kconfig, and DT bindings to start.
@nashif, @keith-zephyr - Agree with this prioritization.
@nashif - Namespaces will need to be dealt with later. CI (twister tooling) also needs to be handled at later steps or as a separate issue.
@dleach02 - Suggests that @gregshue enumerate specific risks to his downstream project.
@gregshue - Downstream CI isn't as much of a problem; the risk is integrating multiple projects that generate Kconfig conflicts. Suggests adding a prefix to Zephyr Kconfigs.
@nashif - That might be too broad a change.
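Cross-repository Kconfig conflicts of the kind @gregshue describes can at least be detected mechanically. The Python sketch below assumes the Kconfig text of each repository is already loaded into memory and uses a simple line-based regex for `config`/`menuconfig` definitions rather than a full Kconfig parser (choices, macros and `source` directives are ignored). Because Kconfig legitimately allows a symbol to be defined in several places, the result is a list to review, not a list of errors; the repository names in the example are hypothetical.

```python
import re
from collections import defaultdict

# "config FOO" / "menuconfig FOO" on a line of its own (line-based, not a
# full Kconfig parser).
_DEFINITION = re.compile(r"^[ \t]*(?:menu)?config[ \t]+([A-Za-z0-9_]+)[ \t]*$",
                         re.MULTILINE)


def cross_repo_symbols(kconfig_by_repo):
    """Map each symbol defined in more than one repository to those repositories."""
    owners = defaultdict(set)
    for repo, text in kconfig_by_repo.items():
        for symbol in _DEFINITION.findall(text):
            owners[symbol].add(repo)
    return {sym: sorted(repos) for sym, repos in owners.items() if len(repos) > 1}
```

Duplicates within a single repository are deliberately ignored, since redefining a symbol (for example to add defaults) is a normal Kconfig pattern there.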

gregshue commented 1 year ago

During today's PWG meeting @nashif asked which modules (in zephyrproject-rtos) were defining Kconfigs. A quick search at SHA 74c4d1c52 (June 5, 2023) shows at least the following:

modules/hal/silabs/zephyr/Kconfig (implicit reference)
modules/hal/espressif/zephyr/Kconfig (referenced from espressif/zephyr/module.yml)
modules/lib/picolibc/zephyr/Kconfig (referenced from picolibc/zephyr/module.yml)
modules/lib/zscilib/Kconfig.zscilib (referenced from zscilib/zephyr/module.yml)
modules/lib/chre/platform/zephyr/Kconfig (referenced from chre/zephyr/module.yml)
modules/lib/gui/lvgl/zephyr/Kconfig (implicit reference)
modules/audio/sof/zephyr/Kconfig (implicit reference)
bootloader/mcuboot/boot/zephyr/Kconfig (referenced from mcuboot/zephyr/module.yml)

nashif commented 1 year ago

As a tester and a person responsible for internal CIs I have another observation, which I believe was not raised in this topic.

Because this is not an out-of-tree user case. CI and tooling used in CI is a different category, and many of the policies and discussions here do not apply, i.e. currently there is no intention to maintain APIs backward compatible or define some deprecation for features etc in twister and other tooling. What you are talking about, and correct me if I am wrong, are mostly bugs, not cases where someone intentionally changed some interface or API. Most major changes are usually discussed and reviewed when it comes to tooling, so unless I am missing something more serious than implementation bugs, please list those in a new issue ("how not to break downstream CI" maybe).

Where twister is maintained, in-tree or as a separate tool, will not solve the problem. Interfaces and documented features can be maintained anywhere. When a change goes into twister, it usually deals with some issue, and such fixes will need to be integrated with the main tree sooner rather than later. Doing this from an out-of-tree twister will just make things more complicated, and given this is CI, every run will require pulling an external twister. There is also the need to run twister with old code, which is currently possible only because twister is in the tree.

nashif commented 1 year ago

During today's PWG meeting @nashif asked which modules (in zephyrproject-rtos) were defining Kconfigs. A quick search at SHA 74c4d1c52 (June 5, 2023) shows at least the following:

Nice list, but most of those are actually Zephyr Kconfigs, i.e. they are driven by Zephyr and are not part of how configuration of the standalone code of the module works. AFAIK, the only Kconfig user that might conflict is SOF, but this is already constrained, and we can keep the namespace sane given that most SOF developers already work on Zephyr.

gregshue commented 1 year ago

AFAICT, we have been using "out-of-tree" to mean content defined/controlled outside the zephyr repository. This term doesn't seem to be in the Glossary of Terms. Is there an actual definition somewhere?

are not part of how configuration of the standalone code of the module works

Look again at zscilib, chre, lvgl, and mcuboot. Each has module build files or standalone code that is controlled by their locally defined Kconfigs.

Note that we cannot control the symbols encountered by an out-of-tree user. The best we can do is recommend a pattern that scales well and live within it ourselves.

nashif commented 1 year ago

OK, zscilib and mcuboot are both tightly coupled with Zephyr. In the case of zscilib, the Kconfigs are already namespaced, and the Kconfig there is primarily part of the integration with Zephyr. The same goes for chre, which has all its Kconfig usage in platform/zephyr, i.e. it was added there as part of the port to Zephyr; it is also namespaced. If the ask here is about having modules use some prefix and namespace for the integration with Zephyr, that is nice, and we see that most already do that.

SOF, for example, is another class of user. It used Kconfig before it started using Zephyr, and we had some issues with configs there because it implemented the same things we had in Zephyr. This is all going away as more Zephyr integration happens; better namespacing would have made certain things easier. But this is an exception: we rarely see this type of usage, where two similar systems integrate with each other (and CONFIG_LOG can become a contested string).

Having said all of that, I do not see how namespacing is going to solve the problem we are dealing with here. It is a different issue, an important one, but whether I call an API zephyr_blah() or just blah() does not really matter if I change the signature or change the behaviour and do not provide backward compatibility.

gregshue commented 1 year ago

If the ask here is about having modules use some prefix and namespace for the integration with zephyr, that is nice and we see that most already do that.

That should be the recommendation from the Zephyr Project to all repositories being integrated with Zephyr. The Zephyr Project has some control over every repository within zephyrproject-rtos, so this should be required of all repositories under zephyrproject-rtos.

I do not see how namespacing is going to solve the problem we are dealing with here.

Breaking changes will inevitably happen. The aggregate architecture must be able to evolve (e.g., pinctrl). We cannot eliminate them, so we must reduce:

the need for change, through:
- Requirements-driven architecture (reduces risk)
- Namespacing (reduces potential and/or realized conflicts; enhances predictability)
the cost of change, through:
- API/etc. lifecycle management (reduces urgency; raises awareness)
- Scripted transformations (reduces cost of tracking changes)
- Backwards compatibility and/or providing extensibility (reduces the impact of a change)

nashif commented 1 year ago

Thoughts about Kconfig:

New Kconfigs to be scrutinized more, and we should avoid ad-hoc, specialized Kconfigs when it is possible to define them in a generic way
promptless and hidden Kconfigs to be considered internal
Kconfigs shall be tied to the area they configure, so, if the subsystem is experimental, the kconfigs will also be experimental and the kconfigs will follow the same api cycle of the area they serve. We should not have a different lifecycle for kconfigs or other interfaces we want to manage.
Each area should provide some testcases for all Kconfigs being used in that area allowing us to catch issues and changes easily (change to kconfig would require a change to test).
In general, and not limited to Kconfig, we need a way to flag changes to interfaces and tests. If I change an interface and the test associated with it, this should raise a flag, i.e. was the change to a stable API (and the test was changed just to make it pass with the new API...)?
From a tooling perspective, we need to look at capturing signatures of a Kconfig sub-tree serving a subsystem and raise a flag when the signature changes for a stable area (added dependencies, dropped dependencies, changed Kconfigs, new Kconfigs, etc).
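A minimal sketch of the signature idea in Python, assuming the symbols of a subsystem's Kconfig sub-tree have already been extracted into simple records (in practice a parser such as Kconfiglib, which Zephyr uses, would supply them; the extraction is not shown). Promptless symbols are classified as internal, and the signature covers names, types, visibility, dependencies and selects, so added or dropped dependencies and new or removed symbols change it, while a reworded prompt does not. The `EXAMPLE_*` symbols are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Symbol:
    """Minimal record of one Kconfig symbol (extraction not shown)."""
    name: str
    kind: str                      # "bool", "int", "string", ...
    prompt: Optional[str] = None   # None means promptless
    depends_on: tuple = ()
    selects: tuple = ()


def is_internal(sym):
    # Promptless (hidden) symbols cannot be set by users directly,
    # so this sketch treats them as internal.
    return sym.prompt is None


def signature(symbols):
    """SHA-256 over a normalized, order-independent view of the sub-tree.

    Prompt wording is deliberately left out; only whether a prompt exists
    (i.e. visibility) is part of the signature.
    """
    canon = sorted(
        (s.name, s.kind, s.prompt is not None,
         sorted(s.depends_on), sorted(s.selects))
        for s in symbols
    )
    return hashlib.sha256(json.dumps(canon).encode("utf-8")).hexdigest()


def flag_change(symbols, recorded):
    """True when a stable area's sub-tree no longer matches its recorded signature."""
    return signature(symbols) != recorded
```

The recorded signature would be committed alongside the stable area, so a PR that changes it has to update it explicitly, which is the "raise a flag" step described above.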

PerMac commented 1 year ago

Because this is not an out-of-tree user case.

We have hundreds of tests and samples in our repo, which is not part of the Zephyr tree. We use tooling from the Zephyr tree to execute them. Very often, synchronization with Zephyr is blocked due to changes in the tooling. Why doesn't this count as an out-of-tree use case?

currently there is no intention to maintain APIs backward compatible or define some deprecation for features etc in twister and other tooling.

Why? Is it set in stone and out of a discussion?

What you are talking about and correct me if I am wrong, are mostly bugs, not because someone intentionally changed some interface or API.

Not really. Most of those are intentional changes: --testcase-root -> --testsuite-root, or the recent one, where --board-root is loaded by default for out-of-tree modules. I introduced some issues myself as well: I tried to unify how test IDs are handled for in-tree and out-of-tree tests, which required changes in downstream CIs twice, once when it was added and again when it was reverted a while later.

Most major changes are usually discussed and reviewed when it comes to tooling, so unless I am missing something that is more serious than implementation bugs, please list those in a new issue ("how not to break downstream CI" maybe)

Indeed. But as you pointed out, there is no intention for backward compatibility. And I think it is generally not evaluated how the changes can affect out-of-tree usage when reviewing new features/fixes. We will definitely think about what can be done on "how not to break downstream CI". I wanted to share my POV here, since I consider this within the scope of a topic as broad as "How not to break "out of tree" users".

Where twister is maintained, in-tree or as a separate tool, will not solve the problem. Interfaces and documented features can be maintained anywhere. When a change goes into twister, it usually deals with some issue, and such fixes will need to be integrated with the main tree sooner rather than later. Doing this from an out-of-tree twister will just make things more complicated, and given this is CI, every run will require pulling an external twister.

As already mentioned, I believe having a separate place for development, outside Zephyr's main tree, can benefit the tooling. I know that we have internal teams that need a custom (patched) version of twister, e.g. to support testing of features which are not public yet. Since twister comes as part of a package as big as Zephyr, working on the tool itself requires more effort. Using twister in an out-of-tree project requires synchronizing with the whole Zephyr tree, or cherry-picking specific commits. Sure, updating/fixing twister will require an extra step, e.g. changing the version in the manifest. However, not every change to twister requires an immediate update in Zephyr.

There is also the need to run twister with old code, which is currently possible only because of twister being in the tree.

I don't follow this. I am not proposing developing twister as an independent Python package installed, e.g., with pip, as west is (although personally I think that could be beneficial). If the twister version is controlled through the west manifest, then running twister with old code is as easy as checking out an old Zephyr and running west update to get the twister that was used back then. What's more, if the twister repo is independent from Zephyr's main tree, one can test old Zephyr with new twister and vice versa, which is not that easy now. E.g. right now I cannot check whether proposed changes to twister will break our internal usage, since we are not using main Zephyr. With twister as a separate repo, it would be as easy as referencing a twister PR in the project's manifest. I am aware that some changes in twister are coupled with things happening in Zephyr, and obviously not every version of twister will work with every version of Zephyr. But I think the number of such couplings is limited, and during most of development a 1:1 coupling is not a must.

I know that my issue is not in line with the ongoing discussion, however, I don't find it off-topic. I agree we can move the discussion to a separate issue, as you proposed. Nevertheless, I wanted to share it with broader audience, since this issue is becoming more and more present in our development as more and more of our teams are starting to use twister in their verification plans and updating zephyr literally breaks their work.

nashif commented 1 year ago

We have hundreds of tests and samples in our repo, which is not part of the Zephyr tree. We use tooling from the Zephyr tree to execute them. Very often, synchronization with Zephyr is blocked due to changes in the tooling. Why doesn't this count as an out-of-tree use case?

This issue is about end users and API compatibility, and IMO we should keep it that way. This also includes tests and samples. This is why we deprecated ztest, for example, and will only remove the old ztest once the deprecation period has passed.

I agree there needs to be some level of control, and some assurance and attention paid to how our tooling moves forward to avoid breaking downstream CI, but this will need to be discussed in a completely different context. CI environments, test environments, and approaches to testing and CI in general vary from one organisation to the next; none of that should impact the upstream CI activities.

currently there is no intention to maintain APIs backward compatible or define some deprecation for features etc in twister and other tooling.

Why? Is it set in stone and out of a discussion?

No, it is not set in stone. But from your initial comment it is not clear exactly what the problem is and how severe it is. We have been trying as much as we can to keep old options working and backward compatible, but things get missed.

Indeed. But as you pointed out, there is no intention for backward compatibility.

I am talking about backward compatibility on a different layer. I think the external interface (command line options) should be backward compatible, and we should not drop options randomly. However, we can't keep, for example, the history of how we generate results or reports, and how we deal with tests in general, backward compatible. If we decide at some point that some tests should be marked differently, that does not mean we will have to maintain the old behavior while we implement the new one.
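Keeping command line options backward compatible can be as cheap as accepting the old spelling as a hidden alias that warns during a deprecation period. The Python argparse sketch below uses the --testcase-root to --testsuite-root rename mentioned earlier in the thread as its example, but `example-runner` and `RenamedOption` are hypothetical; this is not how twister itself implements its options.

```python
import argparse
import warnings


class RenamedOption(argparse.Action):
    """Keep accepting an old option name while steering users to the new one."""

    def __init__(self, option_strings, dest, new_name, **kwargs):
        self.new_name = new_name
        super().__init__(option_strings, dest, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated; use {self.new_name} instead",
            FutureWarning,
        )
        # Store into the same destination as the new option.
        setattr(namespace, self.dest, values)


def build_parser():
    parser = argparse.ArgumentParser(prog="example-runner")  # hypothetical tool
    parser.add_argument("--testsuite-root", dest="testsuite_root",
                        help="directory to search for test suites")
    # Old spelling kept for a deprecation period, hidden from --help output.
    parser.add_argument("--testcase-root", dest="testsuite_root",
                        action=RenamedOption, new_name="--testsuite-root",
                        help=argparse.SUPPRESS)
    return parser
```

FutureWarning is used because Python shows it to end users by default, unlike DeprecationWarning; once the deprecation period ends, only the hidden alias has to be deleted.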

and I think it is generally not evaluated how the changes can affect out-of-tree usage when reviewing new features/fixes.

That is the thing: CI environments are different, and there is no way for us to track the various ways of running CI environments and testing. The only defense you will have is to upstream your code, participate in reviews, and stay as close to upstream as possible.

I don't follow this. I am not proposing developing twister as an independent Python package installed, e.g., with pip, as west is (although personally I think that could be beneficial). If the twister version is controlled through the west manifest, then running twister with old code is as easy as checking out an old Zephyr and running west update to get the twister that was used back then. What's more, if the twister repo is independent from Zephyr's main tree, one can test old Zephyr with new twister and vice versa, which is not that easy now. E.g. right now I cannot check whether proposed changes to twister will break our internal usage, since we are not using main Zephyr. With twister as a separate repo, it would be as easy as referencing a twister PR in the project's manifest. I am aware that some changes in twister are coupled with things happening in Zephyr, and obviously not every version of twister will work with every version of Zephyr. But I think the number of such couplings is limited, and during most of development a 1:1 coupling is not a must.

I think all of this can also be resolved with in-tree twister. Before going any other way, I would like to see what the "interfaces" and assets are that we want to protect to avoid breakage, and address those with additional testing, documentation, some guidelines, etc. Moving twister to a separate repo will not provide any immediate results if we do not define the interface and get more input, etc.

nashif commented 1 year ago

I think all of this can also be resolved with an in-tree twister. Before going any other way, I would like to see which "interfaces" and assets we want to protect from breakage, and address those with additional testing, documentation, some guidelines, etc. Moving twister to a separate repo will not provide any immediate results if we do not define the interface and get more input, etc.

In other words, let's not jump into the solution space before we have evaluated the problem first.

gregshue commented 1 year ago

This issue is about end users and API compatibility

Here are a few points to consider:

Manufacturers of ETSI EN 303 645-compliant devices are recommended to act on disclosed vulnerabilities in a timely manner (Provision 5.2-2). Conventionally, the process is completed within 90 days for a software solution. For a manufacturer to roll that solution out, it has to propagate through and be verified in intermediate forks/projects in a much shorter timeframe. Problems for Nordic become problems for some of their customers (e.g., me).
Technically, Nordic has a manifest-module application repository (sdk-nrf) and deploys FW images (.hex or sources) on and/or for their device hardware (dev kits) that they sell. AFAICT they meet the characteristics of an end user.
The scope in this comment is different from the description at the top of this issue. The original scope is not limited to "end users". It is also not limited to API compatibility, unless API includes anything/everything that can be triggered from a script or command line.
OOT users include those going from one LTS to another.

nashif commented 1 year ago

Manufacturers of ETSI EN 303 645-compliant devices are recommended to act on disclosed vulnerabilities in a timely manner (Provision 5.2-2). Conventionally, the process is completed within 90 days for a software solution. For a manufacturer to roll that solution out, it has to propagate through and be verified in intermediate forks/projects in a much shorter timeframe. Problems for Nordic become problems for some of their customers (e.g., me).

¯\_(ツ)_/¯

No idea where all of this is going, so I am just going to take a break from this issue.

fabiobaltieri commented 1 year ago

Hey, a few thoughts on my side now that I've had a bit of time to think about it:

The comment at https://github.com/zephyrproject-rtos/zephyr/issues/48887#issuecomment-1671552502 asks to decouple twister; I'd say it's a big enough topic that it may deserve its own discussion.
About the general idea of extending the concept of API beyond the programming interface and into Kconfig, CMake, etc.: that may make sense, but then how does it help with the goal of not breaking out-of-tree users? At the end of the day, if the developers think that an API has to be changed for the project to progress, it will be changed, and out-of-tree code is going to break. I think if we add more process around changes, it has to be clear how that makes out-of-tree breakages easier to manage. If the only thing it does is make our own developers' lives more miserable, then it would just discourage changes and project-wide rework and improvements, which could keep the project stable in the short term but would certainly bite us back in the long term.

nashif commented 1 year ago

Here is a good example of possible issues: https://github.com/zephyrproject-rtos/zephyr/issues/61413

A change removed the need for a Kconfig option that was needed for something, but it seems someone else was relying on that Kconfig option.
A call to a configurable hook that has the z_ prefix (the namespace used for internal functions in Zephyr). The hook z_arm_on_enter_cpuidle in https://github.com/nrfconnect/sdk-nrf/commit/bd581bca7fed90d457b25c892f53741815e22ee6 is one of those that should have its own namespace and should not be z_-prefixed. Letting it appear internal makes it easy for someone to just remove it, rename it, etc., and thus break downstream users without any further notice. Additionally, in that source file ('subsys/debug/etb_trace/etb_trace_lp.c') it appears as an internal function, i.e. there is no indication that this is actually a hook defined and called somewhere else (by architecture code in Zephyr).