zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.52k stars 6.44k forks source link

Improve header structure and usage #41543

Open gmarull opened 2 years ago

gmarull commented 2 years ago

Is your enhancement proposal related to a problem? Please describe.

As of today one can observe a few issues with respect to C headers in Zephyr:

All these issues may seem not relevant, but over time this translates to increased compile times, changes in a header that have a domino effect to many source files, etc.

I've opened this issue inspired by https://lwn.net/Articles/880175/.

Describe the solution you'd like

Another interesting change that, I think, would be useful is to place all Zephyr headers into a zephyr folder, so that all Zephyr headers would then be used as:

#include <zephyr/kernel.h>
#include <zephyr/drivers/gpio.h>

This would help a lot to identify Zephyr headers, to create a single clang-format sorting rule, etc.

Describe alternatives you've considered

Let the problem grow and open a PR in the future with 2,297 commits in to fix the issues :-)

Additional context

Include graph for kernel.h as generated by Doxygen (can likely be improved):

image

henrikbrixandersen commented 2 years ago

I generally like the ideas put forward in the description. However, in my experience we have always opted to put local includes first in the compiler include search path, but last in the list of #include directives. This allows for the local includes to have conditionals based on the scope they are included in. I would be interested to hear arguments for putting them first?

Apart from that, the proposed order is the same as what I would recommend.

gmarull commented 2 years ago

I generally like the ideas put forward in the description. However, in my experience we have always opted to put local includes first in the compiler include search path, but last in the list of #include directives. This allows for the local includes to have conditionals based on the scope they are included in. I would be interested to hear arguments for putting them first?

Apart from that, the proposed order is the same as what I would recommend.

The reason is explained in the Google C++ styleguide:

With the preferred ordering, if the related header dir2/foo2.h omits any necessary includes, the build of dir/foo.cc or dir/foo_test.cc will break. Thus, this rule ensures that build breaks show up first for the people working on these files, not for innocent people in other packages.

dir/foo.cc and dir2/foo2.h are usually in the same directory (e.g., base/basictypes_test.cc and base/basictypes.h), but may sometimes be in different directories too.

Some C projects, e.g. https://github.com/libgit2/libgit2 seem to follow this convention too. Could you provide an example of your usecase?

marc-hb commented 2 years ago

I understand backward compatibility has always been king in C; explains why we're still stuck with the 50-years old #include. However and as highlighted by this issue, #include leaves a large spectrum of possibilities: from disastrous to somewhat maintainable.

From time to time I've (ab)used gcc -E -dI -dM -dZ ... but surely proper tools / compiler plugins exist to help manage this modularity mess? This problem is universal, affects every single C project above a relatively small size. Policies are great but without the automation to implement and maintain them they never go far.

EDIT: I mean tools to not just order includes but to actually analyse dependencies, think jdeps , https://www.jetbrains.com/help/idea/dependencies-analysis.html, https://www.jarchitect.com/Doc_VS_Arch etc. Doxygen can produce a .dot graph of includes, maybe that's a good start?

alwa-nordic commented 2 years ago

There exists a tool for detecting indirect dependencies we could consider using: https://include-what-you-use.org/

Indirect dependencies / transitive includes are a reliance on a implementation detail and can break in the future. E.g:

//// feature_foo.h:
#include <stdbool.h>
// Declare Foo

//// user_of_feature_foo.c:
#include <feature_foo.h>
bool baz; // Accidentally use `bool` from <feature_foo.h>
// Use Foo

If feature_foo.h drops its dependency on stdbool, user_of_feature_foo.c will unexpectedly break.

gmarull commented 2 years ago

There exists a tool for detecting indirect dependencies we could consider using: include-what-you-use.org

That's an interesting option to look at!

yperess commented 2 years ago

Personal preferences:

  1. Stick to an existing style guide, it's much easier. Google style guide works, I wouldn't change it.
  2. Adding a zephyr/ prefix to the include path would be fantastic but hard to do (I'm sure we'll get a lot of complaints for breaking downstream projects). That said, I think it's one of those better sooner rather than later so I would vote to just rip the bandaid.
gmarull commented 2 years ago

Personal preferences:

  1. Stick to an existing style guide, it's much easier. Google style guide works, I wouldn't change it.

Agreed, no reason to reinvent yet another style guide.

  1. Adding a zephyr/ prefix to the include path would be fantastic but hard to do (I'm sure we'll get a lot of complaints for breaking downstream projects). That said, I think it's one of those better sooner rather than later so I would vote to just rip the bandaid.

It should be easy to keep a compatibility option, i.e., include both include/ and include/zephyr.

gregshue commented 2 years ago

Hmm ... watch what happens when we look at the bigger picture.

First establish certain rules.

I completely agree - but I think your strategy is misdirected.

The Zephyr build system must be able to compile code from outside the Zephyr project that the user cannot modify. In fact, when a platinum user needs to reuse the audit artifacts for a safety or security submission, they must use a specific SHA of zephyr itself. So to fulfill the Zephyr Project Mission Statement, the zephyr ecosystem must explicitly carve out subsets of identifiers in each global namespace.

The header file pathnames are only one example of global namespaces that must be managed. Others include: ⦁ linker symbols, ⦁ module identifiers, ⦁ logging identifiers ⦁ shell commands ⦁ preprocessor symbols, ⦁ Kconfig symbols, ⦁ CMake variables, ⦁ doxygen custom tags (requirement IDs), ⦁ RST symbols, ⦁ testcase.yaml/sample.yaml test identifiers ⦁ etc.

Zephyr already has an established pattern for subsystem/driver/kernel header file pathnames that is dependent on subsystem/driver names and linker symbols. When we solve the linker symbols then we have a pattern that will (mostly) resolve the issues.

What's a suitable pattern? Prefix linker symbols and subsystem/driver names with a multi-character abbreviation of the module name.

I think this will address most of the problems across most (all?) of the global namespaces.

Thoughts?

mbolivar-nordic commented 2 years ago

Zephyr Project Mission Statement

Source?

mbolivar-nordic commented 2 years ago

2. Adding a zephyr/ prefix to the include path would be fantastic but hard to do (I'm sure we'll get a lot of complaints for breaking downstream projects). That said, I think it's one of those better sooner rather than later so I would vote to just rip the bandaid.

This has been debated and rejected by the TSC in the past, just FYI. I don't remember the reason why, but you can start digging here:

Edit: 22 May 2019, found it:

[Anas] Does vote need to be a package or dealt with separately?
[Carles + Alberto] better vote 1st part
[Anas] Wants this to be voted, but would prefer to not raise motion as he wants to vote against.
[Alberto] Raises motion to vote on Stage 1 of the Proposal.
[Carles] Seconds motion to vote
From Alberto Escolar (Demant) to Everyone: (08:41 AM)

+1

From Carles to Everyone: (08:42 AM)

+1

From Michael Scott  to Everyone: (08:42 AM)

-1

From Johan Hedberg to Everyone: (08:42 AM)

-1

From Anas Nashif to Everyone: (08:42 AM)

-1

From Ruud to Everyone: (08:42 AM)

-1

From Asger Munk Nielsen to Everyone: (08:42 AM)

+1

From Nathaniel Graff to Everyone: (08:42 AM)

abstain

From Davide ciminaghi to Everyone: (08:42 AM)

-1

From Johann Fischer to Everyone: (08:42 AM)

-1

From David Leach to Everyone: (08:42 AM)

-1

From Ioannis Glaropoulos to Everyone: (08:42 AM)

abstain

From Maureen Helm to Everyone: (08:42 AM)

abstain

Result: Vote does not pass

https://docs.google.com/document/d/12p8Q4USdExsP3a8lIdhyqG-4vxb44kisccZ9TvlxWBA/edit#

This was a long time ago and the project has grown a lot since then, so it may be worth revisiting this, but it would have to go through the TSC

gregshue commented 2 years ago

Zephyr Project Mission Statement

Source?

https://www.zephyrproject.org/wp-content/uploads/sites/38/2020/09/CLEAN-LF-Zephyr-Charter-20200624-effective-20200901.pdf

mbolivar-nordic commented 2 years ago

Zephyr Project Mission Statement

Source?

https://www.zephyrproject.org/wp-content/uploads/sites/38/2020/09/CLEAN-LF-Zephyr-Charter-20200624-effective-20200901.pdf

Thanks! This document says:

  1. Mission of the Zephyr Project (“Zephyr,” or, alternatively, the “Project”). The mission of the Project is to: a. deliver the best-in-class RTOS for connected resource-constrained devices, built to be secure and safe. b. maintain an auditable code base, while taking advantage of community participation; this auditable code base is open source; c. include participation of leading members of this ecosystem, including micro-controller manufacturers, hardware developers, software developers and other members of the ecosystem; and d. host the infrastructure for the open source Project and sub-projects, establishing a neutral home for community meetings, events and collaborative discussions and providing structure around the business and technical governance of the Project.

On the other hand, you wrote:

In fact, when a platinum user needs to reuse the audit artifacts for a safety or security submission, they must use a specific SHA of zephyr itself. So to fulfill the Zephyr Project Mission Statement, the zephyr ecosystem must explicitly carve out subsets of identifiers in each global namespace.

I think there are quite a few unconnected dots between the two.

nashif commented 2 years ago

This was a long time ago and the project has grown a lot since then, so it may be worth revisiting this, but it would have to go through the TSC

agree. It has been 3 years and we should be able to bring this topic back and revisit.

gregshue commented 2 years ago

I think there are quite a few unconnected dots between the two.

Yes. Here is my thinking (and I am not an expert in this area). AFAICT:

Could Zephyr achieve "best in class" without fulfilling all of this? Temporarily, but the Zephyr Project should concede that competition is likely. The plan for sustaining "best in class" must plan for perpetually addressing shortcomings. Whatever is a recognized shortcoming will eventually be addressed by the Zephyr Project, either because competition already addressed it or they want to avoid it being a competition point.

What am I missing?

mbolivar-nordic commented 2 years ago

What am I missing?

IMO you are missing the other advantages to platinum membership documented here: https://www.zephyrproject.org/become-a-member/

Focusing on the audit artifacts as the only carrot for becoming a platinum member seems a bit odd. But that's just me.

gregshue commented 2 years ago

Focusing on the audit artifacts as the only carrot for becoming a platinum member seems a bit odd. But that's just me.

I never claimed that the artifacts were the only carrot for becoming a platinum member, or even the largest. I was just following that it is valuable to become one. This is carrot worth noting because it validates the total-reuse Use Case, which will drive the importance and direction of this RFC and related issues.

mbolivar-nordic commented 1 year ago

Arch WG: postponed due to @gmarull absence

carlescufi commented 1 year ago

Arch WG:

marc-hb commented 1 year ago

It needs to "run on diffs", checking only the files that have been modified.

A while ago I tried to find a simple Github answer to that question. This feels like a very basic and common need yet it was not built-in and all the actions I found sucked (including the one used by Zephyr at a time). This was many months ago.

I wrote a basic shell script which has been working great but it's project specific and not re-usable https://github.com/thesofproject/sof-test/blob/main/tools/CI/check-gitrange.bash

If you know anything please share.

EDIT: https://github.com/marketplace?type=actions&query=changed+files+

EDIT: finding list of changed files and PR commits discussed again in #64517

Hopefully more on topic:

@gmarull mentions forward declaration as one of the issues that we will need to make a choice on

This is one of the things the sparse static analyzer has been "spamming" us with a lot:

/zep_workspace/zephyr/subsys/tracing/tracing_none.c:10:6: warning: symbol 'sys_trace_isr_exit' was not declared. Should it be static?
/zep_workspace/zephyr/subsys/tracing/tracing_none.c:12:6: warning: symbol 'sys_trace_isr_exit_to_scheduler' was not declared. Should it be static?

/zep_workspace/zephyr/include/zephyr/debug/coredump.h:219:6: warning: symbol 'coredump_buffer_output' was not declared. Should it be static?
/zep_workspace/zephyr/include/zephyr/debug/coredump.h:223:5: warning: symbol 'coredump_query' was not declared. Should it be static?
/zep_workspace/zephyr/include/zephyr/debug/coredump.h:228:5: warning: symbol 'coredump_cmd' was not declared. Should it be static?

/zep_workspace/zephyr/arch/xtensa/core/xtensa-asm2.c:216:1: warning: symbol 'xtensa_int3_c' was not declared. Should it be static?
/zep_workspace/zephyr/arch/xtensa/core/xtensa-asm2.c:220:1: warning: symbol 'xtensa_int4_c' was not declared. Should it be static?

etc.

marc-hb commented 1 year ago

As of today one can observe a few issues with respect to C headers in Zephyr:

  • ...
  • ...

New item: two .h files directly including each other:

cfriedt commented 10 months ago

Is anyone not on-board with this? Can we incorporate this into coding guidelines prior to LTSv3? I guess there were some treewide changes - what is the progress? Is there anything left to do?

@gmarull - if there are problematic areas, feel free to delegate tasks for others to clean up headers.

gmarull commented 10 months ago

Is anyone not on-board with this? Can we incorporate this into coding guidelines prior to LTSv3? I guess there were some treewide changes - what is the progress? Is there anything left to do?

@gmarull - if there are problematic areas, feel free to delegate tasks for others to clean up headers.

I think it's a generalized problem, I just try to improve the state of headers when I refactor or add new code. Kernel is specially messy, or the infamous soc.h. I do not have much energy to spend time on this right now.

dkalowsk commented 10 months ago

IWYU is a great tool for automating this kind of work. It does take some leverage to get setup. Having used it a little now, I'd also recommend starting a bit "lighter" with something like https://clang.llvm.org/extra/clang-tidy/checks/misc/include-cleaner.html as I believe there is already some clang-tidy work going on.

marc-hb commented 2 weeks ago

Here's a good horror story if you need one: