zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

RFC: Mark all safety relevant files with a SPDX tag #49829

Closed romkell closed 1 year ago

romkell commented 2 years ago

Introduction

IEC 61508 safety certification is planned for the Zephyr OS core. Zephyr OS as an ecosystem is too big to be certified as a whole; the effort would be too high to finish in reasonable time. Therefore a safety scope was defined in the past (Zephyr-Overview-2022Q1-Public.pdf) and will be re-evaluated in the currently running safety certification process in the Safety Committee.

In order to clearly distinguish between the non-safe code (by far the bigger part) and the safe code, we need a mechanism to separate the two, or at least to mark the safe code parts.

The mechanism may, or should, also serve other such purposes (e.g. security, Bluetooth, Profinet etc.).

Please find the closed PR with the initial discussion here: #47029.

Problem description

Certified code (also in tools) needs special attention concerning modifications. Typically the certification is lost and re-certification is required, which normally implies:

  1. Updating specifications: requirements specifications and test specifications
  2. Re-testing: in the simplest case just running an existing test suite again; for safety, updating and running self-written test cases
  3. Updating documentation: manuals
  4. Updating certification inputs: MISRA reports, CVE reports, coverage reports, test reports
  5. Re-qualification of tools: a bit of all the points above, but to a smaller extent

Also, how modifications are done may be critical, at least concerning safety (e.g. dynamic memory allocation is typically avoided in safe code), and can render a module nearly impossible to re-certify.

Proposed change

All files in a dedicated scope (e.g. the safety scope) which will undergo any kind of certification shall be marked with an SPDX file comment tag.

`SPDX-FileComment: <cert-scope1>[, <cert-scope2>]`

For the safety scope it shall be:

  * `SPDX-FileComment: IEC-61508-SIL3` for Zephyr OS source code
  * `SPDX-FileComment: IEC-61508-T2` for T2 class tools
  * `SPDX-FileComment: IEC-61508-T3` for T3 class tools
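To make the proposal concrete, here is a minimal sketch (purely illustrative; the header contents and the helper name `cert_scopes` are my own, not part of the RFC) of what a tagged file header would contain and how a script might extract the scope labels:

```python
import re

# Hypothetical header of a tagged Zephyr C source file.
# The copyright line and tag values are illustrative only.
HEADER = """\
/*
 * Copyright (c) 2022 Example Contributor
 *
 * SPDX-License-Identifier: Apache-2.0
 * SPDX-FileComment: IEC-61508-SIL3
 */
"""

# Match an SPDX-FileComment line and capture everything after the colon.
TAG_RE = re.compile(r"SPDX-FileComment:\s*(.+)")

def cert_scopes(text):
    """Return the list of certification scope labels found in a file header."""
    m = TAG_RE.search(text)
    if not m:
        return []
    return [s.strip() for s in m.group(1).split(",")]

print(cert_scopes(HEADER))  # prints ['IEC-61508-SIL3']
```

A scanner along these lines could also accept the comma-separated multi-scope form shown above, though as discussed below a separate comment line per scope may be more robust.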

That change will be done on 2.7-auditable-branch first, and be merged to main before LTS 3.

Detailed RFC

Motion 1: We use the SPDX-FileComment tag together with a to-be-defined certification scope label to mark source code files of any programming language (whether belonging to the Zephyr OS source tree, to a tool, or to the firmware built) when that source code belongs to that certification scope.

Motion 2: For the to-be-defined safety scope the certification scope label shall be:

Motion 3: The SPDX tagging mechanism as described above shall be implemented and tested on 2.7-auditable first before finally being merged to main. The resulting PRs will again be subject to approvals.

Proposed change (Detailed)

See Proposed change plus:

Third-party code such as HALs will currently not be included in the safety scope, primarily due to its extent. The HAL providers would have to also provide the certification for their HALs, or the Zephyr OS integrator would have to achieve the certification for a particular HAL.

Depending on the tag a file has, the respective lead would have to be involved in planned changes (added as reviewer on a PR or similar). What such a process could look like is to be defined.

Dependencies

Dependency on people's roles: from @gregshue out of #47029:

Concerns and Unresolved Questions

Alternatives

Thoughts

Annex

Safety application development implications in short (and simplified)

This section provides a very brief overview of the implications of developing a (SIL3) safety application according to IEC 61508.

"Why tagging files?" or "The Use Cases"

Tag handling: "Auditable branches" vs "main"

I see two or three options for how to work:

  1. Do it on both immediately
    1. Advantages
      1. Have the opportunity to handle most use cases as above immediately
      2. Avoid safety contradicting developments in "main"
      3. Have "main" closer to "safety ready" on tagging next LTS and auditable
    2. Disadvantages
      1. Safety Committee will have to take care of both
      2. "main" developers will be confronted with safety topics
  2. Only do safety on "Auditable Branches" and never on "main"
    1. Advantages
      1. Safety Committee can exclusively focus on auditables
      2. "main" developers do not need to care about safety topics
    2. Disadvantages
      1. Over the "main" releases there is a growing gap between the last and the next "auditable" branch; this gap will have to be closed (cleaned up) every time
      2. There is hardly any control over "safety un-friendly" implementation in the safety scope
  3. Do it on the "Auditable Branch" first; later, when finished, forward-merge to "main"
    1. Advantages and Disadvantages are at first those of 2. and later of 1.
gregshue commented 2 years ago

`SPDX-FileComment: <cert-scope1>[, <cert-scope2>]`

Putting multiple identifiers on the same file line has already been shown to be problematic for enumerations. We should use a separate comment line per cert-scope.

Safety and innovation are somewhat contradicting

I think you mean "orthogonal" rather than "contradicting". Innovation is largely about concept. Safety (and security) are largely about process.

Safety code should be mature and not be fundamentally changed anymore.

Verified that it fully meets the requirements and does nothing else - absolutely. Not be recently fundamentally changed - impractical as change will happen.

How to handle third-party code if a certification scope expands to also include third-party (HALs)?

Not "if" but "when". We really need a common strategy and mechanism that can also identify files needing to meet security cert compliance (e.g., IEC-62443-SL2).

mbolivar-nordic commented 2 years ago

A couple of top-level concerns/questions.

  1. Who will be responsible for determining the "to-be-defined safety scope" in motion 2?
  2. What constitutes the T2 and T3 class tools in this scope? (I see this partially addressed in the now-closed PR, would be good to see this copy/pasted into the issue for the record, since we're voting on the issue in the TSC, not the PR)

Finally:

I know you're asking us not to consider the particular files, but you also seem convinced that the devicetree scripts should be in the safety scope.

I want to ask whether you consider that a library like pyelftools, which we rely on in a variety of scripts that are just as critical as the devicetree package (gen_kobject_list.py, gen_kobject_placeholders.py, gen_handles.py, ..., and the list goes on), should be part of the safety scope.

If yes, how do you plan on certifying it?

If no, why should the devicetree scripts get treated any differently?

romkell commented 2 years ago

`SPDX-FileComment: <cert-scope1>[, <cert-scope2>]`

Putting multiple identifiers on the same file line has already shown to be problematic for enumerations. We should use a separate comment line per cert-scope.

I am fine with any format, as long as it is suitable and conformant to SPDX and west spdx (which I would assume it is, since licenses are also listed on separate lines).

Safety and innovation are somewhat contradicting

I think you mean "orthogonal" rather than "contradicting". Innovation is largely about concept. Safety (and security) are largely about process.

It is about processes that demand outputs which typically take a lot more effort to create than a lean change would. In my experience, companies tend to re-think fundamentally changing concepts twice when the price tag doubles or more.

Safety code should be mature and not be fundamentally changed anymore.

Verified that it fully meets the requirements and does nothing else - absolutely. Not be recently fundamentally changed - impractical as change will happen.

The gist here was that a planned change should be a conscious step, taking into account the consequences for any kind of certification effort. It was not meant as absolute, otherwise we could stop working on any certified code.

How to handle third-party code if a certification scope expands to also include third-party (HALs)?

Not "if" but "when". We really need a common strategy and mechanism that can also identify files needing to meet security cert compliance (e.g., IEC-62443-SL2).

"When" is fine. Nevertheless it leads back to the discussion in the PR about whether the mechanism is also suitable to cover non-Zephyr code.

romkell commented 2 years ago

A couple of top-level concerns/questions.

1. Who will be responsible for determining the "to-be-defined safety scope" in motion 2?

I guess it would be the respective committee, for safety the Safety Committee. The maintainers' participation is highly appreciated, I guess even needed, since (almost) no one from the Safety Committee knows the implementation details of all the code / tool code and would be able to identify the relevant code parts / files without spending weeks.

But in the end there is not a lot of room for bargaining: if a tool generates code, modifies binaries etc. of a safe application, it is 61508-T3. If it is used for testing in any way (static code analysis, unit test frameworks etc.), it is 61508-T2.

2. What constitutes the T2 and T3 class tools in this scope? (I see this partially addressed in the now-closed PR, would be good to see this copy/pasted into the issue for the record, since we're voting on the issue in the TSC, not the PR)

I will copy it from the PR.

Finally:

I know you're asking us not to consider the particular files, but you also seem convinced that the devicetree scripts should be in the safety scope.

Since they create C structures present in the binary, yes, I do not see a way around it.

I want to ask whether you consider that a library like pyelftools, which we rely on in a variety of scripts that are just as critical as the devicetree package (gen_kobject_list.py, gen_kobject_placeholders.py, gen_handles.py, ..., and the list goes on), should be part of the safety scope.

If "pyelftools" participates in "code generation" in any way, it is in.

If yes, how do you plan on certifying it?

I, respectively we (the Safety Committee), do not yet have a plan for all the tools used. The advantage I see in an external tool is that we use a fixed version (hopefully not changing all the time, since it is a mature tool; otherwise we have to re-qualify each new version). Typically you prepare some tests around your use cases which prove that the tool is fit for the purpose, and then do the paperwork around those tests (documentation). Nicole Pappler (our safety manager) or @simhein would have to give more details. @evgeniy-paltsev was preparing the tools list, where we have to indicate each tool's purpose and criticality; this is work in progress.

If no, why should the devicetree scripts get treated any differently?

It will not be treated differently.

Btw., Zephyr OS is rather special in the way it generates code and post-processes the binary. Not something that makes it easier for safety. For a safety application I personally would prefer a simple constellation: tested and reviewed static source code -> the certified compiler -> the certified linker -> the safe binary. A fixed set of tools, at least for one release, ideally for several releases; a new tool version means re-qualification of the tool (unless bought with certification).

romkell commented 2 years ago

Copied last comment from #47029 to here to continue:

We need to

@gregshue I believe I have seen you begin several sentences this way.

I think it is likely to cause confusion for people when you speak this way. It may falsely give people the impression that you are a maintainer or collaborator in the zephyr project, and that you are speaking in that capacity.

I understood that as "we" as in "the community", in whatever role.

Concerning out-of-tree code:

@gregshue takes a Zephyr OS ecosystem end user position, the same position Baumer is in. When building a safe application, as an end user I would want the west SPDX output to tell me that I have used only certified code. Zephyr OS applications without peripheral drivers and a HAL are limited to host emulation or QEMU (did I forget something?). Hence, as an end user, I need to be able to create safe applications running on a board including a HAL. From that perspective, and since I would like to get a report out of SPDX showing that I only used certified artifacts, out-of-tree sources would have to be included and carry that tag too.

But it is not in the control of the project but of the HAL vendor.

As long as this is a Zephyr OS proprietary solution, it will be difficult for it to spread elsewhere. I understood from @kestewart that she will take the approach to the SPDX standardization board (whatever the organization's name is) and try to create an official solution (e.g. SPDX-Certification) instead of (mis-)using the SPDX-FileComment tag for this. That might take a while to become officially released, but it is a perspective. Maybe @kestewart can comment on that.

Finally, it is the out-of-tree source code providers' and maintainers' decision whether or not they use that mechanism. There is no way to force them. The more official the solution, the higher the likelihood that it will be adopted.

Using it for the in-tree code is a start. Getting an SPDX report on the in-tree code is a start and already helpful to the end user. Getting a full coverage report for all files compiled in is deluxe, but might become true in the future.

Concerning commitment:

I feel that there should be some kind of commitment connected to this. Otherwise these are just useless tags.

gregshue commented 2 years ago

But it is not in the control of the project but of the HAL vendor.

This actually depends on where the HAL lives. If the HAL is provided through the zephyrproject-rtos then it has an identified module owner who is:

I agree that there is no way to force the usage in modules provided outside of zephyrproject-rtos.

nashif commented 2 years ago

A couple of top-level concerns/questions.

  1. Who will be responsible for determining the "to-be-defined safety scope" in motion 2?
  2. What constitutes the T2 and T3 class tools in this scope? (I see this partially addressed in the now-closed PR, would be good to see this copy/pasted into the issue for the record, since we're voting on the issue in the TSC, not the PR)

It is worth mentioning that whether this method is picked (SPDX) or some other method (A spreadsheet for example) the scope will still be defined and tools will be part of the certification/qualification process. So the method picked does not have any impact on the scope itself.

nashif commented 2 years ago

I agree that there is no way to force the usage in modules provided outside of zephyrproject-rtos.

There is also no way to force modules within the zephyr project to follow this. The key point is that we will not engage in certifying any third-party code or code we do not own or control, and I do not understand why we would do this. Actually, if someone decided to do so on their own and introduced this type of tagging into their HAL or module, it would just be confusing, and you might end up getting code in a binary that is marked for safety but is actually not covered by what the project is trying to do.

That's where something like `SPDX-FileComment: <cert-scope1>[, <cert-scope2>]` might not be enough; it will need an identifier of the scope, i.e. `SPDX-FileComment: zephyr:<cert-scope1>[, <cert-scope2>]`, to make sure things do not get mixed up.
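A rough sketch of how such a namespaced tag could be parsed, assuming the `zephyr:` prefix suggested above (the function and regex are hypothetical, not an existing Zephyr or west spdx API):

```python
import re

# Capture "<project>:<scope1>[, <scope2>...]" after the SPDX-FileComment tag.
TAG_RE = re.compile(r"SPDX-FileComment:\s*(?P<project>[\w-]+):(?P<scopes>.+)")

def parse_scoped_tag(line):
    """Split a namespaced SPDX-FileComment into (project, [scopes]).

    Returns None when the line carries no project namespace, so that
    un-prefixed tags from other projects are not mistaken for Zephyr
    certification scopes.
    """
    m = TAG_RE.search(line)
    if not m:
        return None
    scopes = [s.strip() for s in m.group("scopes").split(",")]
    return m.group("project"), scopes

print(parse_scoped_tag("SPDX-FileComment: zephyr:IEC-61508-SIL3"))
# prints ('zephyr', ['IEC-61508-SIL3'])
print(parse_scoped_tag("SPDX-FileComment: IEC-61508-SIL3"))
# prints None
```

Rejecting the un-prefixed form outright is one possible design choice; a tool could instead treat it as belonging to an "unknown" project and report it for review.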

nashif commented 2 years ago

Not "if" but "when". We really need a common strategy and mechanism that can also identify files needing to meet security cert compliance (e.g., IEC-62443-SL2).

You keep bringing this up, but I have not heard anything, neither from the security WG nor from the Security Architect, that this is planned, being discussed, or needed. Last time I checked, @ceolin is the authority from the security WG/Committee, and he should be the one putting this on the table and setting the requirements on strategies and mechanisms for any security certification, which, again, I am not aware of.

gregshue commented 2 years ago

The Security WG (which I am participating in) is not ready to discuss security standards yet. The agenda for the next meeting is to review a Requirements Syntax training deck. I expect we will begin to discuss OT/IoT security standards and certifications in the next few meetings.

Regarding need: files to be kept compliant with any standard(s) need to be marked with a tag for each standard. For standards like IEC-62443 the tag also needs to indicate the security level.

More generally, an end user may reasonably need to identify files in their own workspaces with their own tags for similar purposes. Limiting the tag detection tool and build hooks to only look for Zephyr-defined values would be inconsistent with the user need for extensibility. Once the system can support a user-defined tag value it is trivial to add another Zephyr-defined value whenever a new tag is identified.

ceolin commented 2 years ago

The Security WG (which I am participating in) is not ready to discuss security standards yet. The agenda for the next meeting is to review a Requirements Syntax training deck. I expect we will begin to discuss OT/IoT security standards and certifications in the next few meetings.

Regarding need: files to be kept compliant with any standard(s) need to be marked with each a tag for each standard. For standards like IEC-62443 the tag needs to indicate the security level also.

More generally, an end user may reasonably need to identify files in their own workspaces with their own tags for similar purposes. Limiting the tag detection tool and build hooks to only look for Zephyr-defined values would be inconsistent with the user need for extensibility. Once the system can support a user-defined tag value it is trivial to add another Zephyr-defined value whenever a new tag is identified.

A couple of top-level concerns/questions.

1. Who will be responsible for determining the "to-be-defined safety scope" in motion 2?

I guess it would be the respective Committee, for safety the Safety Committee. The maintainers participation is highly appreciated, I guess even needed, since (almost) no one from the Safety Committee knows the implementation details in all the code / tool code and would be able identify the relevant code part / files without spending weeks.

But in the end there is not a lot room for bargaining - if a tools generates code, modifies binaries etc. of a safe application, it is 61508-T3. If it is used for testing in any way (static code analysis, unit test frameworks etc.) it is 61508-T2

2. What constitutes the T2 and T3 class tools in this scope? (I see this partially addressed in the now-closed PR, would be good to see this copy/pasted into the issue for the record, since we're voting on the issue in the TSC, not the PR)

I will copy it from the PR.

Finally: I know you're asking us not to consider the particular files, but you also seem convinced that the devicetree scripts should be in the safety scope.

Since it creates C structures present in the binary, yes I do not see a way around.

I want to ask whether you consider that a library like pyelftools, which we rely on in a variety of scripts that are just as critical as the devicetree package (gen_kobject_list.py, gen_kobject_placeholders.py, gen_handles.py, ..., and the list goes on), should be part of the safety scope.

If "pyelftools" participates in "code generation" in any way, it is in.

If yes, how do you plan on certifying it?

Ok, my two cents. Even if it is generating code, it is still part of a tool. We should consider it a tool; otherwise one could use the same argument for the Python interpreter or other libraries. Of course, the generated code should be tagged and go through the whole certification process, but all these scripts should be considered tools (classified and possibly qualified).

I respectively we (the Safety Committee) does not yet have a plan yet for all the tools used. The advantage of an external tools I see is, that we use a fixed version (hopefully not changing all the time, since it is a mature tool - otherwise we have to re-qualify each new version). Typically you prepare some tests around your use cases, which prove that the tools is fit for the purpose. And then do the paper work around those tests (docu). Nicole Pappler ( our safety manager) or @simhein would have to give more details. @evgeniy-paltsev was preparing the tools list, where we have to indicate the tools purpose and criticality - this is work in progress.

If no, why should the devicetree scripts get treated any differently?

It will not be treated differently.

Btw. Zephyr OS is rather special in the way of generating code and post processing the binary. Not something that makes it easier for safety. For safety application I personally would prefer a simple constellation: Tested and reviewed static source code -> the certified compiler -> the certified linker -> the safe binary A fixed set of tools, at least for one release, ideally for several releases - new tool version, hence re-qualification of the tool (unless bought with certification).

ceolin commented 2 years ago

The Security WG (which I am participating in) is not ready to discuss security standards yet. The agenda for the next meeting is to review a Requirements Syntax training deck. I expect we will begin to discuss OT/IoT security standards and certifications in the next few meetings.

It is not about the WG being ready to discuss security standards; there is simply no demand from the project to pursue any certification, so it is definitely not a goal of the group at the moment. Regarding the next meeting, it is about how to capture requirements. If you remember, I asked to keep it as simple as possible. I really want the working group to focus on security enhancements and innovations. There are so many things that we know would be good to have, like secure storage, protocol fuzzing, syscall fuzzing, better samples integrating all these bits in the right way, threat models, crypto hw acceleration, ... but we lack the human resources to do all these activities, so we are talking about how to document them (as requirements) for when someone can work on these items.

Now, if we start to put up so many blockers, no one will be interested in working on them unless they are paid for it ...

Regarding need: files to be kept compliant with any standard(s) need to be marked with a tag for each standard. For standards like IEC-62443 the tag needs to indicate the security level also.

So, basically we have to mark all files in a build (consequently, in the project)? And attribute a security level to them?

How is this manageable in an open source project? Basically, anytime someone changes something we should assess and evaluate the risks again? Even if it is just a high-level assessment based on functionality and exposure, it is still almost unbearable.

More generally, an end user may reasonably need to identify files in their own workspaces with their own tags for similar purposes. Limiting the tag detection tool and build hooks to only look for Zephyr-defined values would be inconsistent with the user need for extensibility. Once the system can support a user-defined tag value it is trivial to add another Zephyr-defined value whenever a new tag is identified.

If it is about the tool, the tool should be able to pick up whatever tag you put there and spit out a tag:files dictionary. But the problem for the project is how to identify them ...
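As an illustration of that idea, a minimal tree walk that buckets files by whatever tag value they carry might look like this (a hypothetical sketch; a real implementation would presumably hook into west spdx, and this one only picks up single-token tag values):

```python
import os
import re

# Match a single-token SPDX-FileComment value, e.g. "IEC-61508-SIL3".
TAG_RE = re.compile(r"SPDX-FileComment:\s*(\S+)")

def tags_to_files(root):
    """Walk a source tree and build a {tag: [files]} dictionary from
    whatever SPDX-FileComment values are present, without hard-coding
    any Zephyr-defined label list (so user-defined tags work too)."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    head = f.read(2048)  # tags are expected near the top
            except OSError:
                continue
            for tag in TAG_RE.findall(head):
                result.setdefault(tag, []).append(path)
    return result
```

Because the dictionary is keyed by whatever value appears in the file, user-defined tags fall out for free, which matches the extensibility point raised above.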

gregshue commented 2 years ago

simply there is no demand from the project to pursue any certification,

Of course not. The Zephyr Project repositories cannot get certified. That must be pursued by the end users. Potential end users will evaluate Zephyr Project processes, tools, and code for suitability for their secure products.

I really want the working group focusing in security enhancements and innovations,

I really want the working group to answer what processes and artifacts I can regenerate to support a claim of "secure" code, and to which files and functions that applies.

If you remember, I asked to keep it as simple as possible.

I did, and even posted the training deck early so people could go through it off line.

so we are talking about how to document (requirements) them for when someone can work in these items

Good.

if we start to put so many blockers no one will be interesting to work on them unless they are paid for ...

The vision and mission of the project is firmware for safe and secure resource-constrained devices. If the processes needed for that are not interesting to them, then they probably won't be satisfied working in these domains. These processes only need to be applied to the functionality claimed to be secure. At this point I don't know how much of the code base is targeted for that label.

How this is manageable in an open source project ? Basically anytime someone change something we should assess and evaluate the risks again ? Even if it is just a high level assessment based on functionality and exposure, it is still almost unbearable.

Great questions! I see these as the innovative area that the Zephyr Project is chartered to solve. Though it may be burdensome for developers it gives the Zephyr Project great value to end users.

If it is about tool, the tool should be able to get whatever tag you put there and spill tag:files dictionary. But the problem for the project is how to identify them ...

The charter refers to functionality being brought into the auditable branch, so it starts with identifying functionality and then identifying all source associated with each distinct piece identified.

gregshue commented 2 years ago

if we start to put so many blockers no one will be interesting to work on them unless they are paid for ...

Several events have happened in the world over the last 3-4 years that are fundamentally changing how general software development needs to be done. Security events and new regulations are driving many domains toward secure software. Regulations, customer demand, and competition are driving software developers toward standards that require no unused/dead code, defense in depth, and all external input validated. Proving something doesn't exist either requires proving that it can't exist, or knowing everything about the executable. For a non-trivial executable that will demand a thorough process. Software developers in most non-regulated domains will probably need to adjust anyway.

henrikbrixandersen commented 2 years ago

Not sure why this was moved to a new issue but my https://github.com/zephyrproject-rtos/zephyr/pull/47029#issuecomment-1175973912 still stands.

romkell commented 2 years ago

But it is not in the control of the project but of the HAL vendor.

This actually depends on where the HAL lives. If the HAL is provided through the zephyrproject-rtos then it has an identified module owner who is:

* "required to submit the corresponding changes that are required in module repositories, to ensure that the Zephyr CI on the pull request with the original changes, as well as the module integration testing, are successful"

* "required to fix issues in the module's codebase that have not been caught by Zephyr pull request CI runs"

So, the Zephyr Project does have some control over each repository provided through zephyrproject-rtos, and the identified module owner has the freedom (and responsibility) to update relevant files with the appropriate safety markings.

I agree that there is no way to force the usage in modules provided outside of zephyrproject-rtos.

I quickly looked at the stm32 HAL. stm32cube alone is some 3000+ files. Assuming Zephyr OS uses 1/3, hence about 1k, mainly the low level drivers as far as I know. The HALs may be no real downstream repos from a main where you have a fixed linkage and simply update to a tag or whatever. As for stm32cube, there is a maintainer who, when updating to a new version, rearranges the folder structure a bit from the original stm32cube and then updates to the new files. In general I would avoid modifying third-party code as much as possible (also regarding those SPDX tags), since when you update, you will have to redo your modifications over and over again.

And if such SPDX tags should stand for "this is IEC61508-SIL3 ready code" on a tagged x.y-auditable-branch, the Zephyr OS project would have to run the certification of all those HALs and boards. That is, effort-wise, impossible. Staying with the stm32 example, I rather see it the way that STM will have to provide certification proof for a particular version of their HAL to the end user (or adopt a mechanism such as SPDX tagging for their HAL, and for that it needs to be an official open standard at least), or the end user has to make that effort for the particular hardware they use.

romkell commented 2 years ago

if we start to put so many blockers no one will be interesting to work on them unless they are paid for ...

Several events have happened in the world over the last 3-4 years that are fundamentally changing how general software development needs to be done. Security events and new regulations are driving many domains toward secure software. Regulations, customer demand, and competition are driving software developers toward standards that require no unused/dead code, defense in depth, and all external input validated. Proving something doesn't exist either requires proving that it can't exist, or knowing everything about the executable. For a non-trivial executable that will demand a thorough process. Software developers in most non-regulated domains will probably need to adjust anyway.

I can understand that a regulated environment (such as safety and, I guess, security, which is less my experience) might put off developers. In my experience there is a difference between regulated and non-regulated. All the more important, then, that the safety scope is clearly defined and marked.

gregshue commented 2 years ago

Cryptosecurity of IoT devices has been regulated in California since 1/1/2020. Cryptosecurity of consumer wireless devices with an address (e.g., BT) has been regulated, whether or not it contains an IP stack, in Oregon since 1/1/2020. More laws are under way (see https://www.nabto.com/us-and-california-iot-security-laws-guide/).

Even without regulations, customers in application domains being targeted by the Zephyr Project have already adopted IEC 62443 Security for Industrial Automation and Control Systems (see https://press.siemens.com/global/en/pressrelease/siemens-process-control-system-first-product-iec-62443-security-certification, dated 2016). Certificate of compliance can be awarded for a specific device. Separately, ETSI EN 303 645 Cybersecurity Standard for Consumer IoT Devices is a globally acceptable standard including 33 mandatory requirements and 35 recommendations.

The more important I see it, that the safety scope is clearly defined and marked.

I agree that the safety scope is clearly defined and marked, and the security scope is not. As I look at application domains, defining a security scope and level that meets a broad set of customer domains is more important to the success of Zephyr than the safety scope. There are FAR more end users needing a secure solution than a safe one.

romkell commented 2 years ago

if we start to put so many blockers no one will be interesting to work on them unless they are paid for ...

Several events have happened in the world over the last 3-4 years that are fundamentally changing how general software development needs to be done. Security events and new regulations are driving many domains toward secure software. Regulations, customer demand, and competition are driving software developers toward standards that require no unused/dead code, defense in depth, and all external input validated. Proving something doesn't exist either requires proving that it can't exist, or knowing everything about the executable. For a non-trivial executable that will demand a thorough process. Software developers in most non-regulated domains will probably need to adjust anyway.

I can understand that a regulated environment (such as safety and, I guess, security, which is less my domain) might put off developers. In my experience there is a difference between regulated and non-regulated: processes in non-regulated environments are typically leaner. For that reason, I see it as even more important that the safety scope is clearly defined and marked.

romkell commented 2 years ago

Ok, my two cents. Even if it is generating code, it is still part of a tool. We should consider it a tool; otherwise one could use the same argument for the Python interpreter or other libraries. Of course, the generated code should be tagged and go through the whole certification process, but all these scripts should be considered tools (classified and possibly qualified).

It is not fully clear to me what the notion here is. I do not think we are required to qualify the Python interpreter as a tool, but we will have to qualify a version of the code-generating script together with a particular version of the Python interpreter, showing that the combination of both does what it is meant to do. Your comment leads me to the thought that, for the source-code-generating tools, the generated code parts could be treated and tested similarly to hand-coded files (unit testing, integration testing, software testing), possibly relieving the tool from qualification. Ideally, the test code would then be written flexibly enough to cope with the different tool outputs.

These are all options, not yet defined and not yet accepted by TÜV/exida.

At this point we are simply looking for a mechanism to mark scopes.
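To make the "mechanism to mark scopes" concrete, here is a minimal sketch of how a file-level tag could be checked mechanically. The tag text `SPDX-FileComment: scope:safety` and the checker itself are purely illustrative assumptions, not a format agreed by the Safety Committee or TÜV/exida:

```python
# Sketch only: scan a tree for a hypothetical file-level safety-scope tag.
# The tag text below is an assumption for illustration; no format has been
# agreed with the Safety Committee or TUV/exida.
from pathlib import Path

SAFETY_TAG = "SPDX-FileComment: scope:safety"  # hypothetical tag

def files_in_safety_scope(root):
    """Return sorted paths of C/header files carrying the tag near the top."""
    hits = []
    for path in Path(root).rglob("*.[ch]"):
        with open(path, encoding="utf-8", errors="ignore") as f:
            head = [f.readline() for _ in range(10)]  # tag expected in header comment
        if any(SAFETY_TAG in line for line in head):
            hits.append(str(path))
    return sorted(hits)
```

A CI job could diff the resulting list against the certified scope and flag pull requests that touch tagged files, which is the kind of day-to-day guardrail discussed below.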

simhein commented 2 years ago

Not sure why this was moved to a new issue but my #47029 (comment) still stands.

This was decided in the last TSC meeting to get the focus on the mechanism instead of the scope with its 800+ files.

I don't like the idea of statically tagging source files with meta-information that typical developers cannot take decisions on.

A short follow-up on your comment @henrikbrixandersen. If we decide on another mechanism to mark the safety scope (e.g. a spreadsheet or a YAML file), isn't this also adding meta-information to a file, just with a detour? The developers cannot take decisions on that either; the only difference, I guess, is that the decision is not present directly in the developer's file.

Cryptosecurity of IoT devices has been regulated in California since 1/1/2020. Cryptosecurity of consumer wireless devices with an address (e.g., BT) has been regulated, whether or not it contains an IP stack, in Oregon since 1/1/2020. More laws are under way (see https://www.nabto.com/us-and-california-iot-security-laws-guide/).

Even without regulations, customers in application domains being targeted by the Zephyr Project have already adopted IEC 62443 Security for Industrial Automation and Control Systems (see https://press.siemens.com/global/en/pressrelease/siemens-process-control-system-first-product-iec-62443-security-certification, dated 2016). A certificate of compliance can be awarded for a specific device. Separately, ETSI EN 303 645 Cybersecurity Standard for Consumer IoT Devices is a globally accepted standard including 33 mandatory requirements and 35 recommendations.

Can we all focus on the topic of this RFC please? Which is not the security standard or the focus of the security WG.

All the more important, then, that the safety scope is clearly defined and marked.

I agree that the safety scope is clearly defined and marked, and the security scope is not. As I look at application domains, defining a security scope and level that meets a broad set of customer domains is more important to the success of Zephyr than the safety scope. There are FAR more end users needing a secure solution than a safe one.

I think we all agree on the point that the safety scope needs to be clearly defined, but I would appreciate it if we could avoid the debate on project principles, i.e. whether security or safety is more important. That needs to be discussed in another forum.

Ok, my two cents. Even if it is generating code, it is still part of a tool. We should consider it a tool; otherwise one could use the same argument for the Python interpreter or other libraries. Of course, the generated code should be tagged and go through the whole certification process, but all these scripts should be considered tools (classified and possibly qualified).

It is not fully clear to me what the notion here is. I do not think we are required to qualify the Python interpreter as a tool, but we will have to qualify a version of the code-generating script together with a particular version of the Python interpreter, showing that the combination of both does what it is meant to do. Your comment leads me to the thought that, for the source-code-generating tools, the generated code parts could be treated and tested similarly to hand-coded files (unit testing, integration testing, software testing), possibly relieving the tool from qualification. Ideally, the test code would then be written flexibly enough to cope with the different tool outputs.

These are all options, not yet defined and not yet accepted by TÜV/exida.

At this point we are simply looking for a mechanism to mark scopes.

I think the notion here is that scripts shall be treated as tools in their own right instead of as code that generates C code, and that the generated C code needs to be treated as "hand-written" code with all the safety aspects and methods in mind. I fully agree with that. If we don't do this, someone could come up with the argument "hey, you have Python code which generates C code, and the tool you are using is Python", and then the Python interpreter would come to the table. (Is that what you wanted to point out @ceolin ?)

About your thought, @romkell: it wouldn't release the tool from qualification, but it could be used to minimize the effort.

But the discussion of what needs to be defined as a tool and what does not is a separate one, and with this mechanism we would have at least a tool (pun intended) to do such things.

henrikbrixandersen commented 2 years ago

A short follow-up on your comment @henrikbrixandersen. If we decide on another mechanism to mark the safety scope (e.g. a spreadsheet or a YAML file), isn't this also adding meta-information to a file, just with a detour? The developers cannot take decisions on that either; the only difference, I guess, is that the decision is not present directly in the developer's file.

Sure, at some point - before doing a recertification - the list needs to be evaluated again. But that would be in the scope of the recertification, not in the day-to-day changes done by developers without insights into how to handle the safety scope.

henrikbrixandersen commented 2 years ago

This was decided in the last TSC meeting to get the focus on the mechanism instead of the scope with its 800+ files.

Fair enough, although that's not what I understood from the meeting. I don't believe we can fully decouple the two (scope vs. mechanism).

Say, if this was tagging a few files (e.g. the base kernel files) in the tree with a safety tag, I don't think many people would object. But tagging 800+ files all over the tree makes this relevant for a lot of developers with little or no insight in the processes around safety certification.

romkell commented 2 years ago

A short follow-up on your comment @henrikbrixandersen. If we decide on another mechanism to mark the safety scope (e.g. a spreadsheet or a YAML file), isn't this also adding meta-information to a file, just with a detour? The developers cannot take decisions on that either; the only difference, I guess, is that the decision is not present directly in the developer's file.

Sure, at some point - before doing a recertification - the list needs to be evaluated again. But that would be in the scope of the recertification, not in the day-to-day changes done by developers without insights into how to handle the safety scope.

This is about working together and having mechanisms in the safety scope which support development towards a safety mindset. It is about defensive programming (dos and don'ts with a focus on safety, which I feel is not very far from dos and don'ts for security). Measures to achieve that can be:

  • In the safety scope have a safety architect review code changes
  • write guidelines how to program for safety (what to do and what to better leave)
  • train developers concerning safety, if required

Your option of leaving the day-to-day changes done by developers to just happen might end up with a practically un-re-certifiable code base (e.g. mallocs/frees all over the place, loop boundaries re-calculated inside the loop, etc.). It is a bit like "let's develop and see if we get it re-certified again in 2 years with the next LTS". A risky strategy.

henrikbrixandersen commented 2 years ago
  • In the safety scope have a safety architect review code changes

This goes back to the question of the scope for this. Are we talking day-to-day changes of 800+ files requiring review from one dedicated person (the safety architect) - or are we talking 10 files? 20 files?

  • write guidelines how to program for safety (what to do and what to better leave)
  • train developers concerning safety, if required

While this sounds easy as a one-line, how would you do this on an open source project such as Zephyr in practice? We have 1000+ developers so far.

Your option of leaving the day-to-day changes done by developers to just happen might end up with a practically un-re-certifiable code base (e.g. mallocs/frees all over the place, loop boundaries re-calculated inside the loop, etc.). It is a bit like "let's develop and see if we get it re-certified again in 2 years with the next LTS". A risky strategy.

Now you are just making things up ;-) We don't see "mallocs / frees all over the place" or "loop boundaries re-calculated in the loop" as it is today, as these are bad practice, safety scope or not.

simhein commented 2 years ago
  • In the safety scope have a safety architect review code changes

This goes back to the question of the scope for this. Are we talking day-to-day changes of 800+ files requiring review from one dedicated person (the safety architect) - or are we talking 10 files? 20 files?

In the first place, we need to distinguish between the tool files and the source code files, which were mixed up in the 800+ files for the PR. I'm currently working on refining the scope for the source files, and it seems the number is around 230 files under scope, but I need to double-check it.

  • write guidelines how to program for safety (what to do and what to better leave)
  • train developers concerning safety, if required

While this sounds easy as a one-line, how would you do this on an open source project such as Zephyr in practice? We have 1000+ developers so far.

Okay, as I read your argument, you are right: it may be impossible for the project as a whole to introduce such things, but this is more of a thought or an idea of what could be done with such a marking mechanism. What also needs to be pointed out, if we agree on this mechanism, is that the safety (or whatever) committee/WG should not introduce new processes or implementations based on it without the approval of the TSC/Process WG and the community.

Your option of leaving the day-to-day changes done by developers to just happen might end up with a practically un-re-certifiable code base (e.g. mallocs/frees all over the place, loop boundaries re-calculated inside the loop, etc.). It is a bit like "let's develop and see if we get it re-certified again in 2 years with the next LTS". A risky strategy.

Now you are just making things up ;-) We don't see "mallocs / frees all over the place" or "loop boundaries re-calculated in the loop" as it is today, as these are bad practice, safety scope or not.

Well, that is a good thing to hear :) I see the point of @romkell more from the architectural perspective: when someone wants to introduce, for example, a new implementation or a design for something very dynamic, it would be possible for the project or committee to act on it, like we do now in the regular Architecture WG meeting (correct me if I'm wrong).

gregshue commented 2 years ago

This was decided in the last TSC meeting to get the focus on the mechanism instead of the scope with its 800+ files.

Agreed. It was also noted by @kestewart that the mechanism is expected to work for identifying the security scope.

Can we all focus on the topic of this RFC please? Which is not the security standard or the focus of the security WG.

Agreed. As I see it, all of the discussion around security regulations and standards pretty clearly shows that the scope of the mechanism needs to include all the places where security tags are expected to appear. This brings up a very relevant question:

This is about working together and having mechanisms in the safety scope which support development towards a safety mindset. ...

There are analogs in the security scope that this mechanism is also expected to support.

Have we even created a list of the requirements a mechanism must meet?

ceolin commented 2 years ago

It is not fully clear to me what the notion here is. I do not think we are required to qualify the Python interpreter as a tool, but we will have to qualify a version of the code-generating script together with a particular version of the Python interpreter, showing that the combination of both does what it is meant to do. Your comment leads me to the thought that, for the source-code-generating tools, the generated code parts could be treated and tested similarly to hand-coded files (unit testing, integration testing, software testing), possibly relieving the tool from qualification. Ideally, the test code would then be written flexibly enough to cope with the different tool outputs. These are all options, not yet defined and not yet accepted by TÜV/exida. At this point we are simply looking for a mechanism to mark scopes.

I think the notion here is that scripts shall be treated as tools in their own right instead of as code that generates C code, and that the generated C code needs to be treated as "hand-written" code with all the safety aspects and methods in mind. I fully agree with that. If we don't do this, someone could come up with the argument "hey, you have Python code which generates C code, and the tool you are using is Python", and then the Python interpreter would come to the table. (Is that what you wanted to point out @ceolin ?)

Yes !

About your thought, @romkell: it wouldn't release the tool from qualification, but it could be used to minimize the effort.

But the discussion of what needs to be defined as a tool and what does not is a separate one, and with this mechanism we would have at least a tool (pun intended) to do such things.

gregshue commented 2 years ago

From https://github.com/zephyrproject-rtos/zephyr/pull/47029#issuecomment-1170212688

We need to have a mechanism to indicate what is in and what is out.

What are the other options:

  • parsing zephyr.lst is fine, but to verify it against what?
    • some kind of database telling what is in the safety scope and what is not? Who would maintain that? That database would rot even faster, I guess, because it is detached from the code completely.
    • use Kconfig (#ifdef CONFIG_SAFETY or similar)
    • mark all safety-related functions with /* @in_safety_scope */ or similar so this information can be copied when splitting files. Quite some micromanagement, but easier to handle for those moving these things around.

Another option would be to leave "main" untouched from such mechanisms and only do and re-do it on each x.y-auditable branch which will be certified.

A few thoughts on the options above:

It seems "features" are scoped and each activation is controlled by a Kconfig option. How cleanly are the "features" partitioned into files?
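The "database detached from the code" concern above could at least be mitigated by a CI check that answers "verify it against what?" with "against the tree itself". A minimal sketch, assuming the scope list lives in a plain YAML-style text file (the file name, format, and helper below are illustrative assumptions only):

```python
# Sketch only: check an externally maintained safety-scope list against the
# source tree so the list cannot silently rot. The list format (one path per
# line, "#" comments, optional "- " YAML-style prefix) is an assumption.
from pathlib import Path

def stale_scope_entries(scope_file, root):
    """Return scope entries that no longer exist as files under root."""
    entries = []
    for line in Path(scope_file).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        entries.append(line[2:] if line.startswith("- ") else line)
    return [e for e in entries if not (Path(root) / e).is_file()]
```

Run in CI, a non-empty return value would fail the build, forcing the list to be updated in the same pull request that moves or deletes a file in scope.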

simhein commented 1 year ago

No conclusion on this specific RFC for almost one year, also other options are in consideration in the safety committee/WG. I will close this RFC and it can be reopened if needed.