zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

RFC: Mark all safety relevant files with a SPDX tag #49829

Closed romkell closed 1 year ago

romkell commented 2 years ago

Introduction

IEC 61508 safety certification is planned for the Zephyr OS core. Zephyr OS as an ecosystem is too big to be certified as a whole; the effort would be too high to finish in reasonable time. Therefore a safety scope was defined in the past (Zephyr-Overview-2022Q1-Public.pdf) and will be re-evaluated in the currently running safety certification process in the Safety Committee.

In order to clearly distinguish between the non-safe code (by far the bigger part) and the safe code, we need a mechanism to separate the two, or at least to mark the safe code parts.

The mechanism may, or should, also serve other such purposes (e.g. security, Bluetooth, Profinet etc.).

Please find the closed PR with the initial discussion here: #47029.

Problem description

Certified code (also in tools) needs special attention concerning modifications. Typically the certification is lost and re-certification is required, which normally implies:

  1. Updating specifications: requirements specifications and test specifications
  2. Re-testing: in the simplest case just running an existing test suite again; for safety, updating and running self-written test cases
  3. Updating documentation: manuals
  4. Updating certification inputs: MISRA reports, CVE reports, coverage reports, test reports
  5. Re-qualification of tools: a bit of all the points above, but to a smaller extent

Also, how modifications are done may be critical, at least concerning safety (e.g. dynamic memory allocation is typically avoided in safe code), and can render a module nearly impossible to re-certify.

Proposed change

All files in a dedicated scope (e.g. the safety scope) which will undergo any kind of certification shall be marked with an SPDX file comment tag.

`SPDX-FileComment: <cert-scope1>[, <cert-scope2>]`

For the safety scope it shall be:

  * `SPDX-FileComment: IEC-61508-SIL3` for Zephyr OS source code
  * `SPDX-FileComment: IEC-61508-T2` for T2 class tools
  * `SPDX-FileComment: IEC-61508-T3` for T3 class tools
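To make the proposal concrete, here is a minimal sketch (purely illustrative; the header contents and the helper name `cert_scopes` are my own, not part of the RFC) of what a tagged file header would contain and how a script might extract the scope labels:

```python
import re

# Hypothetical header of a tagged Zephyr C source file.
# The copyright line and tag values are illustrative only.
HEADER = """\
/*
 * Copyright (c) 2022 Example Contributor
 *
 * SPDX-License-Identifier: Apache-2.0
 * SPDX-FileComment: IEC-61508-SIL3
 */
"""

# Match an SPDX-FileComment line and capture everything after the colon.
TAG_RE = re.compile(r"SPDX-FileComment:\s*(.+)")

def cert_scopes(text):
    """Return the list of certification scope labels found in a file header."""
    m = TAG_RE.search(text)
    if not m:
        return []
    return [s.strip() for s in m.group(1).split(",")]

print(cert_scopes(HEADER))  # prints ['IEC-61508-SIL3']
```

A scanner along these lines could also accept the comma-separated multi-scope form shown above, though as discussed below a separate comment line per scope may be more robust.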

That change will be done on 2.7-auditable-branch first, and be merged to main before LTS 3.

Detailed RFC

Motion 1: We use the SPDX-FileComment tag together with a to-be-defined certification scope label to mark source code files of any programming language (whether belonging to the Zephyr OS source tree, to a tool, or to the firmware built) when that source code belongs to that certification scope.

Motion 2: For the to-be-defined safety scope the certification scope label shall be:

Motion 3: The SPDX tagging mechanism as described above shall be implemented and tested on 2.7-auditable first before finally being merged to main. The resulting PRs will again be subject to approvals.

Proposed change (Detailed)

See Proposed change plus:

Third-party code such as HALs will currently not be included in the safety scope, primarily due to its extent. The HAL providers would have to also provide the certification for their HALs, or the Zephyr OS integrator would have to achieve the certification for a particular HAL.

Depending on the tag a file has, the respective lead would have to be involved in planned changes (added as reviewer on a PR or similar). What such a process could look like is to be defined.

Dependencies

Dependency on people's roles: from @gregshue out of #47029:

Concerns and Unresolved Questions

Alternatives

Thoughts

Annex

Safety application development implications in short (and simplified)

This section provides a very brief overview of the implications of developing a (SIL3) safety application according to IEC 61508.

"Why tagging files?" or "The Use Cases"

Tag handling: "Auditable branches" vs "main"

I see two or three options for how to work:

  1. Do it on both immediately
    1. Advantages
      1. Have the opportunity to handle most use cases as above immediately
      2. Avoid safety contradicting developments in "main"
      3. Have "main" closer to "safety ready" on tagging next LTS and auditable
    2. Disadvantages
      1. Safety Committee will have to take care of both
      2. "main" developers will be confronted with safety topics
  2. Only do safety on "Auditable Branches" and never on "main"
    1. Advantages
      1. Safety Committee can exclusively focus on auditables
      2. "main" developers do not need to care about safety topics
    2. Disadvantages
      1. Over the "main" releases there is a growing gap between the last and the next "auditable" branch; this gap will have to be closed (cleaned up) every time
      2. There is hardly any control over "safety un-friendly" implementation in the safety scope
  3. Do it on the "Auditable Branch" first; later, when finished, forward-merge to "main"
    1. Advantages and Disadvantages are at first those of 2. and later of 1.
gregshue commented 2 years ago

`SPDX-FileComment: <cert-scope1>[, <cert-scope2>]`

Putting multiple identifiers on the same file line has already been shown to be problematic for enumerations. We should use a separate comment line per cert-scope.

Safety and innovation are somewhat contradicting

I think you mean "orthogonal" rather than "contradicting". Innovation is largely about concept. Safety (and security) are largely about process.

Safety code should be mature and not be fundamentally changed anymore.

Verified that it fully meets the requirements and does nothing else - absolutely. Not be recently fundamentally changed - impractical as change will happen.

How to handle third-party code if a certification scope expands to also include third-party (HALs)?

Not "if" but "when". We really need a common strategy and mechanism that can also identify files needing to meet security cert compliance (e.g., IEC-62443-SL2).

mbolivar-nordic commented 2 years ago

A couple of top-level concerns/questions.

  1. Who will be responsible for determining the "to-be-defined safety scope" in motion 2?
  2. What constitutes the T2 and T3 class tools in this scope? (I see this partially addressed in the now-closed PR, would be good to see this copy/pasted into the issue for the record, since we're voting on the issue in the TSC, not the PR)

Finally:

I know you're asking us not to consider the particular files, but you also seem convinced that the devicetree scripts should be in the safety scope.

I want to ask whether you consider that a library like pyelftools, which we rely on in a variety of scripts that are just as critical as the devicetree package (gen_kobject_list.py, gen_kobject_placeholders.py, gen_handles.py, ..., and the list goes on), should be part of the safety scope.

If yes, how do you plan on certifying it?

If no, why should the devicetree scripts get treated any differently?

romkell commented 2 years ago

`SPDX-FileComment: <cert-scope1>[, <cert-scope2>]`

Putting multiple identifiers on the same file line has already shown to be problematic for enumerations. We should use a separate comment line per cert-scope.

I am fine with any format, as long as it is suitable and conformant to SPDX and west spdx (which I would assume it is, since licenses are also listed on separate lines).

Safety and innovation are somewhat contradicting

I think you mean "orthogonal" rather than "contradicting". Innovation is largely about concept. Safety (and security) are largely about process.

It is about processes that demand outputs which typically take a lot more effort to create than a lean change would. In my experience, companies tend to re-think fundamentally changing concepts twice when the price tag doubles or more.

Safety code should be mature and not be fundamentally changed anymore.

Verified that it fully meets the requirements and does nothing else - absolutely. Not be recently fundamentally changed - impractical as change will happen.

The gist here was that a planned change should be a conscious step, taking into account the consequences for any kind of certification effort. It was not meant as absolute, otherwise we could stop working on any certified code.

How to handle third-party code if a certification scope expands to also include third-party (HALs)?

Not "if" but "when". We really need a common strategy and mechanism that can also identify files needing to meet security cert compliance (e.g., IEC-62443-SL2).

"When" is fine. Nevertheless it leads back to the discussion in the PR about whether the mechanism is also suitable to cover non-Zephyr code.

romkell commented 2 years ago

A couple of top-level concerns/questions.

1. Who will be responsible for determining the "to-be-defined safety scope" in motion 2?

I guess it would be the respective committee, for safety the Safety Committee. The maintainers' participation is highly appreciated, I guess even needed, since (almost) no one from the Safety Committee knows the implementation details of all the code / tool code and would be able to identify the relevant code parts / files without spending weeks.

But in the end there is not a lot of room for bargaining: if a tool generates code, modifies binaries etc. of a safe application, it is 61508-T3. If it is used for testing in any way (static code analysis, unit test frameworks etc.), it is 61508-T2.

2. What constitutes the T2 and T3 class tools in this scope? (I see this partially addressed in the now-closed PR, would be good to see this copy/pasted into the issue for the record, since we're voting on the issue in the TSC, not the PR)

I will copy it from the PR.

Finally:

I know you're asking us not to consider the particular files, but you also seem convinced that the devicetree scripts should be in the safety scope.

Since they create C structures present in the binary, yes, I do not see a way around it.

I want to ask whether you consider that a library like pyelftools, which we rely on in a variety of scripts that are just as critical as the devicetree package (gen_kobject_list.py, gen_kobject_placeholders.py, gen_handles.py, ..., and the list goes on), should be part of the safety scope.

If "pyelftools" participates in "code generation" in any way, it is in.

If yes, how do you plan on certifying it?

I, respectively we (the Safety Committee), do not yet have a plan for all the tools used. The advantage I see in an external tool is that we use a fixed version (hopefully not changing all the time, since it is a mature tool; otherwise we have to re-qualify each new version). Typically you prepare some tests around your use cases which prove that the tool is fit for the purpose, and then do the paperwork around those tests (documentation). Nicole Pappler (our safety manager) or @simhein would have to give more details. @evgeniy-paltsev was preparing the tools list, where we have to indicate each tool's purpose and criticality; this is work in progress.

If no, why should the devicetree scripts get treated any differently?

It will not be treated differently.

Btw., Zephyr OS is rather special in the way it generates code and post-processes the binary. Not something that makes it easier for safety. For a safety application I personally would prefer a simple constellation: tested and reviewed static source code -> the certified compiler -> the certified linker -> the safe binary. A fixed set of tools, at least for one release, ideally for several releases; a new tool version means re-qualification of the tool (unless bought with certification).

romkell commented 2 years ago

Copied last comment from #47029 to here to continue:

We need to

@gregshue I believe I have seen you begin several sentences this way.

I think it is likely to cause confusion for people when you speak this way. It may falsely give people the impression that you are a maintainer or collaborator in the zephyr project, and that you are speaking in that capacity.

I understood that as "we" as in "the community", in whatever role.

Concerning out-of-tree code:

@gregshue takes a Zephyr OS ecosystem end user position, the same position Baumer is in. When building a safe application, as an end user I would want the west SPDX output to tell me that I have used only certified code. Zephyr OS applications without peripheral drivers and a HAL are limited to host emulation or QEMU (did I forget something?). Hence, as an end user, I need to be able to create safe applications running on a board including a HAL. From that perspective, and since I would like to get a report out of SPDX showing that I only used certified artifacts, out-of-tree sources would have to be included and carry that tag too.

But it is not in the control of the project but of the HAL vendor.

As long as this is a Zephyr OS proprietary solution, it will be difficult for it to spread elsewhere. I understood from @kestewart that she will take the approach to the SPDX standardization board (whatever the organization's name is) and try to create an official solution (e.g. SPDX-Certification) instead of (mis-)using the SPDX-FileComment tag for this. That might take a while to become officially released, but it is a perspective. Maybe @kestewart can comment on that.

Finally, it is the out-of-tree source code providers' and maintainers' decision whether or not they use that mechanism. There is no way to force them. The more official the solution, the higher the likelihood that it will be adopted.

Using it for the in-tree code is a start. Getting an SPDX report on the in-tree code is a start and already helpful to the end user. Getting a full coverage report for all files compiled in is deluxe, but might become true in the future.

Concerning commitment:

I feel that there should be some kind of commitment connected to this. Otherwise these are just useless tags.

gregshue commented 2 years ago

But it is not in the control of the project but of the HAL vendor.

This actually depends on where the HAL lives. If the HAL is provided through the zephyrproject-rtos then it has an identified module owner who is:

I agree that there is no way to force the usage in modules provided outside of zephyrproject-rtos.

nashif commented 2 years ago

A couple of top-level concerns/questions.

  1. Who will be responsible for determining the "to-be-defined safety scope" in motion 2?
  2. What constitutes the T2 and T3 class tools in this scope? (I see this partially addressed in the now-closed PR, would be good to see this copy/pasted into the issue for the record, since we're voting on the issue in the TSC, not the PR)

It is worth mentioning that whether this method is picked (SPDX) or some other method (A spreadsheet for example) the scope will still be defined and tools will be part of the certification/qualification process. So the method picked does not have any impact on the scope itself.

nashif commented 2 years ago

I agree that there is no way to force the usage in modules provided outside of zephyrproject-rtos.

There is also no way to force modules within the zephyr project to follow this. The key point is that we will not engage in certifying any third-party code or code we do not own or control, and I do not understand why we would do this. Actually, if someone decided to do so on their own and introduced this type of tagging into their HAL or module, it would just be confusing, and you might end up getting code in a binary that is marked for safety but is actually not covered by what the project is trying to do.

That's where something like `SPDX-FileComment: <cert-scope1>[, <cert-scope2>]` might not be enough; it will need an identifier of the scope, i.e. `SPDX-FileComment: zephyr:<cert-scope1>[, <cert-scope2>]`, to make sure things do not get mixed up.
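A rough sketch of how such a namespaced tag could be parsed, assuming the `zephyr:` prefix suggested above (the function and regex are hypothetical, not an existing Zephyr or west spdx API):

```python
import re

# Capture "<project>:<scope1>[, <scope2>...]" after the SPDX-FileComment tag.
TAG_RE = re.compile(r"SPDX-FileComment:\s*(?P<project>[\w-]+):(?P<scopes>.+)")

def parse_scoped_tag(line):
    """Split a namespaced SPDX-FileComment into (project, [scopes]).

    Returns None when the line carries no project namespace, so that
    un-prefixed tags from other projects are not mistaken for Zephyr
    certification scopes.
    """
    m = TAG_RE.search(line)
    if not m:
        return None
    scopes = [s.strip() for s in m.group("scopes").split(",")]
    return m.group("project"), scopes

print(parse_scoped_tag("SPDX-FileComment: zephyr:IEC-61508-SIL3"))
# prints ('zephyr', ['IEC-61508-SIL3'])
print(parse_scoped_tag("SPDX-FileComment: IEC-61508-SIL3"))
# prints None
```

Rejecting the un-prefixed form outright is one possible design choice; a tool could instead treat it as belonging to an "unknown" project and report it for review.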

nashif commented 2 years ago

Not "if" but "when". We really need a common strategy and mechanism that can also identify files needing to meet security cert compliance (e.g., IEC-62443-SL2).

You keep bringing this up, but I have not heard anything, neither from the security WG nor from the Security Architect, that this is planned, being discussed, or needed. Last time I checked, @ceolin is the authority from the security WG/Committee, and he should be the one putting this on the table and setting the requirements on strategies and mechanisms for any security certification, which, again, I am not aware of.

gregshue commented 2 years ago

The Security WG (which I am participating in) is not ready to discuss security standards yet. The agenda for the next meeting is to review a Requirements Syntax training deck. I expect we will begin to discuss OT/IoT security standards and certifications in the next few meetings.

Regarding need: files to be kept compliant with any standard(s) need to be marked with a tag for each standard. For standards like IEC-62443 the tag also needs to indicate the security level.

More generally, an end user may reasonably need to identify files in their own workspaces with their own tags for similar purposes. Limiting the tag detection tool and build hooks to only look for Zephyr-defined values would be inconsistent with the user need for extensibility. Once the system can support a user-defined tag value it is trivial to add another Zephyr-defined value whenever a new tag is identified.

ceolin commented 2 years ago

The Security WG (which I am participating in) is not ready to discuss security standards yet. The agenda for the next meeting is to review a Requirements Syntax training deck. I expect we will begin to discuss OT/IoT security standards and certifications in the next few meetings.

Regarding need: files to be kept compliant with any standard(s) need to be marked with each a tag for each standard. For standards like IEC-62443 the tag needs to indicate the security level also.

More generally, an end user may reasonably need to identify files in their own workspaces with their own tags for similar purposes. Limiting the tag detection tool and build hooks to only look for Zephyr-defined values would be inconsistent with the user need for extensibility. Once the system can support a user-defined tag value it is trivial to add another Zephyr-defined value whenever a new tag is identified.

A couple of top-level concerns/questions.

1. Who will be responsible for determining the "to-be-defined safety scope" in motion 2?

I guess it would be the respective Committee, for safety the Safety Committee. The maintainers participation is highly appreciated, I guess even needed, since (almost) no one from the Safety Committee knows the implementation details in all the code / tool code and would be able identify the relevant code part / files without spending weeks.

But in the end there is not a lot room for bargaining - if a tools generates code, modifies binaries etc. of a safe application, it is 61508-T3. If it is used for testing in any way (static code analysis, unit test frameworks etc.) it is 61508-T2

2. What constitutes the T2 and T3 class tools in this scope? (I see this partially addressed in the now-closed PR, would be good to see this copy/pasted into the issue for the record, since we're voting on the issue in the TSC, not the PR)

I will copy it from the PR.

Finally: I know you're asking us not to consider the particular files, but you also seem convinced that the devicetree scripts should be in the safety scope.

Since it creates C structures present in the binary, yes I do not see a way around.

I want to ask whether you consider that a library like pyelftools, which we rely on in a variety of scripts that are just as critical as the devicetree package (gen_kobject_list.py, gen_kobject_placeholders.py, gen_handles.py, ..., and the list goes on), should be part of the safety scope.

If "pyelftools" participates in "code generation" in any way, it is in.

If yes, how do you plan on certifying it?

Ok, my two cents. Even if it is generating code, it is still part of a tool. We should consider it a tool; otherwise one could use the same argument for the Python interpreter or other libraries. Of course, the generated code should be tagged and go through the whole certification process, but all these scripts should be considered tools (classified and possibly qualified).

I respectively we (the Safety Committee) does not yet have a plan yet for all the tools used. The advantage of an external tools I see is, that we use a fixed version (hopefully not changing all the time, since it is a mature tool - otherwise we have to re-qualify each new version). Typically you prepare some tests around your use cases, which prove that the tools is fit for the purpose. And then do the paper work around those tests (docu). Nicole Pappler ( our safety manager) or @simhein would have to give more details. @evgeniy-paltsev was preparing the tools list, where we have to indicate the tools purpose and criticality - this is work in progress.

If no, why should the devicetree scripts get treated any differently?

It will not be treated differently.

Btw. Zephyr OS is rather special in the way of generating code and post processing the binary. Not something that makes it easier for safety. For safety application I personally would prefer a simple constellation: Tested and reviewed static source code -> the certified compiler -> the certified linker -> the safe binary A fixed set of tools, at least for one release, ideally for several releases - new tool version, hence re-qualification of the tool (unless bought with certification).

ceolin commented 2 years ago

The Security WG (which I am participating in) is not ready to discuss security standards yet. The agenda for the next meeting is to review a Requirements Syntax training deck. I expect we will begin to discuss OT/IoT security standards and certifications in the next few meetings.

It is not about the WG being ready to discuss security standards; there is simply no demand from the project to pursue any certification, so it is definitely not a goal of the group at the moment. Regarding the next meeting, it is about how to capture requirements. If you remember, I asked to keep it as simple as possible. I really want the working group to focus on security enhancements and innovations. There are so many things that we know would be good to have, like secure storage, protocol fuzzing, syscall fuzzing, better samples integrating all these bits in the right way, threat models, crypto hw acceleration, ... but we lack the human resources to do all these activities, so we are talking about how to document them (as requirements) for when someone can work on these items.

Now, if we start to put up so many blockers, no one will be interested in working on them unless they are paid for it ...

Regarding need: files to be kept compliant with any standard(s) need to be marked with a tag for each standard. For standards like IEC-62443 the tag needs to indicate the security level also.

So, basically we have to mark all files in a build (consequently, in the project)? And attribute a security level to them?

How is this manageable in an open source project? Basically, anytime someone changes something we should assess and evaluate the risks again? Even if it is just a high-level assessment based on functionality and exposure, it is still almost unbearable.

More generally, an end user may reasonably need to identify files in their own workspaces with their own tags for similar purposes. Limiting the tag detection tool and build hooks to only look for Zephyr-defined values would be inconsistent with the user need for extensibility. Once the system can support a user-defined tag value it is trivial to add another Zephyr-defined value whenever a new tag is identified.

If it is about the tool, the tool should be able to pick up whatever tag you put there and spit out a tag:files dictionary. But the problem for the project is how to identify them ...
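As an illustration of that idea, a minimal tree walk that buckets files by whatever tag value they carry might look like this (a hypothetical sketch; a real implementation would presumably hook into west spdx, and this one only picks up single-token tag values):

```python
import os
import re

# Match a single-token SPDX-FileComment value, e.g. "IEC-61508-SIL3".
TAG_RE = re.compile(r"SPDX-FileComment:\s*(\S+)")

def tags_to_files(root):
    """Walk a source tree and build a {tag: [files]} dictionary from
    whatever SPDX-FileComment values are present, without hard-coding
    any Zephyr-defined label list (so user-defined tags work too)."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    head = f.read(2048)  # tags are expected near the top
            except OSError:
                continue
            for tag in TAG_RE.findall(head):
                result.setdefault(tag, []).append(path)
    return result
```

Because the dictionary is keyed by whatever value appears in the file, user-defined tags fall out for free, which matches the extensibility point raised above.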

gregshue commented 2 years ago

simply there is no demand from the project to pursue any certification,

Of course not. The Zephyr Project repositories cannot get certified. That must be pursued by the end users. Potential end users will evaluate Zephyr Project processes, tools, and code for suitability for their secure products.

I really want the working group focusing in security enhancements and innovations,

I really want the working group to answer what processes and artifacts I can regenerate to support a claim of "secure" code, and to which files and functions that applies.

If you remember, I asked to keep it as simple as possible.

I did, and even posted the training deck early so people could go through it off line.

so we are talking about how to document (requirements) them for when someone can work in these items

Good.

if we start to put so many blockers no one will be interesting to work on them unless they are paid for ...

The vision and mission of the project is firmware for safe and secure resource-constrained devices. If the processes needed for that are not interesting to them, then they probably won't be satisfied working in these domains. These processes only need to be applied to the functionality claimed to be secure. At this point I don't know how much of the code base is targeted for that label.

How this is manageable in an open source project ? Basically anytime someone change something we should assess and evaluate the risks again ? Even if it is just a high level assessment based on functionality and exposure, it is still almost unbearable.

Great questions! I see these as the innovative area that the Zephyr Project is chartered to solve. Though it may be burdensome for developers it gives the Zephyr Project great value to end users.

If it is about tool, the tool should be able to get whatever tag you put there and spill tag:files dictionary. But the problem for the project is how to identify them ...

The charter refers to functionality being brought into the auditable branch, so it starts with identifying functionality and then identifying all source associated with each distinct piece identified.

gregshue commented 2 years ago

if we start to put so many blockers no one will be interesting to work on them unless they are paid for ...

Several events have happened in the world over the last 3-4 years that are fundamentally changing how general software development needs to be done. Security events and new regulations are driving many domains toward secure software. Regulations, customer demand, and competition are driving software developers toward standards that require no unused/dead code, defense in depth, and all external input validated. Proving something doesn't exist either requires proving that it can't exist, or knowing everything about the executable. For a non-trivial executable that will demand a thorough process. Software developers in most non-regulated domains will probably need to adjust anyway.

henrikbrixandersen commented 2 years ago

Not sure why this was moved to a new issue but my https://github.com/zephyrproject-rtos/zephyr/pull/47029#issuecomment-1175973912 still stands.

romkell commented 2 years ago

But it is not in the control of the project but of the HAL vendor.

This actually depends on where the HAL lives. If the HAL is provided through the zephyrproject-rtos then it has an identified module owner who is:

* "required to submit the corresponding changes that are required in module repositories, to ensure that the Zephyr CI on the pull request with the original changes, as well as the module integration testing, are successful"

* "required to fix issues in the module's codebase that have not been caught by Zephyr pull request CI runs"

So, the Zephyr Project does have some control over each repository provided through zephyrproject-rtos, and the identified module owner has the freedom (and responsibility) to update relevant files with the appropriate safety markings.

I agree that there is no way to force the usage in modules provided outside of zephyrproject-rtos.

I quickly looked at the stm32 HAL. stm32cube alone is some 3000+ files. Assuming Zephyr OS uses 1/3, hence about 1k, mainly the low level drivers as far as I know. The HALs may be no real downstream repos from a main where you have a fixed linkage and simply update to a tag or whatever. As for stm32cube, there is a maintainer who, when updating to a new version, rearranges the folder structure a bit from the original stm32cube and then updates to the new files. In general I would avoid modifying third-party code as much as possible (also regarding those SPDX tags), since when you update, you will have to redo your modifications over and over again.

And if such SPDX tags should stand for "this is IEC61508-SIL3 ready code" on a tagged x.y-auditable-branch, the Zephyr OS project would have to run the certification of all those HALs and boards. That is, effort-wise, impossible. Staying with the stm32 example, I rather see it the way that STM will have to provide certification proof for a particular version of their HAL to the end user (or adopt a mechanism such as SPDX tagging for their HAL, and for that it needs to be an official open standard at least), or the end user has to make that effort for the particular hardware they use.

romkell commented 2 years ago

if we start to put so many blockers no one will be interesting to work on them unless they are paid for ...

Several events have happened in the world over the last 3-4 years that are fundamentally changing how general software development needs to be done. Security events and new regulations are driving many domains toward secure software. Regulations, customer demand, and competition are driving software developers toward standards that require no unused/dead code, defense in depth, and all external input validated. Proving something doesn't exist either requires proving that it can't exist, or knowing everything about the executable. For a non-trivial executable that will demand a thorough process. Software developers in most non-regulated domains will probably need to adjust anyway.

I can understand that a regulated environment (such as safety and, I guess, security, which is less my experience) might put off developers. In my experience there is a difference between regulated and non-regulated. All the more important, then, that the safety scope is clearly defined and marked.

gregshue commented 2 years ago

Cryptosecurity of IoT devices has been regulated in California since 1/1/2020. Cryptosecurity of consumer wireless devices with an address (e.g., BT) has been regulated, whether or not it contains an IP stack, in Oregon since 1/1/2020. More laws are under way (see https://www.nabto.com/us-and-california-iot-security-laws-guide/).

Even without regulations, customers in application domains being targeted by the Zephyr Project have already adopted IEC 62443 Security for Industrial Automation and Control Systems (see https://press.siemens.com/global/en/pressrelease/siemens-process-control-system-first-product-iec-62443-security-certification, dated 2016). Certificate of compliance can be awarded for a specific device. Separately, ETSI EN 303 645 Cybersecurity Standard for Consumer IoT Devices is a globally acceptable standard including 33 mandatory requirements and 35 recommendations.

The more important I see it, that the safety scope is clearly defined and marked.

I agree that the safety scope is clearly defined and marked, and the security scope is not. As I look at application domains, defining a security scope and level that meets a broad set of customer domains is more important to the success of Zephyr than the safety scope. There are FAR more end users needing a secure solution than a safe one.

romkell commented 2 years ago

if we start to put so many blockers no one will be interesting to work on them unless they are paid for ...

Several events have happened in the world over the last 3-4 years that are fundamentally changing how general software development needs to be done. Security events and new regulations are driving many domains toward secure software. Regulations, customer demand, and competition are driving software developers toward standards that require no unused/dead code, defense in depth, and all external input validated. Proving something doesn't exist either requires proving that it can't exist, or knowing everything about the executable. For a non-trivial executable that will demand a thorough process. Software developers in most non-regulated domains will probably need to adjust anyway.

I can understand that a regulated environment (such as safety and, I guess, security, which is less my domain) might put off developers. In my experience there is a difference between regulated and non-regulated: processes in non-regulated environments are typically leaner. For that reason, I see it as even more important that the safety scope is clearly defined and marked.

romkell commented 2 years ago

Ok, my two cents. Even if it is generating code, it is still part of a tool. We should consider it a tool; otherwise one could use the same argument for the Python interpreter or other libraries. Of course, the generated code should be tagged and go through the whole certification process, but all these scripts should be considered tools (classified and possibly qualified).

It is not fully clear to me what the notion here is. I do not think we are required to qualify the Python interpreter as a tool, but we will have to qualify a version of the code-generating script together with a particular version of the Python interpreter, showing that the combination of both does what it is meant to do. Your comment leads me to the thought that, for the source-code-generating tools, the generated code parts could be treated and tested similarly to hand-coded files (unit testing, integration testing, software testing), possibly relieving the tool from qualification. Ideally, the test code would then be written flexibly enough to cope with the different tool outputs.

These are all options, not yet defined and not yet accepted by TÜV/exida.

At this point we are simply looking for a mechanism to mark scopes.
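To make the "mechanism to mark scopes" concrete, here is a minimal sketch of how a file-level tag could be checked mechanically. The tag text `SPDX-FileComment: scope:safety` and the checker itself are purely illustrative assumptions, not a format agreed by the Safety Committee or TÜV/exida:

```python
# Sketch only: scan a tree for a hypothetical file-level safety-scope tag.
# The tag text below is an assumption for illustration; no format has been
# agreed with the Safety Committee or TUV/exida.
from pathlib import Path

SAFETY_TAG = "SPDX-FileComment: scope:safety"  # hypothetical tag

def files_in_safety_scope(root):
    """Return sorted paths of C/header files carrying the tag near the top."""
    hits = []
    for path in Path(root).rglob("*.[ch]"):
        with open(path, encoding="utf-8", errors="ignore") as f:
            head = [f.readline() for _ in range(10)]  # tag expected in header comment
        if any(SAFETY_TAG in line for line in head):
            hits.append(str(path))
    return sorted(hits)
```

A CI job could diff the resulting list against the certified scope and flag pull requests that touch tagged files, which is the kind of day-to-day guardrail discussed below.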

simhein commented 2 years ago

Not sure why this was moved to a new issue but my #47029 (comment) still stands.

This was decided in the last TSC meeting to get the focus on the mechanism instead of the scope with its 800+ files.

I don't like the idea of statically tagging source files with meta-information that typical developers cannot take decisions on.

A short follow-up on your comment @henrikbrixandersen. If we decide on another mechanism to mark the safety scope (e.g. a spreadsheet or a YAML file), isn't this also adding meta-information to a file, just with a detour? The developers cannot take decisions on that either; the only difference, I guess, is that the decision is not present directly in the developer's file.

Cryptosecurity of IoT devices has been regulated in California since 1/1/2020. Cryptosecurity of consumer wireless devices with an address (e.g., BT) has been regulated, whether or not it contains an IP stack, in Oregon since 1/1/2020. More laws are under way (see https://www.nabto.com/us-and-california-iot-security-laws-guide/).

Even without regulations, customers in application domains being targeted by the Zephyr Project have already adopted IEC 62443 Security for Industrial Automation and Control Systems (see https://press.siemens.com/global/en/pressrelease/siemens-process-control-system-first-product-iec-62443-security-certification, dated 2016). A certificate of compliance can be awarded for a specific device. Separately, ETSI EN 303 645 Cybersecurity Standard for Consumer IoT Devices is a globally accepted standard including 33 mandatory requirements and 35 recommendations.

Can we all focus on the topic of this RFC please? Which is not the security standard or the focus of the security WG.

All the more important, then, that the safety scope is clearly defined and marked.

I agree that the safety scope is clearly defined and marked, and the security scope is not. As I look at application domains, defining a security scope and level that meets a broad set of customer domains is more important to the success of Zephyr than the safety scope. There are FAR more end users needing a secure solution than a safe one.

I think we all agree on the point that the safety scope needs to be clearly defined, but I would appreciate it if we could avoid the debate on project principles, i.e. whether security or safety is more important. That needs to be discussed in another forum.

Ok, my two cents. Even if it is generating code, it is still part of a tool. We should consider it a tool; otherwise one could use the same argument for the Python interpreter or other libraries. Of course, the generated code should be tagged and go through the whole certification process, but all these scripts should be considered tools (classified and possibly qualified).

It is not fully clear to me what the notion here is. I do not think we are required to qualify the Python interpreter as a tool, but we will have to qualify a version of the code-generating script together with a particular version of the Python interpreter, showing that the combination of both does what it is meant to do. Your comment leads me to the thought that, for the source-code-generating tools, the generated code parts could be treated and tested similarly to hand-coded files (unit testing, integration testing, software testing), possibly relieving the tool from qualification. Ideally, the test code would then be written flexibly enough to cope with the different tool outputs.

These are all options, not yet defined and not yet accepted by TÜV/exida.

At this point we are simply looking for a mechanism to mark scopes.

I think the notion here is that scripts shall be treated as tools in their own right instead of as code that generates C code, and that the generated C code needs to be treated as "hand-written" code with all the safety aspects and methods in mind. I fully agree with that. If we don't do this, someone could come up with the argument "hey, you have Python code which generates C code, and the tool you are using is Python", and then the Python interpreter would come to the table. (Is that what you wanted to point out @ceolin ?)

About your thought, @romkell: it wouldn't release the tool from qualification, but it could be used to minimize the effort.

But the discussion of what needs to be defined as a tool and what does not is a separate one, and with this mechanism we would have at least a tool (pun intended) to do such things.

henrikbrixandersen commented 2 years ago

A short follow-up on your comment @henrikbrixandersen. If we decide on another mechanism to mark the safety scope (e.g. a spreadsheet or a YAML file), isn't this also adding meta-information to a file, just with a detour? The developers cannot take decisions on that either; the only difference, I guess, is that the decision is not present directly in the developer's file.

Sure, at some point - before doing a recertification - the list needs to be evaluated again. But that would be in the scope of the recertification, not in the day-to-day changes done by developers without insights into how to handle the safety scope.

henrikbrixandersen commented 2 years ago

This was decided in the last TSC meeting to get the focus on the mechanism instead of the scope with its 800+ files.

Fair enough, although that's not what I understood from the meeting. I don't believe we can fully decouple the two (scope vs. mechanism).

Say, if this was tagging a few files (e.g. the base kernel files) in the tree with a safety tag, I don't think many people would object. But tagging 800+ files all over the tree makes this relevant for a lot of developers with little or no insight in the processes around safety certification.

romkell commented 2 years ago

A short follow-up on your comment @henrikbrixandersen. If we decide on another mechanism to mark the safety scope (e.g. a spreadsheet or a YAML file), isn't this also adding meta-information to a file, just with a detour? The developers cannot take decisions on that either; the only difference, I guess, is that the decision is not present directly in the developer's file.

Sure, at some point - before doing a recertification - the list needs to be evaluated again. But that would be in the scope of the recertification, not in the day-to-day changes done by developers without insights into how to handle the safety scope.

This is about working together and having mechanisms in the safety scope which support development towards a safety mindset. It is about defensive programming (dos and don'ts with a focus on safety, which I feel is not very far from dos and don'ts for security). Measures to achieve that can be:

  • In the safety scope have a safety architect review code changes
  • write guidelines how to program for safety (what to do and what to better leave)
  • train developers concerning safety, if required

Your option of leaving the day-to-day changes done by developers to just happen might end up with a practically un-re-certifiable code base (e.g. mallocs/frees all over the place, loop boundaries re-calculated inside the loop, etc.). It is a bit like "let's develop and see if we get it re-certified again in 2 years with the next LTS". A risky strategy.

henrikbrixandersen commented 2 years ago
  • In the safety scope have a safety architect review code changes

This goes back to the question of the scope for this. Are we talking day-to-day changes of 800+ files requiring review from one dedicated person (the safety architect) - or are we talking 10 files? 20 files?

  • write guidelines how to program for safety (what to do and what to better leave)
  • train developers concerning safety, if required

While this sounds easy as a one-line, how would you do this on an open source project such as Zephyr in practice? We have 1000+ developers so far.

Your option of leaving the day-to-day changes done by developers to just happen might end up with a practically un-re-certifiable code base (e.g. mallocs/frees all over the place, loop boundaries re-calculated inside the loop, etc.). It is a bit like "let's develop and see if we get it re-certified again in 2 years with the next LTS". A risky strategy.

Now you are just making things up ;-) We don't see "mallocs / frees all over the place" or "loop boundaries re-calculated in the loop" as it is today, as these are bad practice, safety scope or not.

simhein commented 2 years ago
  • In the safety scope have a safety architect review code changes

This goes back to the question of the scope for this. Are we talking day-to-day changes of 800+ files requiring review from one dedicated person (the safety architect) - or are we talking 10 files? 20 files?

In the first place, we need to distinguish between the tool files and the source code files, which were mixed up in the 800+ files for the PR. I'm currently working on refining the scope for the source files, and it seems the number is around 230 files under scope, but I need to double-check it.

  • write guidelines how to program for safety (what to do and what to better leave)
  • train developers concerning safety, if required

While this sounds easy as a one-line, how would you do this on an open source project such as Zephyr in practice? We have 1000+ developers so far.

Okay, as I read your argument, you are right: it may be impossible for the project as a whole to introduce such things, but this is more of a thought or an idea of what could be done with such a marking mechanism. What also needs to be pointed out, if we agree on this mechanism, is that the safety (or whatever) committee/WG should not introduce new processes or implementations based on it without the approval of the TSC/Process WG and the community.

Your option of leaving the day-to-day changes done by developers to just happen might end up with a practically un-re-certifiable code base (e.g. mallocs/frees all over the place, loop boundaries re-calculated inside the loop, etc.). It is a bit like "let's develop and see if we get it re-certified again in 2 years with the next LTS". A risky strategy.

Now you are just making things up ;-) We don't see "mallocs / frees all over the place" or "loop boundaries re-calculated in the loop" as it is today, as these are bad practice, safety scope or not.

Well, that is a good thing to hear :) I see the point of @romkell more from the architectural perspective: when someone wants to introduce, for example, a new implementation or a design for something very dynamic, it would be possible for the project or committee to act on it, like we do now in the regular Architecture WG meeting (correct me if I'm wrong).

gregshue commented 2 years ago

This was decided in the last TSC meeting to get the focus on the mechanism instead of the scope with its 800+ files.

Agreed. It was also noted by @kestewart that the mechanism is expected to work for identifying the security scope.

Can we all focus on the topic of this RFC please? Which is not the security standard or the focus of the security WG.

Agreed. As I see it, all of the discussion around security regulations and standards pretty clearly shows that the scope of the mechanism needs to include all the places where security tags are expected to appear. This brings up a very relevant question:

This is about working together and having mechanisms in the safety scope which support development towards a safety mindset. ...

There are analogs in the security scope that this mechanism is also expected to support.

Have we even created a list of the requirements a mechanism must meet?

ceolin commented 2 years ago

It is not fully clear to me what the notion here is. I do not think we are required to qualify the Python interpreter as a tool, but we will have to qualify a version of the code-generating script together with a particular version of the Python interpreter, showing that the combination of both does what it is meant to do. Your comment leads me to the thought that, for the source-code-generating tools, the generated code parts could be treated and tested similarly to hand-coded files (unit testing, integration testing, software testing), possibly relieving the tool from qualification. Ideally, the test code would then be written flexibly enough to cope with the different tool outputs. These are all options, not yet defined and not yet accepted by TÜV/exida. At this point we are simply looking for a mechanism to mark scopes.

I think the notion here is that scripts shall be treated as tools in their own right instead of as code that generates C code, and that the generated C code needs to be treated as "hand-written" code with all the safety aspects and methods in mind. I fully agree with that. If we don't do this, someone could come up with the argument "hey, you have Python code which generates C code, and the tool you are using is Python", and then the Python interpreter would come to the table. (Is that what you wanted to point out @ceolin ?)

Yes !

About your thought, @romkell: it wouldn't release the tool from qualification, but it could be used to minimize the effort.

But the discussion of what needs to be defined as a tool and what does not is a separate one, and with this mechanism we would have at least a tool (pun intended) to do such things.

gregshue commented 2 years ago

From https://github.com/zephyrproject-rtos/zephyr/pull/47029#issuecomment-1170212688

We need to have a mechanism to indicate what is in and what is out.

What are the other options:

  • parsing zephyr.lst is fine, but to verify it against what?
    • some kind of database telling what is in the safety scope and what is not? Who would maintain that? That database would rot even faster, I guess, because it is detached from the code completely.
    • use Kconfig (#ifdef CONFIG_SAFETY or similar)
    • mark all safety-related functions with /* @in_safety_scope */ or similar so this information can be copied when splitting files. Quite some micromanagement, but easier to handle for those moving these things around.

Another option would be to leave "main" untouched from such mechanisms and only do and re-do it on each x.y-auditable branch which will be certified.

A few thoughts on the options above:

It seems "features" are scoped and each activation is controlled by a Kconfig option. How cleanly are the "features" partitioned into files?
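The "database detached from the code" concern above could at least be mitigated by a CI check that answers "verify it against what?" with "against the tree itself". A minimal sketch, assuming the scope list lives in a plain YAML-style text file (the file name, format, and helper below are illustrative assumptions only):

```python
# Sketch only: check an externally maintained safety-scope list against the
# source tree so the list cannot silently rot. The list format (one path per
# line, "#" comments, optional "- " YAML-style prefix) is an assumption.
from pathlib import Path

def stale_scope_entries(scope_file, root):
    """Return scope entries that no longer exist as files under root."""
    entries = []
    for line in Path(scope_file).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        entries.append(line[2:] if line.startswith("- ") else line)
    return [e for e in entries if not (Path(root) / e).is_file()]
```

Run in CI, a non-empty return value would fail the build, forcing the list to be updated in the same pull request that moves or deletes a file in scope.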

simhein commented 1 year ago

No conclusion on this specific RFC for almost one year, also other options are in consideration in the safety committee/WG. I will close this RFC and it can be reopened if needed.