Make Coverity Static Analysis report a Requirement for Project Release

ceolin commented 1 year ago

Introduction

This proposal advocates for the integration of Coverity static analysis as an indispensable requirement for our project releases. Static analysis is not merely a tool but a proactive strategy to unearth and address potential issues in the early stages of development, long before they mature into critical vulnerabilities. By scrutinizing code at rest, static analysis unveils latent defects and potential security risks, thus bolstering the resilience of our software against future threats.

Objective

The primary objective of this proposal is to make static analysis results a fundamental requirement for project releases, with the following key goals:

Zero High Critical Issues: Ensure that project releases do not contain any high-criticality issues that could potentially compromise the functionality, security, or reliability of our software. High-criticality issues represent vulnerabilities that, if left unresolved, could have severe consequences.
Mandate the inclusion of a static analysis report as part of the release artifacts. This report will provide a transparent and detailed overview of the codebase's health, highlighting any identified issues, their severity, and the corresponding corrective actions taken, or planned, to address them.

Benefits

Enhanced Code Quality: Discuss how static analysis can help identify and rectify coding issues, leading to higher code quality and reduced defects.

By providing a static analysis report as a collateral artifact of our project releases, we empower downstream users who may need to perform their own static analysis when integrating our software into their products. This report serves as a valuable resource for downstream teams, allowing them to gain insights into the code quality, security, and potential issues within our codebase. It simplifies the process of identifying and addressing vulnerabilities, thus expediting their product development cycles.

The static analysis report acts as a reference point, enabling downstream users to build on the triage data and further enhance the security and reliability of their own projects. This collaborative approach not only strengthens the overall software ecosystem but also fosters goodwill and trust among our user community, reaffirming our commitment to delivering quality software.

Implementation Plan

Run Coverity Bi-Weekly

We will establish a regular schedule for running Coverity static analysis bi-weekly. This consistent, recurring process will allow us to continuously monitor the codebase, ensuring that issues are identified and addressed proactively.

Generate GitHub Issues for Coverity Issues

Automatically generate GitHub issues for any issues detected by Coverity. These issues will have the same (or equivalent) priority initially defined by Coverity.

To ensure accountability and efficient issue resolution, we will automatically assign each GitHub issue to the respective code owner who is responsible for the code in question.

Code Owners (Developers) will be responsible for either fixing or triaging the assigned issues. When an issue is assigned, the code owner is expected to take ownership of the issue and determine whether it requires a fix or is a false positive.

If the issue is a false positive, is the code owner responsability triage it in Coverity. They have to:

Make themselves owner of the issue
Classify it (e.g: False positive, Intentional, ...)
Set an action (e.g: ignore, ...)
Provide some high-level justification

Establishing a Coverity Audit Group

As part of our implementation plan, we will create a dedicated team comprising members with expertise in static analysis, code quality, and software security. The primary goal of this group is to ensure the effectiveness of the static analysis process and verify that identified issues are properly triaged and resolved in a timely manner.

Risks and Mitigation:

One potential risk associated with implementing the static analysis requirement is the possibility that some issues may be ignored or fail to promptly be addressed. This could result from various factors, including high workloads, competing priorities, or a lack of awareness regarding the significance of the issues.

To mitigate the risk of ignored issues, we can implement a policy for addressing code with unresolved static analysis issues. In cases where code owners consistently neglect assigned issues and fail to take corrective actions within defined timeframes, and if these issues are deemed critical to the project's integrity and security, the following mitigation strategy will be applied:

Code Removal Evaluation

The project management and technical leads will conduct an evaluation of the affected code to determine its criticality and importance to the project. This assessment will consider factors such as the severity of the unresolved issues, the code's role in the software, and the availability of maintainers or other team members with expertise in that area.

Lack of Maintainership

If it is found that the code lacks a dedicated maintainer or responsible team member, and it is deemed infeasible to reassign ownership or address the issues within a reasonable timeframe, the project management will make the decision to remove the code or module from the project.

Benefits of Code Removal Due to Lack of Maintainership

This mitigation strategy not only addresses the risk of ignored issues but also promotes accountability and proactive issue resolution within the development team. It ensures that code with unresolved critical issues does not compromise the security and stability of the project. By documenting and evaluating the code's removal, the team can minimize potential disruptions and maintain project integrity while mitigating the risks associated with unaddressed vulnerabilities.

Timeline

Make this requirement part of the next release.

Resource Allocation

The number of builds we are planning to execute using Coverity is covered by the license (or agreement) for open source projects. Therefore, there are no additional costs associated with the usage of Coverity for static analysis in this context.

ceolin commented 1 year ago

@keith-zephyr @d3zd3z @dleach02 @nashif @carlescufi @henrikbrixandersen ...

dleach02 commented 1 year ago

Recommend a clear line item in the release checklist to affirm static analysis run has been run. Whether we do it on a periodic (weekly) basis or pre final release doesn't matter as much as long as the release team has a checkbox to affirm it was done as part of the release collateral.

keith-zephyr commented 1 year ago

AIs @ceolin to update language to mention generic static analysis. Then add this documentation into the official release criteria. @nashif suggests creating pull request with this language and allow others to review. @ceolin to work with the security group and start discussions for creation of the audit group.

@nashif - it would be helpful to extract the types of issues we scan for and to also give tips on how to fix issues. We should also provide guidelines for marking false positives, or code that intentionally violates static analysis scans.

@ceolin - last release had many buffer overflows that were caught by coverity, but not monitored. @ceolin - access to the coverity tool is an issue. The audit group will address backlog and triage issues. @dleach02 - Need a 2nd reviewer before marking false positives. @ceolin - Probably too much overhead to do this review on each coverity issue. Especially for handing the backlog. @dleach02 - Worry about marking things as false positive, as Coverity's tooling is quite good. At NXP, the PSIRT team monitors the security bulletins from Zephyr, but then there is follow up work to analyze and address the problems. @wbober - Proposal only runs coverity on a cadence. But this doesn't catch problems as they are introduced. @ceolin - the code owner is the assignee for coverity issues, similar to regular bugs. @wbober - can we require the PR authors to run the scanning? @ceolin - Ideally yes, but the coverity scans are expensive to run. The scans don't distinguish between new code and existing code. So no doable right now. @ceolin - free version limits the number of builds that we can do. @ceolin - we want to catch these problem before the release. @ceolin safety/security standards require a static analysis report. @gregshue - auditable branch used for the security certs. Are we expecting product makers to develop from mainline (3.4/3.5/etc). @dleach02 - if they choose to do so. @dleach02 - the proposal is to generate the static analysis as part of the release. @dleach02 - there is consensus that adding these scans on a regular cadence is needed for the project. @keith-zephyr - how are scans managed for multiple archs? @nashif - we run multiple scans. Coverity doesn't support multiple jobs. @nashif most problems are found in the shared code (subsystems, libraries). Only need to run the scan once against common code (generally). @gregshue - does this have to be run against different tool chains? @nashif - we only do scanning the Zephyr SDK. @gregshue - wants to note in the documentation that we are only running scans against the Zephyr SDK. @nashif - suggests to update the document to reference static analysis generically - using coverity isn't required. We should also have a goal to scan on every PR. But we don't have tooling to do this now. @nashif want to raise the bar so that we are moving towards this goal eventually, but we are constrained with what can be implemented now. @ceolin - agrees that is the goal of the project. @nashif - the project already has a requirement to run static analysis. But it's not in the documentation. @dleach02 - the release checklist needs to include the static analysis @keith-zephyr - there is consensus from the group to run static analysis scans regularly and to assign issues to code owners. @ceolin - Proposal for handling lack of maintainership and code review is for the future. We can revisit these policies after the scanning is implemented. @nasif - agrees that we need to evaluate dead code/unmaintained code to ensure the tree is cleaned up regularly. There are some boards with limited RAM that probably haven't been tested. @gregshue - There may be downstream projects that use Zephyr without the scheduler. @nashif - these boards probably still may not work.

keith-zephyr commented 1 year ago

Moving to on-hold until @ceolin has created documentation to review.

ceolin commented 1 year ago

Moving to on-hold until @ceolin has created documentation to review.

@keith-zephyr I have created the pr, I tried to organize the information here around different sections in our documentation. Some of the items proposed here were already part of the documentation and didn't need to be added.

keith-zephyr commented 1 year ago

Process 2023-11-15 @ceolin - It's still an open issue to document and the static analysis audit group. This this needs to be brought up with the security WG. @keith-zephyr - suggest also publishing group to the TSC. @ceolin - will document a short summary of the responsibilities for the static analysis group. This can be added to the existing project roles area. @ceolin - will also pull on the static analysis reports so we can review for inclusion the release artifacts. @ceolin - will introduce this topic at the next Security WG meeting.

zephyrproject-rtos / zephyr