zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

Verify builds are reproducible in the CI #50205

Open keith-zephyr opened 2 years ago

keith-zephyr commented 2 years ago

Introduction

Zephyr builds should be reproducible. A checkout of Zephyr from the same commit, built with the same toolchain, should generate an identical image binary.

Problem description

This has been proposed before (https://github.com/zephyrproject-rtos/zephyr/pull/11523 and https://github.com/zephyrproject-rtos/zephyr/pull/14593), but the Zephyr tree currently has no tests that verify builds are reproducible.

Furthermore, reproducible builds were broken for an unknown length of time until they were fixed by https://github.com/zephyrproject-rtos/zephyr/pull/48195.

Proposed change

Add a new GitHub workflow that verifies builds are reproducible. This workflow will be run on every PR.

The workflow can follow the blueprint of the Footprint Delta workflow. The new workflow would build each of a set of platforms (TBD) twice, back to back, and verify that the resulting binaries are identical.

Note that the build command west build -b native_posix tests/drivers/build_all/sensor has been known to catch problems with devicetree generation that result in non-reproducible builds.
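The back-to-back comparison described above could be sketched roughly as follows. This is only an illustration, not the proposed workflow: it assumes the same application has already been built twice into two separate build directories, and the function names and artifact list are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_artifacts(build_a: Path, build_b: Path, artifacts: List[str]) -> List[str]:
    """List the artifacts that are missing from either build or whose contents differ.

    `artifacts` holds paths relative to each build directory. Which files
    to compare is an assumption left to the caller.
    """
    mismatched = []
    for name in artifacts:
        a, b = build_a / name, build_b / name
        if not (a.is_file() and b.is_file()) or sha256_of(a) != sha256_of(b):
            mismatched.append(name)
    return mismatched
```

A CI step would fail the job whenever the returned list is non-empty. Running two builds from the same path, at nearly the same time, on the same machine only detects some sources of non-determinism, so a clean result here is a lower bound rather than proof of reproducibility.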

Dependencies

The new GitHub workflow will block new PRs if the reproducible build test fails.

Concerns and Unresolved Questions

Running this check against every PR will incur additional computing time and resources.

Alternatives

Run the reproducible build check less frequently, such as nightly. However, this will require a significant bisect effort to identify the culprit PR when any failures are detected. The incremental cost of some additional builds on each PR seems worth the trouble.

stephanosio commented 2 years ago

cc @marc-hb

stephanosio commented 2 years ago

> This workflow will be run on every PR.

I do not expect this to be something that breaks often. Bi-weekly build should be fine.

gmarull commented 2 years ago

We have many non-locked Python dependencies that are used in some way during the build process; they should be considered too.
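As a rough illustration of how unlocked dependencies might be surfaced, the sketch below flags requirement lines that are not pinned to an exact version with `==`. The function name is hypothetical and the parsing is deliberately simplistic (it does not handle hashes, URLs, or environment markers in full); it is an assumption about one possible check, not something the project does today.

```python
import re
from typing import List

# An exact pin looks like "name==1.2.3"; anything else is treated as unlocked.
_PINNED = re.compile(r"^[A-Za-z0-9_.\-\[\]]+\s*==\s*[^\s;]+")


def unpinned_requirements(lines: List[str]) -> List[str]:
    """Return the requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as "-r"
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

For example, given the illustrative lines `PyYAML==6.0`, `pykwalify>=1.7` and `west`, only the last two would be reported. A change in any reported package between two CI runs could alter generated files without any change to the Zephyr tree itself.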

marc-hb commented 2 years ago

Here's a list of 20+ old reproducibility fixes:

This should show what the most common problems are.

In the same place there's an (obsolete) test script. The approach was crude but very effective:

marc-hb commented 2 years ago

> I do not expect this to be something that breaks often.

Agreed. Reproducibility testing and fixing is rare, but reproducibility regressions are very rare too.

> Bi-weekly build should be fine.

On the other hand, IF it's cheap and quick to run then why not run it on every PR?

keith-zephyr commented 2 years ago

Because of the amount of generated code, I'm in favor of checking on every PR. Maybe the GitHub workflow can be set up to run on any changes to the ./scripts directory, and also be set up as a weekly run to catch problems with the actual source code.

marc-hb commented 2 years ago

These 2 additional lines are IMHO a big step forward, please help review:

marc-hb commented 1 year ago

GitHub Actions for the Zephyr+SOF project have been routinely and successfully comparing binaries built on Linux versus Windows in every PR for a few months now:

To achieve this I overrode the default config change in #51954 in an SOF-specific way: https://github.com/thesofproject/sof/commit/945adb8d1660ed4

Building across two different operating systems provides a lot of differences "for free" that can be very difficult to achieve on the same operating system (see the old #14593 attempt). Kudos to @aborisovich for implementing the Windows build in GitHub Actions.

This does not catch everything (e.g.: __DATE__) but it indirectly provides reproducibility coverage for a lot of the Zephyr project.
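Because two builds run on the same day produce the same __DATE__ expansion, binary comparison alone will miss it. One possible complement, sketched here as an assumption rather than an existing Zephyr check, is a simple text scan for the time-related preprocessor macros; the function name is hypothetical.

```python
import re
from typing import List, Tuple

# __DATE__ and __TIME__ are standard C; __TIMESTAMP__ is a common compiler extension.
_TIME_MACROS = re.compile(r"\b__(?:DATE|TIME|TIMESTAMP)__\b")


def find_time_macros(source: str) -> List[Tuple[int, str]]:
    """Return (line number, macro) pairs for each time macro found in the text."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _TIME_MACROS.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

The scan is purely textual, so it also reports occurrences inside comments and strings; the results are candidates for review rather than confirmed reproducibility bugs.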

Note that a build is no more "reproducible" than a project is "bug-free"; fixing reproducibility bugs is a continuous activity exactly like fixing other bugs. Typically, building some code is reproducible in some Kconfiguration but fails when that Kconfiguration is changed, exactly like other bugs. Most recent example with CONFIG_ASSERT:

Switching to an old toolchain can also be very problematic: