zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.41k stars 6.38k forks source link

Twister test data and compliance - a reevaluation #63156

Open LukaszMrugala opened 11 months ago

LukaszMrugala commented 11 months ago

Introduction

We would like to have some more freedom when it comes to files used in tests for Twister.

Problem description

Currently, Compliance Action has strict rules that are not relaxed for test data. To enumerate some of them, test data cannot contain any binary files and every file must not have any trailing whitespace in its lines. This makes some functional tests unnecessarily hard to create, as functions under test may expect a binary file or a file that does not comply with the rules for source files.

Proposed change

We would like to make the scripts/tests/twister/test_data and scripts/tests/twister_blackbox/test_data and their subdirectories exempt from some compliance checks.

Detailed RFC

scripts/ci/check_compliance.py contains our compliance code. We would need to add some kind of an exemption list to it, where we could specify glob-like patterns to exempt from tests.

A simpler option may be to specify directory paths to exempt from select checks, but it lacks some of the granularity - if we do not want e.g. .exe files anywhere in the repo, such a broad exception would be harmful.

Proposed change (Detailed)

  1. Analyse current compliance checks to verify which checks are unnecessary for test data.
    • Checks to be relaxed identified by this RFC:
    • CheckPatch - whitespace errors are commonplace in e.g. log files
    • BinaryFiles - binary file inclusion
    • GitDiffCheck - whitespace errors and possible false positives with git conflict markers
  2. Introduce a method to exempt directories and file types or names from such checks
    • Author posits that a glob-like patterns should be a good basis for such a feature. Git pathspec's :(exclude) is a good investigation point.
  3. Add test_data directories, .log files and .elf files to it, keeping it open for possible future changes.

Dependencies

This change would modify our compliance checking to make it more lenient.

Concerns and Unresolved Questions

As with any compliance check leniency, we may open ourselves up to some threats.

The direct cause of this RFC was a Draft PR for tests on size_calc.py module, which requires binary .elf files and text build logs.

Alternatives

We could keep our current compliance rules and accept failing PRs on case-by-case basis. This could lengthen our review process as well as only push the problem further back in time if more tests require 'real' data.

We could devise our tests to never use such files. While possible, this could easily degenerate into having large binary variables inside the test files to simulate having a binary file anyway, thus invalidating the purpose of the compliance check while making test files much more unwieldy. Alternatively, if we wanted to curb down on that, test quality may be lower, as real data won't be able to enter our tests.

ifyall commented 11 months ago

While text files that are not source code seem like something we could consider easily exempting from the compliance checks, I do worry about the binary files. What the licensing implications of allowing binary files in the Zephyr tree, even in the test directory?

marc-hb commented 10 months ago

This could lengthen our review process as well as only push the problem further back in time if more tests require 'real' data.

I think this is the most important question: how often do you think this will happen again? git does not manage binaries well, if you think this will be more and more frequent then all these binaries should be stored in either LFS or similar, not in the main repo. If you think this will be rare then it's not going to lengthen the review process much. Either way no special treatment seems required.

The direct cause of this RFC was a https://github.com/zephyrproject-rtos/zephyr/pull/63109 for tests on size_calc.py module, which requires binary .elf files and text build logs.

These elf files and build logs seem very basic, have you considered generating them instead? Caching them if you're concerned about performance. BTW parsing the build log does not seem like a good idea. The build system already does complicated pre- and post- processing of ELF files, why is this framework not enough to get the information you need?

Last but not least: is your build logs parsing really whitespace sensitive!?

I do worry about the binary files. What the licensing implications of allowing binary files in the Zephyr tree, even in the test directory?

There is also a security concern: binary files cannot be reviewed so this is a potential attack surface (even more so when they're meant to be parsed)