zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.41k stars 6.38k forks source link

twister: define, document and test platform scope selection rules #57595

Open PerMac opened 1 year ago

PerMac commented 1 year ago

Is your enhancement proposal related to a problem? Please describe. The selection rules for platform scope in twister are not clearly defined, documented nor tested. The logic for selecting on which platforms a test will be executed is quite complex https://github.com/zephyrproject-rtos/zephyr/blob/main/scripts/pylib/twister/twisterlib/testplan.py#L592. What's more, some (not obvious) decision parts of it are documented only in the code, like the fact that If there isn't any overlap between the platform_allow list and the platform_scope we set the scope to the platform_allow list https://github.com/zephyrproject-rtos/zephyr/blob/main/scripts/pylib/twister/twisterlib/testplan.py#L667.

This makes users having to figure out, with trail and error approach, why/how their twister call ended up with a given scope. Since there are no dedicated unit tests, there is no warranty that the behavior won't suddenly change. In fact, it just happened recently causing us potentially serious quality threat: maintainers of out of zephyr tests thought that when twister is called without any "--platform" arg it will use platforms from yamls "platform_allow". This seemed to reflect the cited comment. In fact, the logic is more complex and the above statement is correct only in rather particular condition. What's more, this logic was recently changed in a PR which was addressing different topic, without documenting this change https://github.com/zephyrproject-rtos/zephyr/pull/52715/files#diff-0a0df38c6d70056ac2d7f06a7a16f8a148d1013d87fae6222d6a64323b6702cbR679 and wasn't noticed during the review. The change makes the comment no longer true. Instead of the whole platform_allow list only a single platform from this list is taken. Suddenly, CIs with scopes based on platform_allow, vastly decreased their coverage to a single platform per test. We were rather lucky to find this change before any regression creeped in (only because some of such plans where having multiple steps where later ones begin to complain that some artifacts from twister step are missing). We also found that this change made the twister's behavior non-deterministic. We were investigating why we are not getting some artifacts from our CI and why some are duplicated. The reason is that platform_allow is a set which is unordered. Picking the first item results in different platforms chosen between different nodes. This made the subsets being taken from different source list resulting in duplicates/missing tests in total outcome being a sum of subsets. Finding the root cause of the above issues costed CIs' maintainers significant amount of time figuring out what has changed and how to fix/modify their plans to address the change.

Describe the solution you'd like First, we need to decide what should be the business logic for platform scope selection and create a flow chart for it. Then we have to document it. After that we will create regression tests verifing if the logic is maintained.

Describe alternatives you've considered Haven't considered**

The fact that the scope is affected by #52715 can be reproduced by calling scripts/twister -T samples/bluetooth/central_otc/ -v before and after the change. After change (checkout at f4dc918d53)

~/zephyrproject/zephyr [:f4dc918d53|…11⚑ 16] 
16:46 $ scripts/twister -T samples/bluetooth/central_otc/ -v
Renaming output directory to /home/maciej/zephyrproject/zephyr/twister-out.6
INFO    - Using Ninja..
INFO    - Zephyr version: zephyr-v3.3.0-880-gf4dc918d5337
INFO    - Using 'zephyr' toolchain.
INFO    - Selecting default platforms per test case
INFO    - Building initial testsuite list...
INFO    - Writing JSON report /home/maciej/zephyrproject/zephyr/twister-out/testplan.json
INFO    - JOBS: 4
INFO    - Adding tasks to the queue...
INFO    - Added initial list of jobs to queue
INFO    - 1/1 nrf52833dk_nrf52833       samples/bluetooth/central_otc/sample.bluetooth.central_otc PASSED (build)

INFO    - 1 test scenarios (1 test instances) selected, 0 configurations skipped (0 by static filter, 0 at runtime).
INFO    - 1 of 1 test configurations passed (100.00%), 0 failed, 0 errored, 0 skipped with 0 warnings in 34.27 seconds
INFO    - In total 0 test cases were executed, 1 skipped on 1 out of total 549 platforms (0.18%)
INFO    - 0 test configurations executed on platforms, 1 test configurations were only built.
INFO    - Saving reports...
INFO    - Writing JSON report /home/maciej/zephyrproject/zephyr/twister-out/twister.json
INFO    - Writing xunit report /home/maciej/zephyrproject/zephyr/twister-out/twister.xml...
INFO    - Writing xunit report /home/maciej/zephyrproject/zephyr/twister-out/twister_report.xml...
INFO    - Run completed

before (4c97dd546a)

scripts/twister -T samples/bluetooth/central_otc/ -v
Renaming output directory to /home/maciej/zephyrproject/zephyr/twister-out.7
INFO    - Using Ninja..
INFO    - Zephyr version: zephyr-v3.3.0-878-g4c97dd546afa
INFO    - Using 'zephyr' toolchain.
INFO    - Selecting default platforms per test case
INFO    - Building initial testsuite list...
INFO    - Writing JSON report /home/maciej/zephyrproject/zephyr/twister-out/testplan.json
INFO    - JOBS: 4
INFO    - Adding tasks to the queue...
INFO    - Added initial list of jobs to queue
INFO    - 1/3 nrf52833dk_nrf52833       samples/bluetooth/central_otc/sample.bluetooth.central_otc PASSED (build)
INFO    - 2/3 nrf21540dk_nrf52840       samples/bluetooth/central_otc/sample.bluetooth.central_otc PASSED (build)
INFO    - 3/3 nrf52840dk_nrf52840       samples/bluetooth/central_otc/sample.bluetooth.central_otc PASSED (build)

INFO    - 1 test scenarios (3 test instances) selected, 0 configurations skipped (0 by static filter, 0 at runtime).
INFO    - 3 of 3 test configurations passed (100.00%), 0 failed, 0 errored, 0 skipped with 0 warnings in 72.46 seconds
INFO    - In total 0 test cases were executed, 3 skipped on 3 out of total 549 platforms (0.55%)
INFO    - 0 test configurations executed on platforms, 3 test configurations were only built.
INFO    - Saving reports...
INFO    - Writing JSON report /home/maciej/zephyrproject/zephyr/twister-out/twister.json
INFO    - Writing xunit report /home/maciej/zephyrproject/zephyr/twister-out/twister.xml...
INFO    - Writing xunit report /home/maciej/zephyrproject/zephyr/twister-out/twister_report.xml...
INFO    - Run completed
zephyrbot commented 6 months ago

Hi @nashif,

This issue, marked as an Enhancement, was opened a while ago and did not get any traction. It was just assigned to you based on the labels. If you don't consider yourself the right person to address this issue, please re-assing it to the right person.

Please take a moment to review if the issue is still relevant to the project. If it is, please provide feedback and direction on how to move forward. If it is not, has already been addressed, is a duplicate, or is no longer relevant, please close it with a short comment explaining the reason.

@PerMac you are also encouraged to help moving this issue forward by providing additional information and confirming this request/issue is still relevant to you.

Thanks!