Open nashif opened 1 year ago
@hakehuang @PerMac @SzymonRichert @golowanow
prototyping this right now, test summary before:
./scripts/twister -p native_posix -T tests/kernel/threads/ -v
ZEPHYR_BASE unset, using "/home/nashif/zephyrproject/zephyr"
Renaming output directory to /home/nashif/zephyrproject/zephyr/twister-out.9
INFO - Using Ninja..
INFO - Zephyr version: zephyr-v3.3.0-4093-g5781478fd5c4
INFO - Using 'zephyr' toolchain.
INFO - Building initial testsuite list...
INFO - Writing JSON report /home/nashif/zephyrproject/zephyr/twister-out/testplan.json
INFO - JOBS: 112
INFO - Adding tasks to the queue...
INFO - Added initial list of jobs to queue
INFO - 3/10 native_posix tests/kernel/threads/tls/kernel.threads.tls SKIPPED (runtime filter)
INFO - 4/10 native_posix tests/kernel/threads/thread_error_case/kernel.threads.error.case SKIPPED (runtime filter)
INFO - 5/10 native_posix tests/kernel/threads/dynamic_thread/kernel.threads.dynamic SKIPPED (runtime filter)
INFO - 6/10 native_posix tests/kernel/threads/thread_apis/kernel.threads.apis.pinonly SKIPPED (runtime filter)
INFO - 7/10 native_posix tests/kernel/threads/tls/kernel.threads.tls.userspace SKIPPED (runtime filter)
INFO - 8/10 native_posix tests/kernel/threads/thread_init/kernel.threads.init PASSED (native 0.015s)
INFO - 9/10 native_posix tests/kernel/threads/thread_stack/kernel.threads.thread_stack PASSED (native 0.047s)
INFO - 10/10 native_posix tests/kernel/threads/thread_apis/kernel.threads.apis PASSED (native 0.081s)
INFO - 10 test scenarios (10 test instances) selected, 7 configurations skipped (2 by static filter, 5 at runtime).
INFO - 3 of 10 test configurations passed (100.00%), 0 failed, 0 errored, 7 skipped with 0 warnings in 7.13 seconds
INFO - In total 45 test cases were executed, 38 skipped on 1 out of total 571 platforms (0.18%)
INFO - 3 test configurations executed on platforms, 0 test configurations were only built.
INFO - Saving reports...
INFO - Writing JSON report /home/nashif/zephyrproject/zephyr/twister-out/twister.json
INFO - Writing xunit report /home/nashif/zephyrproject/zephyr/twister-out/twister.xml...
INFO - Writing xunit report /home/nashif/zephyrproject/zephyr/twister-out/twister_report.xml...
INFO - Run completed
after:
./scripts/twister -p native_posix -T tests/kernel/threads/ -v
ZEPHYR_BASE unset, using "/home/nashif/zephyrproject/zephyr"
Renaming output directory to /home/nashif/zephyrproject/zephyr/twister-out.10
INFO - Using Ninja..
INFO - Zephyr version: zephyr-v3.3.0-4083-gaa1b892e7999
INFO - Using 'zephyr' toolchain.
INFO - Building initial testsuite list...
INFO - Writing JSON report /home/nashif/zephyrproject/zephyr/twister-out/testplan.json
INFO - JOBS: 112
INFO - Adding tasks to the queue...
INFO - Added initial list of jobs to queue
INFO - 1/8 native_posix tests/kernel/threads/dynamic_thread/kernel.threads.dynamic FILTERED (runtime filter)
INFO - 2/8 native_posix tests/kernel/threads/tls/kernel.threads.tls.userspace FILTERED (runtime filter)
INFO - 3/8 native_posix tests/kernel/threads/thread_apis/kernel.threads.apis.pinonly FILTERED (runtime filter)
INFO - 4/8 native_posix tests/kernel/threads/thread_error_case/kernel.threads.error.case FILTERED (runtime filter)
INFO - 5/8 native_posix tests/kernel/threads/tls/kernel.threads.tls FILTERED (runtime filter)
INFO - 6/8 native_posix tests/kernel/threads/thread_init/kernel.threads.init PASSED (native 0.016s)
INFO - 7/8 native_posix tests/kernel/threads/thread_stack/kernel.threads.thread_stack PASSED (native 0.047s)
INFO - 8/8 native_posix tests/kernel/threads/thread_apis/kernel.threads.apis PASSED (native 0.085s)
INFO - 10 test scenarios (10 test instances) selected, 7 configurations filtered (2 by static filter, 5 at runtime).
INFO - 3 of 3 test configurations passed (100.00%), 0 failed, 0 errored, 0 warnings in 7.18 seconds
INFO - In total 45 test cases were executed, 38 skipped on 1 out of total 570 platforms (0.18%)
INFO - 3 test configurations executed on platforms, 0 test configurations were only built.
INFO - Saving reports...
INFO - Writing JSON report /home/nashif/zephyrproject/zephyr/twister-out/twister.json
INFO - Writing xunit report /home/nashif/zephyrproject/zephyr/twister-out/twister.xml...
INFO - Writing xunit report /home/nashif/zephyrproject/zephyr/twister-out/twister_report.xml...
INFO - Run completed
Summary:
3 of 3 test configurations passed (100.00%)
instead of 3 of 10 test configurations passed (100.00%)
is clearer and more intuitive. We still need a final summary based on test cases rather than suites (which will also cover skipped reporting and other things based on the new states above).
In the above prototype, the SKIPPED status is replaced by FILTERED, correct? @nashif.
Besides, the INPROGRESS = 'in progress' status may be changed to INPROGRESSING, so that the statuses are defined as:
- final test statuses end with -ED
- abnormal test statuses end with a noun, which is only ERROR for now
- in-progress test statuses end with -ING
Besides, the INPROGRESS = 'in progress' status may be changed to INPROGRESSING, so that final test statuses end with -ED
'in progress' is an intermediate state; any test ending up in this state should have a final status. In this case we used to report 'FAILED', but one possible outcome could also be an INCOMPLETE state.
The following states will be defined:

```python
class Status:
    PASSED = 'passed'
    FAILED = 'failed'
    ERROR = 'error'
    SKIPPED = 'skipped'
    NOTRUN = 'not run'
    BLOCKED = 'blocked'
    INPROGRESS = 'in progress'
```
I think, to avoid confusion, there should be different classes for what is a Status at execution time, and what is a final State to report. In this way a Test Suite will have its (State, Status) tuple, e.g. the 'only built' suite might have (Passed, Built) or (Error, Building).
Also it would be nice to clarify how Status of a Test Suite corresponds to multiple statuses of its Test Cases.
another update of summary:
INFO - 2 test scenarios (84 test instances) selected, 24 configurations filtered (12 by static filter, 12 at runtime).
INFO - 56 of 60 test configurations passed (93.33%), 4 failed, 0 errored, 0 not run, 0 warnings in 198.91 seconds
INFO - 737 of 744 test cases passed (99.06%), 4 failed, 0 not run, 3 blocked on 42 out of total 570 platforms (7.37%)
INFO - 60 test configurations executed on platforms, 0 test configurations were only built.
Also it would be nice to clarify how Status of a Test Suite corresponds to multiple statuses of its Test Cases.
yes, this will be part of the docs
I think, to avoid confusion, there should be different classes for what is a Status at execution time, and what is a final State to report. In this way a Test Suite will have its (State, Status) tuple, e.g. the 'only built' suite might have (Passed, Built) or (Error, Building).
This might be more confusing. The only state that is internal is the 'in progress' and will not appear in any reports, for everything else we need 1 single final status.
The only state that is internal is the 'in progress' and will not appear in any reports, for everything else we need 1 single final status.
and the proposed "notrun" is internal too, isn't it?
Currently, for Test Suite and Test Case statuses we have something like the below possible combinations:

| Case \ Suite: | Filtered | Passed | Error | Failed |
|---------------|----------|--------|-------|--------|
| Filtered      | x        | -      | -     | -      |
| Started       | -        | x      | -     | -      |
| Skipped       | -        | x      | -     | x      |
| Passed        | -        | x      | -     | x      |
| Blocked       | -        | x      | x     | x      |
| Failed        | -        | -      | x     | x      |
and the proposed "notrun" is internal too, isn't it?
no, this one is new, see above
A testsuite that is only being built and can't run will be reported as NOTRUN; right now we report those as SKIPPED.
Currently, for Test Suite and Test Case statuses we have something like the below possible combinations:

| Case \ Suite: | Filtered | Passed | Error | Failed |
|---------------|----------|--------|-------|--------|
| Filtered      | x        | -      | -     | -      |
| Started       | -        | x      | -     | -      |
| Skipped       | -        | x      | -     | x      |
| Passed        | -        | x      | -     | x      |
| Blocked       | -        | x      | x     | x      |
| Failed        | -        | -      | x     | x      |
Not entirely correct:

| Case \ Suite: | Filtered | Passed | Error | Failed |
|---------------|----------|--------|-------|--------|
| Filtered      | x        | -      | -     | -      |
| Started       | -        | -      | -     | -      |
| Skipped       | -        | x      | x     | x      |
| Passed        | -        | x      | -     | x      |
| Blocked       | -        | -      | x     | x      |
| Failed        | -        | -      | x     | x      |
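The corrected table can be encoded as a simple lookup. The sketch below is purely illustrative (the names `ALLOWED_CASE_STATUSES` and `is_consistent` are hypothetical, not part of Twister):

```python
# Hypothetical encoding of the corrected Suite/Case table above:
# for each final Suite status, the set of Case statuses that can
# occur together with it (the 'x' cells). Illustrative only.
ALLOWED_CASE_STATUSES = {
    "filtered": {"filtered"},
    "passed": {"skipped", "passed"},
    "error": {"skipped", "blocked", "failed"},
    "failed": {"skipped", "passed", "blocked", "failed"},
}

def is_consistent(suite_status: str, case_status: str) -> bool:
    """Check whether a (Suite, Case) status pair matches the table."""
    return case_status in ALLOWED_CASE_STATUSES.get(suite_status, set())
```

Such a check could be used, for example, to flag impossible pairs like a Started case inside a finished suite.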
Besides, the INPROGRESS = 'in progress' status may be changed to INPROGRESSING, so that the statuses are defined as: final test statuses end with -ED, abnormal test statuses end with a noun (currently only ERROR), and in-progress test statuses end with -ING.
Let's avoid the linguistic part of this and not use verbs at all, so:

```python
class Status:
    PASS = 'pass'
    FAIL = 'fail'
    ERROR = 'error'
    SKIP = 'skip'
    NORUN = 'no run'
    BLOCK = 'block'
    INPROGRESS = 'in progress'
```
Currently , for Test Suite and Test Case's statuses we have something like below possible:
not entirely correct:
@nashif, these two combinations, (Passed, Started) and (Passed, Blocked), are currently observed when a TC either passed despite its end record being lost, or failed as expected (drivers.coredump.api).
@nashif, these two combinations, (Passed, Started) and (Passed, Blocked), are currently observed when a TC either passed despite its end record being lost, or failed as expected (drivers.coredump.api).
Yes, could be; there is a bug somewhere. If you can point me to such cases with more details, that would be great.
In general we think this is a great initiative. Some comments from our side:
- It is even more challenging, since we have "true" build-only configurations (i.e. those with build-only: true in their yamls) and ones which are not executed due to CI limitations (execution is not possible, e.g. no HW, some emu/simulator not installed, etc.). If we keep the "end_state" it might help in translating the internal states into the states allowed in a given format (e.g. junit). If a test ended after being built, that is fine if it was expected to only build, but not fine when it was executable but not executed.
- not run: A test case or test suite that is not yet run. This is the initial state of any testcase.
and A testsuite that is only being built and can't run will be reported as NOTRUN,
these statements describe different states (initial vs. built-but-not-executed). It could be very useful to prepare a flow chart showing all possible statuses and the possible paths between them. We believe such a visual aid would allow for better verification that the transitions between states happen according to the design, that they make sense, etc.
Does this really need a flow diagram?
notrun -> in progress -> passed/failed/error
notrun -> skipped
notrun -> blocked
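The three transition rules above are small enough to capture as a table. The sketch below is illustrative only (the names `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical, not Twister's actual code):

```python
# Hypothetical transition table for the proposed status flow:
#   notrun -> in progress -> passed/failed/error
#   notrun -> skipped
#   notrun -> blocked
ALLOWED_TRANSITIONS = {
    "notrun": {"in progress", "skipped", "blocked"},
    "in progress": {"passed", "failed", "error"},
}

def can_transition(current: str, new: str) -> bool:
    """Return True if a status change follows the proposed flow."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```

Every final status (passed, failed, error, skipped, blocked) has no outgoing transitions, which is exactly the point of having one single final status per test.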
- I don't understand the BLOCK status. From the description it sounds as if it is replacing FILTERED. In the current code it is used instead of ERROR. If it is to replace "filtered", what about calling it "DESCOPE", i.e. removed from the scope?
description changed.
- I think we could aim at descriptions which directly indicate where along the execution path we are. I think IN PROGRESS is too vague, since it would correspond to the transition between any two states. Maybe it could be resolved if we keep tuples instead of single strings, e.g. (last_state, end_state)? Such a tuple indicates where the last "checkpoint" was and where it was heading.
description changed.
- I think the challenge is that a different end product is expected for different "tests". The majority of configurations are expected to be only built in the CI.
A test that only builds is a test that was not run: it tests whether the code builds, it does not test the functionality. So saying a test has passed just because we were able to build it is wrong and can be confusing.
- It is even more challenging, since we have "true" build-only configurations (i.e. those with build-only: true in their yamls).
IMO those are bugs and shall be removed and fixed.
- and ones which are not executed due to the CI limitation (execution is not possible, e.g. no hw, some emu/simulator not installed, etc.).
and those will need us to figure out ways to test them in CI. For example, all tests that had a keyboard harness shall be converted to use pytest, and more tests with custom and placeholder harnesses need to move in the same direction. In short, a test/sample that was only built should not be reported as PASSED or FAILED. It is now marked as not run, and if the build fails, it is marked as an error. For CI purposes, this is perfect.
- not run: A test case or test suite that is not yet run. This is the initial state of any testcase.
and A testsuite that is only being built and can't run will be reported as NOTRUN,
these statements describe different states (initial vs. built-but-not-executed).
Hmm, not really. The initial state is set after the test was built, so either it runs and then passes or fails, or it remains as not run.
Introduction
Statuses of tests in twister have been maintained in free form, and we never had a list of defined statuses with a clear description of what they mean. This RFC provides details about the statuses currently in use and proposes some changes to statuses and reporting in general.
Problem description
There are many cases where a test does not pass or is not run for various reasons; we should have a clear end status for each test to help identify real issues and gaps in test execution. First, this calls for a pre-defined set of statuses with clear descriptions of what they mean; then we need to tweak the reporting to use those pre-defined statuses and verify consistency across all tests.
We also have some inconsistency rooted in the fact that we wanted to keep things backward compatible with the original reporting structure, so we end up reporting status for both a testsuite and its included testcases, which can lead to confusion and different ways of counting.
One more issue to be resolved is how we deal with filtered tests and how we report them. Filtered tests are currently included in reports and during test execution (reported as skipped). For any target, the number of filtered tests is large for good reasons, and including filtered tests in console output and final reports gives the wrong impression about why such a large number of tests is being skipped. We want to report only tests that are planned, and provide a different way of viewing filter status and filter reports, avoiding confusion between a test that was filtered and a test that was skipped as part of the execution.
Proposed change
Detailed RFC
The following states will be defined:
passed: A test is deemed to pass if its actual result matches its expected result.
failed: A test is deemed to fail if its actual result does not match its expected result.
error: A test produces the Error result when there is a problem in running the test itself.
skipped: A step that was skipped during a test script execution.
blocked: A test case that is scheduled to run but cannot run because of an error or a crash in the suite.
not run: A test case or test suite that is not yet run. This is the initial state of any testcase.
in progress: A test case was started and it is currently executing.
Using the states above, the following changes are proposed (the list is in no way complete; use this RFC to add more):
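For reference, the defined states above could be sketched as a Python Enum. This is a hypothetical illustration; the actual class name and values in Twister may differ:

```python
from enum import Enum

# Illustrative sketch of the proposed states; not Twister's
# actual implementation.
class Status(Enum):
    PASSED = "passed"          # actual result matches expected result
    FAILED = "failed"          # actual result does not match expected
    ERROR = "error"            # problem in running the test itself
    SKIPPED = "skipped"        # step skipped during script execution
    BLOCKED = "blocked"        # scheduled but could not run (suite crash)
    NOTRUN = "not run"         # initial state / built but never executed
    INPROGRESS = "in progress" # internal intermediate state only
```

An Enum (rather than bare string constants) makes it impossible to assign a status outside the defined set, which directly addresses the "free form" problem stated in the introduction.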