zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

Improve twister performance when parallel execution is available #52701

Open · yperess opened 1 year ago

yperess commented 1 year ago

Is your enhancement proposal related to a problem? Please describe. In our test writing we have an issue where creating a new variant of a test (a new binary) carries a lot of boilerplate overhead and extra build time, but the cost of piling yet another test into an existing binary is also getting too high. The test binaries end up executing hundreds of tests and taking a long time.

Describe the solution you'd like I'd like twister to be able to take the final built .elf file and shard it: effectively, modifying the ztest suite and test iterable sections, running the different shards in parallel, and then combining the results. When running twister, the following steps should take place (a rough sketch of steps 3 and 4 follows the list):

  1. Twister should build the binary as usual
  2. Twister should identify if parallelism is possible
  3. Twister should parse the .elf file and generate test metadata by iterating over the suite and test sections
  4. Twister should assign a weight to each suite based on the number of tests it has, then make some attempt at balancing the loads (I don't think we need to worry about splitting up suites yet).
  5. Twister should copy the .elf file and mutate the start/end pointers as well as shifting some suite data such that the binary is unaware of the other suites.
  6. Twister should run the various .elf files (locally, on QEMUs, or on hardware) in parallel then join the results
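For illustration, steps 3 and 4 might look roughly like the following in twister's Python code. This is only a sketch: it assumes pyelftools is available, treats every suite as equally weighted, and takes the size of a ztest suite node as a parameter because that layout depends on the target and configuration.

    # Sketch: enumerate ztest suites from a built zephyr.elf and bin them into shards.
    from elftools.elf.elffile import ELFFile

    def symbol_address(elf, name):
        """Return the address (st_value) of a symbol, or None if it is absent."""
        symtab = elf.get_section_by_name(".symtab")
        matches = symtab.get_symbol_by_name(name) if symtab else None
        return matches[0]["st_value"] if matches else None

    def count_suites(elf_path, suite_node_size):
        """Derive the suite count from the linker-provided list bounds."""
        with open(elf_path, "rb") as f:
            elf = ELFFile(f)
            start = symbol_address(elf, "_ztest_suite_node_list_start")
            end = symbol_address(elf, "_ztest_suite_node_list_end")
        if start is None or end is None:
            return 0
        return (end - start) // suite_node_size

    def make_shards(num_suites, max_workers):
        """Round-robin suite indices into min(num_suites, max_workers) shards."""
        n = max(1, min(num_suites, max_workers))
        shards = [[] for _ in range(n)]
        for idx in range(num_suites):
            shards[idx % n].append(idx)
        return shards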

Describe alternatives you've considered I've considered having an easier way of specifying a similar binary in the testcase.yaml file, but the only way I seem to be able to do that is by introducing a Kconfig to select which test suites to include in the binary. This ends up being a little confusing and forces test writers to manage the test binaries by hand.

yperess commented 1 year ago

@tristan-google

gmarull commented 1 year ago

@PerMac isn't this something twister V2 already supports using https://pypi.org/project/pytest-parallel/ ?

tristan-google commented 1 year ago

I think Yuval's idea here doesn't concern the handlers for actually running things in parallel, but rather the ability to take a giant testcase binary and shard it post-build but pre-runtime. The executable would be duplicated N times and each copy would be modified to run only roughly 1/N of the tests, where N can be derived from the number of cores, the number of matching HW boards plugged in, etc. At that point those N copies could be run in parallel through whatever handler/mechanism is in place, just as if they were independent testcases.
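For illustration, deriving N might look something like this (purely a sketch; connected_duts is a hypothetical stand-in for however twister enumerates matching connected boards):

    import os

    def shard_count(num_suites, connected_duts=None):
        # Hardware runs are bounded by the number of matching boards plugged in;
        # native/QEMU runs are bounded by the host's core count.
        workers = len(connected_duts) if connected_duts else (os.cpu_count() or 1)
        return max(1, min(num_suites, workers))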

PerMac commented 1 year ago

I am not sure if I get the full idea. Would your idea require:

PerMac commented 1 year ago

Another question: would there be a place in the ztest framework to allow communication and calling single tests? E.g. something like:

dut.flash('zephyr.elf')
dut.write('ztest call testcase_A')
output = dut.read()
assert output == "testcase_A PASS"

Here dut.write and dut.read would be handled by twister for serial communication with the DUT, and the flashed ztest application would decide what to call and how whenever a command arrives on the serial input.
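A slightly fuller twister-side sketch of that idea, reusing the hypothetical dut.flash/dut.write/dut.read API from the snippet above (the 'ztest call' command and the 'PASS' output format are assumptions, not an existing ztest feature):

    def run_cases_over_serial(dut, test_cases):
        """Drive individual ztest cases over the DUT's serial console (hypothetical API)."""
        dut.flash('zephyr.elf')
        results = {}
        for case in test_cases:
            dut.write(f'ztest call {case}')          # assumed device-side shell command
            output = dut.read()
            results[case] = 'PASS' if f'{case} PASS' in output else 'FAIL'
        return results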

yperess commented 1 year ago

@PerMac not quite, Twister doesn't need to communicate with the DUT. Prior to flashing, twister will identify whether parallelism is possible; that means:

If any of the above is a yes, then we can parallelize the execution of the binary. Let's assume the binary has 30 test suites and each suite has 10 tests, for a total of 300 tests (I believe we're just over that for our largest integration test binary). Twister would build the test binary as usual, then decide how parallel things can be. For example, in the case of a native_posix test we can parallelize by the number of cores (let's assume a high number running on a CI server). So Twister would do the following (a rough orchestration sketch follows the list):

  1. Copies the .elf file 30 times (since min(30 suites, 96 cores) = 30).
  2. For each .elf copy it mutates ztest's _ztest_suite_node_list_start and _ztest_suite_node_list_end so that the binary is completely unaware of the other tests.
  3. It would then run each of the 30 binaries in parallel, each yielding the results for its single suite.
  4. It would then combine the results.
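A rough orchestration sketch of those four steps. patch_suite_window() below is hypothetical and left as a stub; it would rewrite the suite list bounds and shift the suite data as described in step 2. The direct subprocess call only makes sense for a native_posix build; QEMU and hardware targets would go through the existing twister handlers instead.

    import os
    import shutil
    import subprocess
    from concurrent.futures import ProcessPoolExecutor

    def patch_suite_window(elf_path, suite_indices):
        # Hypothetical: rewrite _ztest_suite_node_list_start/_end (and shift the
        # suite data) so this copy only sees the suites in suite_indices.
        raise NotImplementedError

    def run_shard(args):
        shard_idx, elf_path, suite_indices = args
        shard_elf = f"{elf_path}.shard{shard_idx}"
        shutil.copy(elf_path, shard_elf)               # step 1: one copy per shard
        patch_suite_window(shard_elf, suite_indices)   # step 2: hide the other suites
        proc = subprocess.run([os.path.abspath(shard_elf)],   # step 3: run this shard
                              capture_output=True, text=True)
        return shard_idx, proc.returncode, proc.stdout

    def run_all_shards(elf_path, shards):
        with ProcessPoolExecutor(max_workers=len(shards)) as pool:
            results = list(pool.map(run_shard,
                                    [(i, elf_path, s) for i, s in enumerate(shards)]))
        passed = all(rc == 0 for _, rc, _ in results)  # step 4: combine the results
        return passed, results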

Some consideration would need to be given to QEMUs and DUTs, since the cost of flashing becomes greater (it might not make sense to split a binary with only 2 suites on a DUT test), but I believe we can tweak these heuristics as we get closer to feature completeness.

NOTE

We have some very large integration tests, and currently developers have to choose between the convenience of adding their test to the same binary (which bloats it even more) or going through the boilerplate of creating another binary for their test (with no real configuration changes). This leads to a very large discrepancy in run times: locally I'm seeing some tests run in milliseconds while our largest 2 tests take 65 seconds. The issue is even worse in our CI, where disk I/O is slower and writing the large handler.log becomes the bottleneck, pushing those larger binaries closer to 120 seconds.

zephyrbot commented 8 months ago

Hi @tristan-google,

This issue, marked as an Enhancement, was opened a while ago and did not get any traction. Please confirm the issue is correctly assigned and re-assign it otherwise.

Please take a moment to review if the issue is still relevant to the project. If it is, please provide feedback and direction on how to move forward. If it is not, has already been addressed, is a duplicate, or is no longer relevant, please close it with a short comment explaining the reason.

@yperess you are also encouraged to help move this issue forward by providing additional information and confirming this request/issue is still relevant to you.

Thanks!