Open alyaothman14 opened 1 year ago
This is an idea similar to what Knapsack does for Cypress tests. It collects timings on all tests, creates a report and then uses timings on future test runs to load balance. If Playwright has a way to generate a test timings report and balance tests across shards from that report, it would be possible to consistently have optimized test run times.
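To make the idea concrete, here is a minimal sketch of balancing shards from a timings report. It assumes a hypothetical report shape (`TestTiming`) and a hypothetical `balanceShards` helper. Neither is a Playwright API, and the greedy "slowest file to the lightest shard" heuristic is just one reasonable choice.

```typescript
// Hypothetical shape of one entry in a timings report (not a Playwright format).
interface TestTiming {
  file: string;
  durationMs: number;
}

interface Shard {
  files: string[];
  totalMs: number;
}

// Greedy assignment: sort files from slowest to fastest, then always give the
// next file to the shard with the smallest accumulated duration so far.
function balanceShards(timings: TestTiming[], shardCount: number): Shard[] {
  if (shardCount < 1) throw new Error("shardCount must be at least 1");
  const shards: Shard[] = [];
  for (let i = 0; i < shardCount; i++) shards.push({ files: [], totalMs: 0 });

  const sorted = timings.slice().sort((a, b) => b.durationMs - a.durationMs);
  for (const t of sorted) {
    let lightest = shards[0];
    for (const s of shards) {
      if (s.totalMs < lightest.totalMs) lightest = s;
    }
    lightest.files.push(t.file);
    lightest.totalMs += t.durationMs;
  }
  return shards;
}

// Made-up timings from a previous run.
const exampleTimings: TestTiming[] = [
  { file: "checkout.spec.ts", durationMs: 90000 },
  { file: "login.spec.ts", durationMs: 20000 },
  { file: "search.spec.ts", durationMs: 60000 },
  { file: "profile.spec.ts", durationMs: 30000 },
  { file: "cart.spec.ts", durationMs: 40000 },
];

const balanced = balanceShards(exampleTimings, 2);
console.log(balanced.map((s) => `${s.files.join(", ")} (${s.totalMs} ms)`));
```

With these made-up numbers both shards end up at 120000 ms, whereas splitting the same files alphabetically into two groups would not be guaranteed to come out even.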
A few additional considerations:
This is also part of the Cypress Dashboard Service
Right now on one project with 800+ test files, we use BullMQ with an externally-hosted Redis server to dynamically allocate tests to worker machines.
Instead of pre-allocating test tasks to workers at the beginning, we spawn a number of worker machines, and each of them pulls the tasks from the central queue. This helps ensure that resources are best utilized.
We also collect statistics on how long each test file takes. We then use that information to reorder the tasks so that the slowest tests are pulled from the queue first. This way, only fast tests remain towards the end, which helps even out the total duration across workers.
Having support for pull queues would really help in a large project.
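The effect of pulling the slowest tests first can be illustrated with a small simulation. This is only a sketch: `simulatePullQueue` and the durations are made up for illustration, and it models a pull queue as "whichever worker is free first takes the next task".

```typescript
// Simulates workers pulling from a shared queue: the worker that becomes free
// first takes the next task. Returns the time at which the last worker finishes.
function simulatePullQueue(durations: number[], workerCount: number): number {
  const busyUntil: number[] = [];
  for (let i = 0; i < workerCount; i++) busyUntil.push(0);

  for (const d of durations) {
    let next = 0;
    for (let w = 1; w < workerCount; w++) {
      if (busyUntil[w] < busyUntil[next]) next = w;
    }
    busyUntil[next] += d;
  }
  return Math.max(...busyUntil);
}

// Reorders the queue so that the slowest tasks are pulled first.
function slowestFirst(durations: number[]): number[] {
  return durations.slice().sort((a, b) => b - a);
}

// Made-up durations in minutes, with the slow file happening to come last.
const queueOrder = [1, 1, 1, 1, 2, 2, 8];
const unorderedFinish = simulatePullQueue(queueOrder, 2);
const orderedFinish = simulatePullQueue(slowestFirst(queueOrder), 2);
console.log({ unorderedFinish, orderedFinish });
```

In this example the unordered queue finishes after 12 minutes because one worker picks up the 8-minute file last, while the slowest-first order finishes after 8 minutes with both workers ending at the same time.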
@dtinth it would be great if you could share some of your code!
@unlikelyzero Most of the code is tightly coupled to our CI/CD setup, Redis instance, and test result database, and so can't be shared easily. But I can share the general approach. Fortunately most of the custom code is just glue code and most of the heavy lifting is done by BullMQ.
In GitHub Actions, I first determine the queue ID and then spawn 20 jobs in a matrix. Each job runs two scripts in order: the enqueue jobs script, then the work on jobs script. The report script runs at the end.
The enqueue jobs script performs a glob of all test files and adds each one as a job to the queue. Each filename is hashed to get a job ID, which is used for job de-duplication. So, 20 workers can all try to add the same job to the queue, and thanks to de-duplication, it will only be added once.
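The de-duplication step can be sketched without Redis. The comment above does not say which hash is used, so this example swaps in FNV-1a purely to stay dependency-free, and `DedupQueue` is a hypothetical in-memory stand-in for the real queue. In BullMQ the ID would presumably be passed as the `jobId` option of `queue.add`.

```typescript
// FNV-1a, a small non-cryptographic hash. It is an assumption made for this
// sketch; the original setup may well use a different hash.
function jobIdFor(file: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < file.length; i++) {
    h ^= file.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return "job-" + (h >>> 0).toString(16);
}

// Hypothetical in-memory stand-in for the queue: adding a job whose ID already
// exists is a no-op, which is the behaviour the enqueue script relies on.
class DedupQueue {
  private jobs: { [id: string]: string } = {};

  add(file: string): boolean {
    const id = jobIdFor(file);
    if (id in this.jobs) return false; // duplicate, ignored
    this.jobs[id] = file;
    return true;
  }

  size(): number {
    return Object.keys(this.jobs).length;
  }
}

const specFiles = ["a.spec.ts", "b.spec.ts", "c.spec.ts"];
const dedupQueue = new DedupQueue();
let accepted = 0;

// 20 workers all try to enqueue the same three files.
for (let worker = 0; worker < 20; worker++) {
  for (const f of specFiles) {
    if (dedupQueue.add(f)) accepted++;
  }
}
console.log(accepted, dedupQueue.size());
```

Because the ID is derived only from the filename, every worker computes the same ID for the same file, so no coordination between workers is needed before enqueueing.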
The work on jobs script launches a BullMQ worker. The worker just runs the test file and reports the status back to BullMQ. Once the 'drained' event is emitted, there are no more tests to run, and the worker can exit.
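The worker loop described above can be approximated in memory as follows. This is a sketch, not the actual script: `runTestFile` is a hypothetical placeholder for spawning the test runner, and an empty shared array plays the role of BullMQ's 'drained' event.

```typescript
type Task = { file: string };
type Result = { file: string; worker: number; passed: boolean };

// Hypothetical stand-in for "run one test file". A real worker would spawn the
// test runner for that file and use its exit code. Here, files with "flaky" in
// their name are treated as failures so the example has both outcomes.
async function runTestFile(task: Task): Promise<boolean> {
  await new Promise<void>((resolve) => setTimeout(resolve, 1));
  return task.file.indexOf("flaky") === -1;
}

// A worker keeps pulling tasks from the shared list until none are left.
async function workUntilDrained(pending: Task[], worker: number, results: Result[]): Promise<void> {
  for (;;) {
    const task = pending.shift();
    if (task === undefined) return; // queue drained, worker exits
    const passed = await runTestFile(task);
    results.push({ file: task.file, worker, passed });
  }
}

// Starts the given number of workers and waits until all of them have exited.
async function runAll(fileNames: string[], workerCount: number): Promise<Result[]> {
  const pending: Task[] = fileNames.map((file) => ({ file }));
  const results: Result[] = [];
  const workers: Promise<void>[] = [];
  for (let w = 0; w < workerCount; w++) {
    workers.push(workUntilDrained(pending, w, results));
  }
  await Promise.all(workers);
  return results;
}

runAll(["a.spec.ts", "b.spec.ts", "flaky.spec.ts", "c.spec.ts"], 2).then((results) => {
  console.log(results);
});
```

No work is assigned up front: a worker that happens to get fast files simply comes back for more, which is what keeps the workers evenly loaded.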
After all tests are finished, the report script queries all jobs from the queue (BullMQ persists all jobs until they are deleted), generates a report, sets the CI status, and updates the test results database with their durations. It also downloads all the uploaded artifacts.
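As a rough illustration of the aggregation part of the report step, the sketch below turns a list of finished jobs into a summary. The `FinishedJob` and `Report` shapes and the `buildReport` helper are assumptions for this example; querying the queue, setting the CI status, and writing to the database are left out.

```typescript
// Hypothetical shape of a finished job as read back from the queue.
interface FinishedJob {
  file: string;
  passed: boolean;
  durationMs: number;
}

interface Report {
  total: number;
  failed: string[];
  totalMs: number;
  ok: boolean;            // could drive the CI status
  slowestFirst: string[]; // could drive the queue order of the next run
}

function buildReport(jobs: FinishedJob[]): Report {
  const failed = jobs.filter((j) => !j.passed).map((j) => j.file);
  const totalMs = jobs.reduce((sum, j) => sum + j.durationMs, 0);
  const slowestFirst = jobs
    .slice()
    .sort((a, b) => b.durationMs - a.durationMs)
    .map((j) => j.file);
  return { total: jobs.length, failed, totalMs, ok: failed.length === 0, slowestFirst };
}

// Made-up results from one run.
const finishedJobs: FinishedJob[] = [
  { file: "login.spec.ts", passed: true, durationMs: 20000 },
  { file: "checkout.spec.ts", passed: false, durationMs: 90000 },
  { file: "search.spec.ts", passed: true, durationMs: 60000 },
];

console.log(buildReport(finishedJobs));
```

Storing the durations from each run is what makes the slowest-first ordering possible on the following run.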
There is an overhead of spawning a test runner for each file, but that is outweighed by getting the ability to run 20 workers simultaneously.
I hope it helps!
Thanks @dtinth ! 🥰
Ionic Framework's CI process would benefit greatly from this feature. Many of our tests rely on screenshot comparisons or simulating user interaction, so the runtime adds up. We use the sharding feature to distribute tests over n test runners, which helps, but the run time of each test runner is uneven. As a result, some test runners take 3 minutes and others can take 13 minutes. Having a way of ensuring that each test runner takes (roughly) the same amount of time would noticeably reduce our CI run times.
We did something similar and reduced the overall duration by 30%. Indeed, there's an overhead of spawning a test runner for each file, but if your test suite has many spec files, the overall benefit is quite good!
https://currents.dev/readme/guides/pw-parallelization/playwright-orchestration
Let us know what functionality you'd like to see in Playwright and what your use case is. Currently, some of our shards take longer than others simply because PW runs tests in alphabetical order, which sometimes puts more of the slow test cases into one shard than another. I would love to have load balancing for test cases.
Ideas:
Do you think others might benefit from this as well? Faster execution time