microsoft / playwright

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
https://playwright.dev
Apache License 2.0

[Feature] Better way to load balance between testcases #20116

Open alyaothman14 opened 1 year ago

alyaothman14 commented 1 year ago

Let us know what functionality you'd like to see in Playwright and what your use case is. Currently, some of our shards take much longer than others simply because Playwright runs tests in alphabetical order, which splits the work unevenly across shards. I would love to have load balancing for test cases.

Ideas:

Do you think others might benefit from this as well?

Faster execution time.

schlenks commented 1 year ago

This is an idea similar to what Knapsack does for Cypress tests. It collects timings on all tests, creates a report and then uses timings on future test runs to load balance. If Playwright has a way to generate a test timings report and balance tests across shards from that report, it would be possible to consistently have optimized test run times.
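As a sketch of how timing-based balancing could work (hypothetical names and data shapes, not a Playwright or Knapsack API): a greedy "longest processing time first" pass assigns each test file, slowest first, to the currently lightest shard.

```typescript
// Hypothetical sketch: distribute test files across shards using recorded
// durations from a previous run's timing report.
interface Shard {
  files: string[];
  total: number; // accumulated duration in ms
}

function balance(timings: Record<string, number>, shardCount: number): Shard[] {
  const shards: Shard[] = Array.from({ length: shardCount }, () => ({ files: [], total: 0 }));
  // Sort files by recorded duration, slowest first.
  const files = Object.keys(timings).sort((a, b) => timings[b] - timings[a]);
  for (const file of files) {
    // Assign each file to the shard with the smallest accumulated duration.
    const lightest = shards.reduce((min, s) => (s.total < min.total ? s : min));
    lightest.files.push(file);
    lightest.total += timings[file];
  }
  return shards;
}
```

This greedy heuristic doesn't guarantee an optimal split, but it tends to come close in practice and is trivial to recompute whenever a fresh timing report lands.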

A few additional considerations:

unlikelyzero commented 1 year ago

This is also part of the Cypress Dashboard Service

dtinth commented 1 year ago

Right now on one project with 800+ test files, we use BullMQ with an externally-hosted Redis server to dynamically allocate tests to worker machines.

Instead of pre-allocating test tasks to workers at the beginning, we spawn a number of worker machines, and each of them pulls the tasks from the central queue. This helps ensure that resources are best utilized.

We also collect statistics on how long each test file takes. We then use that info to re-order the tasks, so that the slowest tests are pulled from the queue first. This way, only fast tests remain towards the end, which helps even out the duration across workers.
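That re-ordering step could look something like the following (a minimal sketch with hypothetical names; the real setup queries a test result database for past durations). Files with no recorded history are treated as slow so they get scheduled early:

```typescript
// Hypothetical sketch: order test files so the slowest-known files run first.
// Files without a recorded duration are assumed slow and scheduled early,
// since a surprise-slow file at the end would undo the balancing.
function orderSlowestFirst(files: string[], pastDurations: Map<string, number>): string[] {
  const duration = (f: string) => pastDurations.get(f) ?? Number.MAX_SAFE_INTEGER;
  return [...files].sort((a, b) => duration(b) - duration(a));
}
```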

Having support for pull queues would really help in a large project.

unlikelyzero commented 1 year ago

@dtinth it would be great if you could share some of your code!

dtinth commented 1 year ago

@unlikelyzero Most of the code is tightly coupled to our CI/CD setup, Redis instance, and test result database, and so can't be shared easily. But I can share the general approach. Fortunately most of the custom code is just glue code and most of the heavy lifting is done by BullMQ.

  1. In GitHub Actions, I first determine the queue ID and then spawn 20x jobs in a matrix. Each job runs these scripts in order: the enqueue jobs script, then the work on jobs script, and finally the report script.

    • Every worker in the same workflow should use the same queue ID. Workers from different workflows should use different queue IDs.
  2. The enqueue jobs script performs a glob of all test files and adds each one as a job to the queue. Each filename is hashed to get a job ID, which is used for job de-duplication. So, 20 workers can all try to add the same job to the queue, and thanks to deduplication, it will only be added once.

    • Jobs are dequeued in the same order they are added, so the test result database can be queried for the durations of past runs, and those durations can be used to sort the tests so that longer tests get executed first and only short tests remain at the end, evening out the durations.
  3. The work on jobs script launches a BullMQ worker. The worker just runs the test file and reports the status back to BullMQ. Once the 'drained' event is emitted, there are no more tests to run, and the worker can exit.

    • There are several open-source UIs for BullMQ, so you can deploy a BullMQ dashboard and use it to track progress in real time.
    • The test report for each job is written to a different directory and uploaded to artifact storage. If you use GitHub Actions, uploading many small files is slower than uploading one big file. So in our setup, we zip up all the generated artifacts and upload the archive after the worker has finished processing test jobs.
  4. After all tests are finished, the report script queries all jobs from the queue (BullMQ persists jobs until they are deleted), generates a report, sets the CI status, and updates the test results database with the durations. It also downloads all the uploaded artifacts.

    • It is important that the report contains information identifying which worker ran which test. Having that info is very helpful for debugging when tests fail.
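The pull-queue mechanics in steps 2 and 3 can be sketched without BullMQ or Redis. In the real setup, BullMQ provides the persistent queue, jobId-based deduplication, and the 'drained' event across machines; everything below is a simplified in-memory stand-in, with hypothetical names:

```typescript
import { createHash } from "node:crypto";

// Job IDs are hashes of the filename, so any worker can try to enqueue
// the same file and deduplication keeps a single copy (step 2 above).
const jobIdFor = (file: string) => createHash("sha1").update(file).digest("hex");

// In-memory stand-in for the central queue (BullMQ + Redis in reality).
class PullQueue {
  private jobs = new Map<string, string>(); // jobId -> test file

  add(file: string) {
    const id = jobIdFor(file);
    if (!this.jobs.has(id)) this.jobs.set(id, file); // deduplicate by job ID
  }

  pull(): string | undefined {
    const next = this.jobs.entries().next();
    if (next.done) return undefined; // drained: no more tests to run
    this.jobs.delete(next.value[0]);
    return next.value[1];
  }
}

// Each worker machine loops: pull a file, run it, repeat until drained
// (step 3 above). Slow machines naturally pull fewer files.
async function workerLoop(queue: PullQueue, runFile: (file: string) => Promise<void>) {
  for (let file = queue.pull(); file !== undefined; file = queue.pull()) {
    await runFile(file); // e.g. spawn the Playwright CLI on this one file
  }
}
```

The key property is that work is assigned at pull time rather than up front, so a worker that lands on a slow file simply pulls fewer files overall.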

There is an overhead to spawning a test runner for each file, but that is outweighed by the ability to run 20 workers simultaneously.

I hope it helps!

emperol2 commented 1 year ago

Thanks @dtinth ! 🥰

liamdebeasi commented 11 months ago

Ionic Framework's CI process would benefit greatly from this feature. Many of our tests rely on screenshot comparisons or simulated user interaction, so the runtime adds up. We use the sharding feature to distribute tests over n test runners, which helps, but the run time of each test runner is uneven. As a result, some test runners take 3 minutes while others can take 13 minutes. Having a way of ensuring that each test runner takes (roughly) the same amount of time would noticeably reduce our CI run times.

agoldis commented 4 months ago

We did something similar and could reduce the overall duration by about 30%. Indeed there's an overhead of spawning a test runner for each file, but if your testing suite has many spec files, the overall benefit is quite good!

https://currents.dev/readme/guides/pw-parallelization/playwright-orchestration