Enable parallelExecution for integration test suites

qianheng-aws commented 2 days ago

Description

Enable parallel execution for the integration test suites.

Based on the metrics collected:

Total time cost: 1h09m, test suites: 125, test cases: 1674
Cost range 0 min - 1 min: 109 test suites, total cost 476 sec
Cost range 1 min - 2 min: 5 test suites, total cost 410 sec
Cost range 2 min - 3 min: 3 test suites, total cost 469 sec
Cost range 3 min - 4 min: 7 test suites, total cost 1482 sec
Cost range 6 min - 7 min: 1 test suite, total cost 407 sec

The time cost is spread fairly evenly across suites: most suites take less than 1 minute, and none takes more than 7 minutes.

To reduce test execution time, we should increase parallelism, especially since we don't have any long-running test suites and all tests currently run sequentially.
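As a rough sanity check on how much parallelism could help, the bucket totals above can be turned into a lower bound on wall time. This is only a sketch: it ignores build, container startup, and scheduling overhead, and it assumes the per-bucket totals are the only work to distribute. The name `ideal_wall_time` is made up for illustration.

```python
# Bucket data copied from the metrics above: (number of suites, total seconds).
buckets = {
    "0-1 min": (109, 476),
    "1-2 min": (5, 410),
    "2-3 min": (3, 469),
    "3-4 min": (7, 1482),
    "6-7 min": (1, 407),
}

total_suites = sum(count for count, _ in buckets.values())
total_seconds = sum(seconds for _, seconds in buckets.values())
longest_suite_seconds = 407  # the single suite in the 6-7 min bucket


def ideal_wall_time(workers: int) -> float:
    """Lower bound on wall time in seconds for `workers` parallel workers.

    No schedule can beat the average load per worker, and no schedule
    can finish before the longest single suite finishes.
    """
    return max(total_seconds / workers, longest_suite_seconds)


if __name__ == "__main__":
    print(f"{total_suites} suites, {total_seconds} sec of suite time")
    for workers in (1, 3, 4, 8):
        print(f"{workers} workers -> at least {ideal_wall_time(workers):.0f} sec")
```

With these numbers the longest suite only becomes the limiting factor at around 8 workers, which is consistent with the claim that no single suite dominates the run.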

TODO: Another way to reduce the average time per suite is to reuse the Docker container across suites. Bootstrapping an OpenSearch container costs around 10 seconds, so reuse would save about 10 minutes when running the integration tests (65 suites currently) in sequence.
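The estimate in the TODO is simple arithmetic, sketched here with the two figures quoted above. The function name and the `containers_kept` parameter are hypothetical; the sketch assumes one container is still started and then shared by every suite.

```python
BOOTSTRAP_SECONDS = 10  # approximate OpenSearch container startup cost quoted above
SUITE_COUNT = 65        # number of integration suites quoted above


def reuse_savings_seconds(suites: int, bootstrap: int, containers_kept: int = 1) -> int:
    """Seconds saved if only `containers_kept` containers are started
    instead of one container per suite."""
    return (suites - containers_kept) * bootstrap


if __name__ == "__main__":
    saved = reuse_savings_seconds(SUITE_COUNT, BOOTSTRAP_SECONDS)
    print(f"saved: {saved} sec (about {saved / 60:.1f} min)")
```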

There are 2 ways to increase parallelism:

Option 1: Enable SBT's parallel execution on a single node.
Pros: Easy to implement.
Cons: Increases pressure on the build node and may make the integration tests unstable if parallelism is too high. It launches at most 4 (the number of CPU cores on the build node) Docker containers and JVMs. The gain from this optimization is bounded by the performance of the build node.

Option 2: Add more nodes to CI and distribute the tests evenly across them.
Pros: Scales to as many build nodes as we want.
Cons: Increases the complexity of the CI workflow, since tests must be distributed to different build nodes and their reports merged once all nodes have finished. It also increases our CI spending, since more build nodes are used.

The two options are compatible, and we can apply both if we want. Take Option 1 as the first step, since it saves resources and does not increase the workflow's complexity.

Option 1 test, recorded integ-test time cost:
baseline -> 1h 3m 35s
4 groups -> 32m 17s
3 groups -> 37m 58s
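To put the recorded timings in perspective, the speedup and per-group efficiency can be computed directly from them. This is a small sketch over the three figures quoted above; the helper names `to_seconds` and `speedup` are made up for illustration.

```python
import re


def to_seconds(text: str) -> int:
    """Parse a duration such as '1h 3m 35s' or '32m 17s' into seconds."""
    parts = {unit: int(number) for number, unit in re.findall(r"(\d+)\s*([hms])", text)}
    return parts.get("h", 0) * 3600 + parts.get("m", 0) * 60 + parts.get("s", 0)


def speedup(baseline: str, measured: str) -> float:
    """Ratio of baseline time to measured time."""
    return to_seconds(baseline) / to_seconds(measured)


# Timings copied from the Option 1 test results above.
BASELINE = "1h 3m 35s"
RESULTS = {4: "32m 17s", 3: "37m 58s"}

if __name__ == "__main__":
    for groups, measured in RESULTS.items():
        ratio = speedup(BASELINE, measured)
        print(f"{groups} groups: speedup {ratio:.2f}x, efficiency {ratio / groups:.0%}")
```

Both configurations come out well below a linear speedup, which fits the note that Option 1 is bounded by the performance of a single build node.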

Tried shuffling the tests before splitting them into groups:
4 groups with shuffle -> 32m 42s
3 groups with shuffle -> 38m 37s
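The grouping experiment can be pictured with the following sketch. It is not the code from this PR: the round-robin assignment, the function name `split_into_groups`, and the suite names are all assumptions made for illustration, since the description does not say how suites are assigned to groups.

```python
import random
from typing import List, Optional, Sequence


def split_into_groups(
    suites: Sequence[str],
    group_count: int,
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> List[List[str]]:
    """Split suites into `group_count` groups, optionally shuffling first.

    Assignment is round-robin: the suite at position i goes to group
    i % group_count, so group sizes differ by at most one.
    """
    ordered = list(suites)
    if shuffle:
        # A seeded generator keeps the split reproducible between runs.
        random.Random(seed).shuffle(ordered)
    return [ordered[i::group_count] for i in range(group_count)]


if __name__ == "__main__":
    # Hypothetical suite names; 125 matches the suite count in the metrics.
    suite_names = [f"Suite{i:03d}" for i in range(125)]
    for group in split_into_groups(suite_names, 4, shuffle=True, seed=42):
        print(len(group), group[:3])
```

Because the per-suite costs are already fairly even, shuffling would not be expected to change the balance much, which matches the near-identical timings with and without shuffle.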

Check List

[ ] Updated documentation (docs/ppl-lang/README.md)
[ ] Implemented unit tests
[ ] Implemented tests for combination with other commands
[ ] New added source code should include a copyright header
[x] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

YANG-DB commented 1 day ago

@qianheng-aws this is a great idea! Is there any downside to this (option 1 vs option 2) parallelism?

qianheng-aws commented 1 day ago

> @qianheng-aws this is a great idea! Is there any downside to this (option 1 vs option 2) parallelism?

Here are the pros and cons of the two options, which I have also added to the description:

Option 1: Enable SBT's parallel execution on a single node.
Pros: Easy to implement.
Cons: Increases pressure on the build node and may make the integration tests unstable if parallelism is too high. It launches at most 4 (the number of CPU cores) Docker containers and JVMs. The gain from this optimization is bounded by the performance of the build node.

Option 2: Add more nodes to CI and distribute the tests evenly across them.
Pros: Scales to as many build nodes as we want.
Cons: Increases the complexity of the CI workflow, since tests must be distributed to different build nodes and their reports merged once all nodes have finished. It also increases our CI spending, since more build nodes are used.

opensearch-project / opensearch-spark

Enable parallelExecution for integration test suites #934
