microsoft / nutter

Testing framework for Databricks notebooks
MIT License

Proposal: parallel test execution #58

Closed tomconte closed 1 year ago

tomconte commented 3 years ago

This is a proposal to implement a new helper class that would allow executing tests in parallel from a test notebook.

The use case: we have a fairly large number of tests to run (40), and each test takes about 3 minutes. We would like to run them in parallel. However, with the client-side parallelisation offered by the CLI, we would need to create 40 notebooks (one per test), which is not very maintainable. It would be easier for us to have a single test notebook that executes the tests in parallel on the cluster side.

Here is an idea of how the helper class could be used:

from runtime.runner import NutterRunner

all_tests = []

for d in test_data:
  if 'test1' in d:
    test = TestNotebookForTest1(d, other_params)
  elif 'test2' in d:
    test = TestNotebookForTest2(d, other_params)
  else:
    test = TestNotebookForOtherTest(d, other_params)
  all_tests.append(test)

# Run 8 tests in parallel
runner = NutterRunner(all_tests, 8)
all_results = runner.execute_tests()

print(all_results.to_string())
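
As a rough estimate of what this would buy us, here is the arithmetic for the numbers above. It assumes every test takes exactly 3 minutes and that the cluster can really sustain 8 tests at once, so treat it as a best case rather than a measurement:

```python
import math

num_tests = 40          # from the use case above
minutes_per_test = 3    # approximate, assumed identical for every test
num_of_workers = 8      # as in the usage example

# Sequential run: every test waits for the previous one.
serial_minutes = num_tests * minutes_per_test

# Parallel run: tests execute in batches of num_of_workers.
batches = math.ceil(num_tests / num_of_workers)
parallel_minutes = batches * minutes_per_test

print(f"serial: {serial_minutes} min, parallel: {parallel_minutes} min")
```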

The signature for the constructor could look like this:

class NutterRunner(object):
    def __init__(self, tests, num_of_workers=1):
        # ...
        pass

Under the hood, the helper class would leverage the existing Scheduler class, which already has an implementation of parallel workers.
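
To make the shape of the idea concrete, here is a minimal sketch of such a runner. It does not use the Scheduler class; `concurrent.futures.ThreadPoolExecutor` from the standard library stands in for it so the example runs on its own. The names `ParallelRunnerSketch` and `FakeTest` are hypothetical, and the only assumption about a test object is that it exposes an `execute_tests()` method:

```python
from concurrent.futures import ThreadPoolExecutor


class ParallelRunnerSketch:
    """Hypothetical stand-in for the proposed NutterRunner."""

    def __init__(self, tests, num_of_workers=1):
        if num_of_workers < 1:
            raise ValueError("num_of_workers must be at least 1")
        self.tests = list(tests)
        self.num_of_workers = num_of_workers

    def execute_tests(self):
        # At most num_of_workers tests run at the same time.
        # map() yields results in input order, so results[i]
        # corresponds to self.tests[i].
        with ThreadPoolExecutor(max_workers=self.num_of_workers) as pool:
            return list(pool.map(lambda t: t.execute_tests(), self.tests))


class FakeTest:
    """Illustrative test object; a real one would be a test fixture."""

    def __init__(self, name):
        self.name = name

    def execute_tests(self):
        return f"{self.name}: passed"


runner = ParallelRunnerSketch([FakeTest("test1"), FakeTest("test2")], 2)
results = runner.execute_tests()
print(results)
```

A real implementation would also need to merge the individual results into a single result object, which this sketch leaves out by returning a plain list.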

Feedback welcome!