Glow integration test with multitask jobs and git repos integration

Signed-off-by: William Brandler <william.brandler@databricks.com>

What changes are proposed in this pull request?

The Glow continuous integration notebook tests currently time out after five hours because each notebook runs sequentially on a new cluster.

Workflows with multiple tasks

By orchestrating the workflow with multiple tasks and cluster reuse, this workflow finishes in under one hour. The multitask job was defined manually in the Databricks UI and exported as JSON to docs/dev/multitask-integration-test-config.json.
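For illustration, cluster reuse in a multitask job can be sketched as a settings dictionary in the style of the Databricks Jobs API 2.1, where every task points at the same `job_cluster_key`. This is a minimal sketch, not the exported configuration from this PR: the helper name `build_multitask_job`, the task naming scheme, the cluster key, and the placeholder cluster values are all assumptions.

```python
from typing import Any, Dict, List


def build_multitask_job(
    name: str,
    notebook_paths: List[str],
    cluster_key: str = "shared_cluster",  # hypothetical key name
) -> Dict[str, Any]:
    """Build job settings in which every notebook task reuses one job cluster."""
    tasks = []
    for path in notebook_paths:
        tasks.append({
            # Hypothetical naming scheme: derive the task key from the notebook file name.
            "task_key": path.rstrip("/").split("/")[-1].replace(" ", "_"),
            # Pointing every task at the same key is what enables cluster reuse.
            "job_cluster_key": cluster_key,
            "notebook_task": {"notebook_path": path},
        })
    return {
        "name": name,
        "job_clusters": [{
            "job_cluster_key": cluster_key,
            # Placeholder values; the real configurations live in the exported JSON.
            "new_cluster": {
                "spark_version": "<runtime-version>",
                "node_type_id": "<node-type>",
                "num_workers": 2,
            },
        }],
        "tasks": tasks,
    }
```

Tasks built this way have no `depends_on` entries, so they are independent of one another; adding dependencies would be needed wherever one notebook relies on the output of another.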

Important: when you export JSON from a multitask job, remove settings{ } from the JSON to avoid this error: "error_code":"INVALID_PARAMETER_VALUE","message":"Job settings must be specified."
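The cleanup step can be scripted. The sketch below assumes that removing settings{ } means dropping the outer `settings` wrapper and keeping its contents at the top level of the JSON; the function name `unwrap_job_settings` and the sample job values are hypothetical.

```python
import json
from typing import Any, Dict


def unwrap_job_settings(exported: Dict[str, Any]) -> Dict[str, Any]:
    """Return the job definition without the outer "settings" wrapper, if present."""
    settings = exported.get("settings")
    if isinstance(settings, dict):
        return settings
    # Already unwrapped: return the input unchanged.
    return exported


# Hypothetical exported job, shown only to illustrate the shape of the input.
exported = json.loads(
    '{"job_id": 1, "settings": {"name": "glow-integration-test", "tasks": []}}'
)
print(json.dumps(unwrap_job_settings(exported), indent=2))
```

Passing JSON that has already been unwrapped returns it unchanged, so the helper is safe to run more than once on the same file.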

GitHub CI/CD integration with Repos

Notebooks are now synced directly from the Glow GitHub repository using Repos, rather than being uploaded with the Databricks CLI into a temporary directory. This will make it easier to integrate with Terraform in the future, so that entire Databricks deployments can be created with Glow pipelines predefined and ready to run.
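One way to point a Databricks Repo at a given branch is the Repos REST API (`PATCH /api/2.0/repos/{repo_id}`). The sketch below only builds the request and does not send it. It is an assumption about how such a sync could be scripted, not a description of what this PR's CI does; the helper name, host, token, and repo ID are hypothetical.

```python
import json
import urllib.request


def build_repo_update_request(
    host: str, token: str, repo_id: int, branch: str
) -> urllib.request.Request:
    """Build (but do not send) a request that checks out `branch` in a Databricks Repo."""
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/repos/{repo_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical workspace URL, token, and repo ID.
request = build_repo_update_request(
    "https://example.cloud.databricks.com", "<token>", 123, "my-branch"
)
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` would require a real workspace URL and access token.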

With this new setup, notebook tests will run on your branch of your fork.

Note: this new integration test does not yet include all notebooks in the Glow repository (only 20 of 36). It uses four different cluster configurations (see screenshots below).

Future work: some of the notebooks in the repository are now redundant and will be removed. Other notebooks, such as VEP and liftOver, will be added to the integration test.

The next step is to optimize the cluster configurations and workflow for UK Biobank scale test data, first with one phenotype and then with multiple phenotypes.

[Image: Screen Shot 2022-02-25 at 4 42 15 PM]

How is this patch tested?

[x] Integration tests

Glow integration test with multitask jobs and git repos integration #491
