Closed: williambrandler closed this pull request 2 years ago
Merging #491 (ea23c3b) into master (5552640) will not change coverage. The diff coverage is n/a.
@@           Coverage Diff           @@
##           master     #491   +/-  ##
=======================================
  Coverage   93.66%   93.66%
=======================================
  Files          95       95
  Lines        4875     4875
  Branches      457      457
=======================================
  Hits         4566     4566
  Misses        309      309
Signed-off-by: William Brandler william.brandler@databricks.com
What changes are proposed in this pull request?
The Glow continuous integration notebook tests currently time out after five hours, because each notebook runs sequentially on a new cluster.
Workflows with multiple tasks

By orchestrating the workflow as a multitask job with cluster reuse, the integration tests now finish in less than one hour. The multitask job was defined manually in the Databricks UI and exported as JSON to docs/dev/multitask-integration-test-config.json.

Important: when you export the JSON from a multitask job, remove the outer settings { } wrapper from the JSON. Otherwise, creating a job from it fails with: "error_code": "INVALID_PARAMETER_VALUE", "message": "Job settings must be specified."
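As a sketch of that unwrapping step: the Jobs API returns an exported job with the definition nested under a top-level "settings" key, while job creation expects the settings contents directly. A minimal helper (the field names below mirror the exported format; the example job name is hypothetical) could look like:

```python
import json


def strip_settings_wrapper(exported: dict) -> dict:
    """Return the job definition itself.

    Exported multitask job JSON nests the definition under a
    top-level "settings" key; unwrap it if present so the result
    can be passed to job creation without the
    INVALID_PARAMETER_VALUE error.
    """
    return exported.get("settings", exported)


# Example: an exported job wraps the definition in "settings".
exported = json.loads(
    '{"job_id": 123, "settings": {"name": "glow-integration-test", "tasks": []}}'
)
job_spec = strip_settings_wrapper(exported)
# job_spec now holds only the job definition, ready for job creation.
```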
Github CI/CD integration with Repos

Notebooks are now synced directly from the Glow GitHub repository using Repos, rather than uploaded into a temporary directory with the Databricks CLI. This will make it easier in the future to integrate with Terraform and do entire Databricks deployments with Glow pipelines predefined and ready to go.
With this new setup, notebook tests will run on your branch of your fork.
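One way CI could check a repo out to the current branch is the Databricks Repos API (PATCH /api/2.0/repos/{repo_id} with a "branch" field). A minimal sketch that only builds the request, with placeholder host, token, and repo ID (this PR may wire it up differently):

```python
import json
from urllib.request import Request


def build_repos_update_request(host: str, token: str,
                               repo_id: int, branch: str) -> Request:
    """Build a PATCH request that checks a Databricks Repo out to
    `branch`, so CI runs the notebooks from the fork/branch under
    test. Host, token, and repo_id are illustrative placeholders."""
    return Request(
        url=f"{host}/api/2.0/repos/{repo_id}",
        data=json.dumps({"branch": branch}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


# In CI this would be sent with urllib.request.urlopen(req).
req = build_repos_update_request(
    "https://example.cloud.databricks.com", "<token>", 123, "my-feature-branch"
)
```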
Note: this new integration test does not yet include all notebooks in the Glow repository (only 20 of 36), and it uses four different cluster configurations (see screenshots below).
Future work: some notebooks in the repository are now redundant and will be removed in the future, while others (such as VEP and liftOver) will be added to the integration test. The next step is to optimize the cluster configurations and workflow for UK Biobank-scale test data with one phenotype, then with multiple phenotypes.
How is this patch tested?