Closed jmholzer closed 1 year ago
To test this:
kedro new --starter git+https://github.com/kedro-org/kedro-starters.git --directory databricks-iris --checkout feat/modify-pyspark-iris-databricks-packaged-deployment
Thanks for figuring this out @astrojuanlu!
Motivation and Context
The guide on deploying packaged projects to Databricks proposed in https://github.com/kedro-org/kedro/pull/2595 uses the `databricks-iris` starter. This PR adds that starter. The `databricks-iris` starter is a duplicate of the `pyspark-iris` starter with a few changes:

- `databricks_run.py`: a module for running the project on Databricks, since Click prevents projects from being run with the default entry point on Databricks.
- Logs are written to DBFS (`conf/base/logging.yml`).
- Datasets in `conf/base/catalog.yml` are saved in `/dbfs/FileStore`.
This PR has a large diff because it is a brand-new starter; only the following files have been changed from `pyspark-iris`:

- `{{ cookiecutter.repo_name }}/src/setup.py`: contains an entry point definition, `databricks_run`.
- `{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/databricks_run.py`: contains a script needed to run a packaged Kedro project on Databricks.
- `{{ cookiecutter.repo_name }}/src/conf/base/logging.yml`: config for writing logs to DBFS.
- `{{ cookiecutter.repo_name }}/src/conf/base/catalog.yml`: points to datasets on DBFS.

How has this been tested?
Manually on Databricks in conjunction with the new guide.
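For reference, the `console_scripts` entry point that the starter's `setup.py` defines can be sketched as below. This is an illustrative fragment, not the starter's actual file: `PACKAGE_NAME` stands in for the templated `{{ cookiecutter.python_package }}` value.

```python
# Illustrative sketch of the entry point definition in the starter's setup.py.
# PACKAGE_NAME is a placeholder for the templated package name.
PACKAGE_NAME = "my_project"

# In the starter, a dict like this is passed as `entry_points=...` to
# setuptools.setup(), so the built wheel exposes a `databricks_run` command
# that bypasses Kedro's default Click-based entry point.
ENTRY_POINTS = {
    "console_scripts": [
        f"databricks_run = {PACKAGE_NAME}.databricks_run:main",
    ]
}
```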
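And a minimal sketch of what the `databricks_run.py` module might contain. The flag names here are assumptions for illustration, and the sketch further assumes that `KedroSession.create` accepts `env` and `conf_source` arguments (true in recent Kedro versions); the starter's actual module may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags a Databricks wheel task might pass (hypothetical names)."""
    parser = argparse.ArgumentParser(
        description="Run a packaged Kedro project on Databricks"
    )
    parser.add_argument("--env", default=None, help="Kedro configuration environment")
    parser.add_argument(
        "--conf-source", default=None, help="Path to project configuration, e.g. on DBFS"
    )
    parser.add_argument(
        "--package-name", required=True, help="Name of the packaged Kedro project"
    )
    return parser.parse_args(argv)


def main():
    # The `databricks_run` console script defined in setup.py targets this
    # function, so the project can be run without Kedro's Click-based CLI.
    args = parse_args()
    # Kedro imports are deferred so parse_args stays importable without Kedro.
    from kedro.framework.project import configure_project
    from kedro.framework.session import KedroSession

    configure_project(args.package_name)
    with KedroSession.create(env=args.env, conf_source=args.conf_source) as session:
        session.run()
```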
Checklist