kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.91k stars 900 forks

Rebuild Kedro/Databricks workflow recommendations #2185

Closed yetudada closed 1 year ago

yetudada commented 1 year ago

Description

We have concluded a research item on how Kedro is used on Databricks (#2105). This task recommends improvements to our "Deployment to a Databricks cluster" documentation.

Context

We will work on a Kedro-Databricks plugin at a later stage, but first we'll overhaul the documentation, because the research showed how heavily our users rely on it to get their work done. For now, we'll recommend dbx and Databricks Repos as a way to use Kedro on Databricks.

Possible Implementation

Our "Deployment to a Databricks cluster" documentation needs quite a bit of help in the following ways:

jmholzer commented 1 year ago

This parent issue needs to be broken down further:

  1. Define a new workflow with Databricks Repos, dbx and Kedro
  2. Document our new workflow, make changes to existing documentation
  3. Document recommendations for use of Azure Databricks (medium priority)

astrojuanlu commented 1 year ago

I guess only the Azure Databricks part is missing?

merelcht commented 1 year ago

All subtasks have now been completed. The remaining work is blog posts and has been moved to kedro-devrel.