kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.49k stars 874 forks

Parent task: Content on Kedro vs. other tools #3012

Open merelcht opened 1 year ago

merelcht commented 1 year ago

Description (edit 06/09/2023)

The Kedro docs are missing a clear description of the value proposition of Kedro vs. other tools.

A related topic is migration guides on how to move from tool X to Kedro.

Ideas

stichbury commented 1 year ago

Let's compile a list of these "competitor/complementary" platforms.

Category 1:

Category 2:

Category 3:

Category 4:

This is something I'll do this week; I've earmarked some time...

astrojuanlu commented 1 year ago

This is from some recent slide decks.

image

astrojuanlu commented 11 months ago

Evidence that this could be useful for some users (private communication):

"I like it [Kedro] a lot. It's very versatile and interesting, above all the way it works: once you get rolling, it speeds up [the development process] a lot (I think that's its goal, to make it reproducible). What I would like to have clearer is how it fits with or differs from mlflow."

astrojuanlu commented 10 months ago

I think we should abstain from doing blog posts or promotional content about this. People ask very frequently about Kedro vs MLflow (happened to me last week), Kedro vs dbt (happened to me a minute ago), and Kedro vs DVC, and this should be explained more prominently in the documentation.

I'm advocating for moving this to https://github.com/kedro-org/kedro/ and raising its priority.

stichbury commented 10 months ago

Sure, let's do this.

@astrojuanlu Could you assist me with the lists? I have a big set of potential tools but need help deciding whether they're in group 2 or 4, and how to prioritise them.

astrojuanlu commented 10 months ago

Let's start with MLflow, dbt, and DVC. The other ones are smaller and can be tackled at a later stage, I think.

stichbury commented 10 months ago

Could you help me categorise the others? MLflow isn't a comparable tool but a complementary one. I'll jot down which I think are which, and that'll help with deciding on the template for each type of article.

astrojuanlu commented 10 months ago

Note that MLflow now has MLflow Recipes (previously MLflow Pipelines), see https://mlflow.org/docs/latest/recipes.html, hence it can be considered a comparable tool.

image

See also the official announcement: https://www.databricks.com/blog/2022/06/29/introducing-mlflow-pipelines-with-mlflow-2-0.html

stichbury commented 8 months ago

Also adding smart notebooks, viz. https://deepnote.com/blog/jupyter-notebook-alternative and https://hex.tech/

astrojuanlu commented 5 months ago

Google's opinion:

image

So let's do:

merelcht commented 3 months ago

Could we take some of the content that @NeroOkwa presented in his competitor analysis for this?

astrojuanlu commented 3 months ago

I think it's much better to focus first on "how to use Kedro and X" (https://github.com/kedro-org/kedro/issues/3012#issuecomment-1903751448) rather than "why to use Kedro instead of X/differences & similarities between Kedro and X" (@NeroOkwa's competitor analysis).

astrojuanlu commented 1 month ago

MLflow is done, and Airflow is sufficiently covered in https://docs.kedro.org/en/stable/deployment/airflow.html

I'm shifting my focus to MLOps integrations for the next couple of months before coming back to this. Will add more details later.