kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Document Kedro compatibility with workflow tools like Hatch, PDM, Rye, Poetry #3974

Open galenseilis opened 5 days ago

galenseilis commented 5 days ago

Description

I'm sometimes frustrated when managing project dependencies and virtual environments, especially as project complexity grows. Traditional tools like pip and venv can be cumbersome and lack advanced features for dependency resolution (although pip is better than it used to be), version management, and project configuration. This often leads to conflicts and inefficiencies.

I would like Kedro to support modern package managers such as Hatch, PDM, Rye, or Poetry. These tools offer robust dependency management, streamlined environment setup, and enhanced configuration capabilities that can greatly improve the developer experience and productivity.

While it's possible to manually configure these package managers alongside Kedro, native support would ensure seamless integration and reduce the overhead associated with maintaining separate configurations. This would also help standardize the development workflow across teams.

Although I don't think it is for everyone yet, I have 'really' enjoyed using Rye.

Context

This change is important to me because it simplifies dependency management, reduces configuration overhead, and enhances the overall developer experience. By using modern package managers like Hatch, PDM, Rye, or Poetry with Kedro, I can:

For example, PDM (like the other tools) ships CLI commands that modify pyproject.toml, which simplifies dependency declaration and management, while its lockfile ensures reproducibility. This is particularly beneficial in collaborative environments where consistency is crucial.

How it can benefit other users:

Overall, integrating support for these package managers would align Kedro with modern Python development practices and significantly enhance its usability for a broad range of users.

Possible Implementation

None

Possible Alternatives

astrojuanlu commented 4 days ago

Hi @galenseilis, thanks for opening this issue!

Luckily, Kedro is already compatible with all PEP 621-compliant tools, and also with Poetry. I have personally enjoyed using PDM for most of my personal Kedro projects for a while.

There are two ways to go about this:

  1. Start from an existing project created with a normal kedro new (and hence from our official starters, which use setuptools at the time of writing), then either change the [build-system] table manually or let your desired workflow tool convert it (for example with pdm init, as explained in https://pdm-project.org/en/stable/usage/project/#import-the-project-from-other-package-managers)
  2. Create your own Kedro starter that uses your desired workflow tool instead of setuptools
     * If you're in a hurry, you can initialise your project using your workflow tool of choice and then use my kedro-init plugin https://pypi.org/project/kedro-init
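Before changing the [build-system] table by hand, it can help to see what a given pyproject.toml currently declares. The following is only a minimal sketch, not part of Kedro or PDM: the function name describe_pyproject and the SAMPLE text are made up for illustration, and the classification is deliberately coarse (a file with a [project] table is reported as PEP 621 style even if it also has a [tool.poetry] table).

```python
# Sketch: report how a pyproject.toml declares its metadata and build backend.
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # older interpreters need the third-party "tomli" package
    import tomli as tomllib


def describe_pyproject(text: str) -> dict:
    """Summarise the packaging setup declared in the text of a pyproject.toml."""
    data = tomllib.loads(text)
    build_system = data.get("build-system", {})

    if "project" in data:
        # PEP 621 metadata lives in the top-level [project] table.
        metadata_style = "pep621"
    elif "poetry" in data.get("tool", {}):
        # Poetry's own metadata table.
        metadata_style = "poetry"
    else:
        metadata_style = "unknown"

    return {
        "metadata_style": metadata_style,
        # None means no backend is declared in [build-system].
        "build_backend": build_system.get("build-backend"),
        "build_requires": build_system.get("requires", []),
    }


# Hypothetical input, modelled on a PDM-initialised project.
SAMPLE = """
[project]
name = "spaceflights-pdm"
version = "0.1.0"
dependencies = []
requires-python = ">=3.9"

[build-system]
requires = ["pdm-backend"]
build-backend = "pdm.backend"
"""

if __name__ == "__main__":
    print(describe_pyproject(SAMPLE))
```

In a real project you would pass the contents of the file, for example describe_pyproject(Path("pyproject.toml").read_text()).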

If going for the latter with kedro-init, the workflow would be:

$ pdm init
Creating a pyproject.toml for PDM...
Please enter the Python interpreter to use
 0. cpython@3.12 (/Users/juan_cano/Projects/QuantumBlackLabs/kedro-init/.tmp/.venv/bin/python)
 1. cpython@3.12 (/Users/juan_cano/Projects/QuantumBlackLabs/kedro-init/.tmp/.venv/bin/python3.12)
 2. cpython@3.11 (/Users/juan_cano/Projects/QuantumBlackLabs/tmp/ml_observability_course/.venv/bin/python3.11)
 3. cpython@3.9 (/usr/bin/python3)
 4. cpython@3.12 (/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/bin/python3.12)
Please select (0): 
Project name (tmp): spaceflights-pdm
Project version (0.1.0): 
Do you want to build this project for distribution(such as wheel)?
If yes, it will be installed by default when running `pdm install`. [y/n] (n): y
Project description (): Spaceflights Kedro project, using PDM
Which build backend to use?
0. pdm-backend
1. setuptools
2. flit-core
3. hatchling
Please select (0): 
License(SPDX name) (MIT): None
Author name (Juan Luis Cano Rodríguez): 
Author email (juan_luis_cano@mckinsey.com): 
Python requires('*' to allow any) (>=3.12): >=3.9
Project is initialized successfully
$ tree
.
├── README.md
├── __pycache__
├── pyproject.toml
├── src
│   └── spaceflights_pdm
│       └── __init__.py
└── tests
    └── __init__.py

5 directories, 4 files
$ cat pyproject.toml 
[project]
name = "spaceflights-pdm"
version = "0.1.0"
description = "Spaceflights Kedro project, using PDM"
authors = [
    {name = "Juan Luis Cano Rodríguez", email = "juan_luis_cano@mckinsey.com"},
]
dependencies = []
requires-python = ">=3.9"
readme = "README.md"
license = {text = "None"}

[build-system]
requires = ["pdm-backend"]
build-backend = "pdm.backend"

[tool.pdm]
distribution = true

Then:

$ kedro-init .
[02:16:47] Looking for existing package directories                                                                                                       cli.py:25
[02:16:53] Initialising config directories                                                                                                                cli.py:25
           Creating modules                                                                                                                               cli.py:25
           🔶 Kedro project successfully initialised!

And you're all set!

$ kedro registry list
- __default__
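If you want a check that does not depend on the Kedro CLI, one option is to look for the [tool.kedro] table in pyproject.toml. This is a rough sketch under assumptions: it assumes the mandatory keys are package_name, project_name and kedro_init_version, as in projects generated by kedro new at the time of writing, and it does not claim to describe exactly what kedro-init writes. The helper name missing_kedro_keys and the values in SAMPLE (including the version string) are hypothetical.

```python
# Sketch: check a pyproject.toml for the [tool.kedro] keys Kedro is assumed to need.
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # older interpreters need the third-party "tomli" package
    import tomli as tomllib

# Assumption: these are the mandatory keys, mirroring projects created by `kedro new`.
REQUIRED_KEDRO_KEYS = ("package_name", "project_name", "kedro_init_version")


def missing_kedro_keys(text: str) -> list:
    """Return the assumed-mandatory [tool.kedro] keys absent from a pyproject.toml."""
    kedro_table = tomllib.loads(text).get("tool", {}).get("kedro", {})
    return [key for key in REQUIRED_KEDRO_KEYS if key not in kedro_table]


# Hypothetical values, for illustration only.
SAMPLE = """
[project]
name = "spaceflights-pdm"

[tool.kedro]
package_name = "spaceflights_pdm"
project_name = "spaceflights-pdm"
kedro_init_version = "0.19.6"
"""

if __name__ == "__main__":
    missing = missing_kedro_keys(SAMPLE)
    print("missing keys:", missing if missing else "none")
```

An empty list means all the assumed keys are present; it says nothing about whether the rest of the project layout is valid.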

We encourage the community to create Poetry, PDM, and Rye starters (and to give my kedro-init a try if you so desire).

Evolving our official starters to use an alternative workflow tool probably isn't going to happen any time soon (until, of course, The One Tool Everybody Uses emerges 😉), so in principle I would say there is not much else for us to do, except perhaps document this better.

What do you think @galenseilis?

galenseilis commented 4 days ago

This makes sense to me! I agree with your conclusion that documenting the compatibility with these tools, where applicable, is the way forward. :)

galenseilis commented 3 days ago

I did not encounter any major issues with setting up Kedro with Rye.

https://galenseilis.github.io/posts/kedro-init-rye/