kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.82k stars 895 forks source link

Designate a better place for development requirements in Kedro projects #2519

Open astrojuanlu opened 1 year ago

astrojuanlu commented 1 year ago

Description

At the moment, we are telling users in our documentation to place linters in src/requirements.txt:

https://github.com/kedro-org/kedro/blob/e7315a8f07396b2075f116cf0201037033671811/docs/source/development/linting.md?plain=1#L25

But this is weird, becuase they are actually development tools, and they shouldn't be part of src/requirements.txt, which will be packaged along with the project.

Instead, common practice is to either have a dev-requirements.txt, or an "optional dependencies" table in the project metadata (see below).

Context

_(Comes from https://github.com/kedro-org/kedro/pull/2517#discussion_r1165936049)_

Possible Implementation

We'd need

Possible Alternatives

Instead of dev-requirements.txt we could have a dev table in the [project.optional-requirements] section of pyproject.toml (extras_require in old setup.py files). This would be blocked in https://github.com/kedro-org/kedro/issues/2280.

antonymilne commented 1 year ago

For some more background context, this originally came up as a suggestion in https://github.com/kedro-org/kedro/issues/1077. As you can see there, the discussion eventually led to the deprecation of the commands like kedro lint which used the dependencies in question, and so nothing was done about it.

I'm not sure what the latest thinking is on (a) deprecation of these commands and (b) Amanda's opt-in utility approach, but I guess the solution here should take those into account. But yes, I fully agree with you that these don't belong in requirements.txt and shouldn't be packaged with the project, and moving them to dev-requirements.txt would be much better.

Also relevant: https://github.com/kedro-org/kedro/discussions/844.

antonymilne commented 1 year ago

Instead, common practice is to either have a dev-requirements.txt.

What is the other option you were going to mention here? Is it dev table in [project.optional-requirements]?

astrojuanlu commented 1 year ago

Oh yes, left the sentenced unfinished then wrote it down at the end of the comment:

Instead of dev-requirements.txt we could have a dev table in the [project.optional-requirements] section of pyproject.toml (extras_require in old setup.py files). This would be blocked in https://github.com/kedro-org/kedro/issues/2280.

astrojuanlu commented 1 year ago

This is in progress in gh-2569.

astrojuanlu commented 1 year ago

We went back and forth a bit with this, and in the end we closed my initial attempt #2569 and didn't make a final decision.

This is the current contents of requirements.txt for the project template (untouched by the current PR #2853):

https://github.com/kedro-org/kedro/blob/4563a4c609cf808e9087561af5619f143ba3fcff/kedro/templates/project/%7B%7B%20cookiecutter.repo_name%20%7D%7D/src/requirements.txt#L1-L14

astrojuanlu commented 10 months ago

Coming back to this after finding that it is not possible to install Airflow with the official instructions and the dependencies of a Kedro project side by side:

❯ pip install "apache-airflow==$AIRFLOW_VERSION" --constraint "$CONSTRAINT_URL" -r src/requirements.txt                        (airflow310-alt) 
Ignoring ipython: markers 'python_version < "3.8"' don't match your environment
Requirement already satisfied: apache-airflow==2.7.2 in /Users/juan_cano/.micromamba/envs/airflow310-alt/lib/python3.10/site-packages (2.7.2)
ERROR: Cannot install black~=22.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested black~=22.0
    The user requested (constraint) black==23.9.1

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

However, the problem is much less severe in develop, since the linters are gone:

https://github.com/kedro-org/kedro/blob/bf536d4029d94bb318848b150be78d23fca44fb4/kedro/templates/project/%7B%7B%20cookiecutter.repo_name%20%7D%7D/requirements.txt#L1-L9

The only remaining non-mandatory dependencies there are Jupyter (#2276) and kedro-telemetry (https://github.com/kedro-org/kedro-plugins/issues/375).

merelcht commented 10 months ago

@astrojuanlu If it's just the jupyter and kedro-telemetry dependencies, is it really worth adding a separate dev-requirements.txt file to the template? Or can these be moved to the optional dependencies in pyproject.toml?

astrojuanlu commented 10 months ago

@merelcht I think effectively jupyterlab (or notebook) and kedro-telemetry are optional dependencies, and ideally they should indeed be in the [project.optional-dependencies] table of pyproject.toml, as you say.

However, this would mean that users would need to do pip install -e .[extra] to install those extra dependencies. Almost every Python user knows how to do pip install -r requirements.txt, but pip install -e . is a bit more arcane and has only been working unconditionally[^1] as of very recently (PEP 662 and deprecations of direct setup.py invocations happened 2 years ago, or even less). We anticipated this in my first iteration of https://github.com/kedro-org/kedro/pull/2569 and decided to keep the requirements.txt workflow for now. But that's what keeps us from making progress on this task.

Long story short: the moment we take kedro-telemetry and jupyter out of requirements.txt, some users will get confused that they're not installed. And on top of that, there's a the "political" problem of how users get kedro-telemetry installed.

[^1]: Unconditionally = For every PEP 517 compatible build backend and for every PEP 621 compatible pyproject.toml. I consider everything before that "the dark ages" of Python packaging.

astrojuanlu commented 10 months ago

This is blocked on https://github.com/kedro-org/kedro-plugins/issues/375.