Open lrcouto opened 6 days ago
I wonder if we can invoke cookiecutter via pipx
it's literally only needed once
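A minimal sketch of the pipx idea, assuming `pipx` is on the user's PATH. The helper names `build_pipx_command` and `create_project` are hypothetical and not part of Kedro; the template argument is whatever template location the caller supplies. `pipx run` executes the tool from a temporary environment, so cookiecutter would never need to be installed next to Kedro.

```python
import shutil
import subprocess


def build_pipx_command(template: str, output_dir: str = ".") -> list[str]:
    """Return the argv that runs cookiecutter from a temporary pipx environment."""
    # `--output-dir` is cookiecutter's CLI option for where the project is generated.
    return ["pipx", "run", "cookiecutter", template, "--output-dir", output_dir]


def create_project(template: str, output_dir: str = ".") -> None:
    """Run cookiecutter through pipx, failing clearly if pipx is unavailable."""
    if shutil.which("pipx") is None:
        raise RuntimeError("pipx was not found on PATH; install it to create new projects.")
    # check=True raises CalledProcessError if cookiecutter exits with an error.
    subprocess.run(build_pipx_command(template, output_dir), check=True)
```

The tradeoff is that project creation would then depend on pipx being available and, on first use, on network access to fetch cookiecutter.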
Notice that both `kedro new` and `kedro pipeline create` use `cookiecutter`, but refactoring the former is much more difficult than refactoring the latter. So, on @lrcouto's ideas for solutions, we could account for the fact that maybe we could make `kedro pipeline create` not dependent on `cookiecutter`, and focus on what to do with `kedro new`.
I cannot join today's Tech Design session, so I will watch the recording. I am leaving some comments on the issue to clarify:
The only way we can currently run Kedro without needing Rich is by downgrading Cookiecutter to a version before they themselves added Rich as one of their dependencies, which is hacky and not ideal.
`cookiecutter` is not needed as a "runtime" dependency; by runtime I mean `kedro run`. If users still need to use `kedro new` or `kedro pipeline create`, then `cookiecutter` is needed.
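One way to picture this is a deferred import, sketched below under stated assumptions: `require_optional` and `create_from_template` are hypothetical helpers, not existing Kedro functions, and the install hint is only an example. The import of `cookiecutter.main` happens when a project-creation command is called, so a code path like `kedro run` would never touch it.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, command: str, install_hint: str) -> ModuleType:
    """Import an optional dependency on demand, with an actionable error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"'{command}' needs the optional dependency '{module_name}'. "
            f"Install it with: {install_hint}"
        ) from exc


def create_from_template(template: str, **kwargs):
    """Hypothetical entry point: cookiecutter is only imported when this is called."""
    main = require_optional("cookiecutter.main", "kedro new", "pip install cookiecutter")
    # cookiecutter.main.cookiecutter is the library's programmatic entry point.
    return main.cookiecutter(template, **kwargs)
```

A deferred import alone does not change what gets installed; it only makes the dependency safe to leave out once the packaging side allows that.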
To me the problem right now is that users cannot INSTALL kedro without installing cookiecutter, so either of the solutions I propose can address this, with different tradeoffs (see the summary):
Replace cookiecutter
I will not consider this option unless we aim at expanding the feature. For example, there have been quite a lot of issues running `kedro new` in Databricks (network and permission issues). Do we have an alternative that can handle this better?
How would a possible split in two packages, or having one install option with extra dependencies, affect our user experience?
This is mostly explained in Spike: Make cookiecutter optional / not a core dependency of kedro
Pro:
Con:
Pro:
Con:
One last idea - `pip` vendors certain tools (like `rich`) so there is no risk of conflicts. Maybe that's what we need to do here?
https://github.com/pypa/pip/tree/main/src/pip/_vendor
Here's the summary of what we discussed on the Tech Design session on Jun 26th:
`kedro run`.
"`kedro[new]`" approach, separating the non-core features of Kedro into an optional install. Relatively easy to implement and wouldn't affect current users, but would be a breaking change and possibly confusing to explain to new users.
`kedro` and `kedro-core`. Would not break for existing users, but would complicate our ecosystem and be more complicated to debug as well.
To clarify on the two packages solution, there are 2 approaches:
`kedro` and `kedro-slim`, aka the FastAPI approach as described by @noklam here
basically `fastapi` and `fastapi-slim` do not rely on each other. They are essentially duplicate but standalone packages, as I understand.
Indeed, they're generated from the same codebase but they don't depend on each other, see https://github.com/tiangolo/fastapi/pull/11503. Compare https://pyoven.org/package/fastapi with https://pyoven.org/package/fastapi-slim .
`kedro` depending on `kedro-core`, aka the Dask Conda approach:
There is https://anaconda.org/conda-forge/dask, depending on `dask-core`, `distributed`, `pandas` etc. (hence equivalent to `pip install dask[complete]`) and https://anaconda.org/conda-forge/dask-core, with minimal dependencies.
Other packages doing the same:
`pydantic` depending on `pydantic-core`: https://github.com/conda-forge/pydantic-feedstock/blob/main/recipe/meta.yaml#L29
`poetry` depending on `poetry-core`: https://github.com/conda-forge/poetry-feedstock/blob/main/recipe/meta.yaml#L32
`flit` depending on `flit-core`: https://github.com/conda-forge/flit-feedstock/blob/main/recipe/meta.yaml#L26
The "`kedro[new]`" approach would then be similar to the Dask PyPI approach.
Thank you so much for the great write-up of the problem and the discussion summary @lrcouto 👏 ⭐
I'd like to look at this with a short-term and long-term solution view.
`1.0.0` redesign like @deepyaman proposed in the tech design meeting.
Aside from these two solutions, we might need to find an alternative for cookiecutter if it is indeed being maintained less and less. I don't think that necessarily solves any of our issues though, because it would just replace the `cookiecutter` dependency with e.g. `copier` and there's a chance that any replacement introduces Rich again at some point. So although this is related, I wouldn't consider replacing cookiecutter a solution for anything other than making sure we use up-to-date packages as dependencies.
I am leaning towards separating Kedro into two packages as a solution as well. Out of those, I think having `kedro` depending on `kedro-core` is my favorite. It would be a big endeavor to implement, but I think it would prevent this kind of issue from happening in the future as well. We could keep `kedro-core` as lean as possible, having only what's strictly necessary for `kedro run`, and have other amenities and extra features in the larger `kedro` package.
The original issue: Kedro has a lot of dependencies
Attempting to remove Rich
The Cookiecutter Issue
`kedro new` onwards is building up a data structure to be passed as a parameter to the `cookiecutter()` function, which handles the creation itself from the desired template.
Current ideas for solutions
`pip install kedro[new]` (https://github.com/kedro-org/kedro/issues/3884#issuecomment-2175031422)
`kedro` and `kedro-core`, letting the user choose which one fits their needs. (Also https://github.com/kedro-org/kedro/issues/3884#issuecomment-2175031422)
Further questions to discuss