Closed astrojuanlu closed 2 months ago
A couple more details about the issue:
Our whole project creation flow currently relies on cookiecutter for creating new projects, and for creating pipelines as well. This bumps into the problems mentioned by @astrojuanlu, since cookiecutter is pretty intense when it comes to dependencies.
As of now, cookiecutter is performing three functions on Kedro:
Project templates
This is what advertises itself for. It's a templating tool. We use it to fetch directory structure and pre-created files from both the Kedro repo itself and the kedro-starters repo. Which leads to...
Git operations
To fetch those templates, cookiecutter uses functions that do operations like cloning a repo or checking out a branch. This can become a bit of an issue from the developer side, because we cede control to an external library to do these operations and it can be tricky to debug. For example, this bit on the _create_project()
function:
result_path = cookiecutter(template=template_path, **cookiecutter_args)
From then on, cookiecutter takes over and will do the whole cloning, selecting the right branch or tag of the repo, and creating the project itself. It's convenient, and at this moment our whole project creation from kedro new
on is pretty much preparing data structures that will subsequently be fed to this one function. It effectively acts as a chokepoint, and because it's not ours, debugging it can be a little bit of a pain, having to dive down multiple functions on the cookiecutter library to understand what's going on. They have their own clone/checkout function so we're forced to adapt to the way they want to do it.
Prompting
This was the one that caused the issue when we tried to remove Rich as a mandatory dependency. That one cookiecutter
function is also handling our prompts, we just pass which ones we require for that specific project creation on that **cookiecutter_args
structure. That function is essentially equivalent to a CLI call to cookiecutter with whichever arguments you want to use, so it makes sense.
The TL;DR is that our whole project creation flow is a funnel that ends on cookiecutter.
Thanks for the comprehensive write-up @lrcouto ! So am I correct in thinking that if a user already has a Kedro project (e.g. cloned from a repo) they can do everything they want in Kedro without needing cookiecutter?
Thanks for the comprehensive write-up @lrcouto ! So am I correct in thinking that if a user already has a Kedro project (e.g. cloned from a repo) they can do everything they want in Kedro without needing cookiecutter?
We're also using it to create new pipelines using the kedro pipeline create
command. But as long as they have the project and all of the pipelines they're going to use, they would not need cookiecutter anymore.
In other words, if we consider kedro new
or kedro pipeline create
not a "core" functionalities, we can move it out from the core dependencies.
The tricky bit here is for 0.19, we need to maintain backward compatibility. In 0.20, we can definitely split it to kedro[all]
and move cookiecutter
out from core.
I have test it today, after pip uninstall cookiecutter
, I can perform these common Kedro's action
In fact, ipython
and jupyter
are not Kedro's core dependencies, we could handle cookiecutter in similar way. We try to intercept this error in CLI, since all prompts are CLI. i.e.
For the records, there are proposal for optional opt-out
dependencies (from the Pydantic's author):
https://discuss.python.org/t/optional-dependency-groups-omitting-package-requirements/20833
This problem is also not unique to us, see this quote from the FastAPI's author in the same thread, he has issue with rich
as well:
I just came to chime in and say that I agree this would be useful to me for FastAPI and Typer at least
For example, pip install typer should include Rich and Shellingham by default, and if someone explicitly wants to strip that out they would add some additional syntax (e.g. pip install typer[minimal]).
Long story short, there are plenty of discussion but until today it's not supported (PEP doesn't exist yet). I thought it's a good idea to take inspiration from pydantic
and FastAPI
as they are arguably the most popular libraries in Python ecosystem. It's beneficial if we don't deviate too much from them.
Pydantic:
There were discussion about pydantic-core
and pydantic-core-lite
(a smaller binary), I cannot find pydantic-core-lite
. I suspect it's because of the discussion and there is no satisfying mechanism to support the "opt-out" dependencies, as a result it's not worth the effort to maintain this separately.
FastAPI:
In fastapi[standard], it install fastapi-cli
as a dependency:
https://github.com/tiangolo/fastapi/blob/653315c496304be928b46c34d35097d0ce847646/pyproject.toml#L56-L61
In FastAPI case, it makes sense since it's a web application and the CLI part isn't a runtime requirements when served. For Kedro, the CLI is a core part. If we only split the "cookiecutters" part, it's too small and IMO not worth the effort with little gain.
When one install pip install fastapi
, it installs fastapi-slim[standard]
(I couldn't find the code for fastapi-slim
, it's probably using the same fastapi
code base but packaged separately for convenience). It automatically load the pdm_build.py
with https://backend.pdm-project.org/migration/#setup-script
It is something quite new as well (2months old)
This is my proposed solution:
cookiecutter
out from core and to something like kedro[new]
/kedro[init]
When user run kedro new
:
If they have cookiecutter
installed, it should run as is.
If they don't, it will show a message like:
f"Module '{module_name}' not found. Make sure to install requirements with `pip install kedro[new]`
Docs: We need to update the docs for "Get started", we need to replace pip install kedro
with pip install kedro[new]
Pro: pip install kedro
will work as the same for most people, except those need to create new project. (less often)
Con:
kedro[new]
)kedro
and kedro-core
The FastAPI way of having fastapi
and fastapi-slim
It uses a magic constant to have different build: https://github.com/tiangolo/fastapi/blob/653315c496304be928b46c34d35097d0ce847646/pdm_build.py#L6
It uses pdm
, though I am not sure is that necessary to support these kind of workflow.
Pro:
fastapi-slim
is easier, pip install fastapi
does not install fastapi-slim[standard]
although they are equivalentCon:
fastapi
, it requires the entire ecosystem to support the Big fan of (2), pip install kedro
including ["kedro-core", "cookiecutter", "rich"]
. This retains backwards compatibility (we don't even have to wait until 0.20) and gives an exit path for users who want to install the "slim" or "lightweight" version.
Besides Nok's proposal, another idea to at least stop relying on Rich would be to try to handle the prompting outside of the cookiecutter context. That would allow us to keep everything mostly as it and not have to downgrade the Cookiecutter version if we want to remove Rich. I do think it would be a bit of reinventing the wheel though, since we'd have to create something that Cookiecutter is already doing.
Besides Nok's proposal, another idea to at least stop relying on Rich would be to try to handle the prompting outside of the cookiecutter context. That would allow us to keep everything mostly as it and not have to downgrade the Cookiecutter version if we want to remove Rich. I do think it would be a bit of reinventing the wheel though, since we'd have to create something that Cookiecutter is already doing.
On that idea, @ankatiyar suggested looking into Click's prompting functions to possibly replace Cookiecutter's.
@noklam opened an issue on the Cookiecuter repo regarding their own dependency on Rich.
Big fan of (2), pip install kedro including ["kedro-core", "cookiecutter", "rich"]. I will say this is (3). The most notable difference is that
pip install kedro
will automtically installkedro-core
, the FastAPI approach will not, the benefit is that you can see exactly how many people are downloading the slim package on purpose.
(2) that FastAPI takes is a bit different, basically fastapi
and fastapi-slim
does not rely on each other. They are essentially duplicate but standalone packages as I understand. I am trying to ask why the author takes that approach and the lesson he learnt, one drawback that I found is that it will cause some issues to the plugins. As most likely they will pin kedro
instead of kedro-core
, this make it harder to actually install the lean dependencies.
I do like that both fastapi
and fastapi-slim
share the same codebase so we don't need to maintain a separate one.
Besides Nok's proposal, another idea to at least stop relying on Rich would be to try to handle the prompting outside of the cookiecutter context. That would allow us to keep everything mostly as it and not have to downgrade the Cookiecutter version if we want to remove Rich. I do think it would be a bit of reinventing the wheel though, since we'd have to create something that Cookiecutter is already doing.
On that idea, @ankatiyar suggested looking into Click's prompting functions to possibly replace Cookiecutter's.
@noklam opened an issue on the Cookiecuter repo regarding their own dependency on Rich.
Cookiecutter use click.prompt
before they switch to rich
, the implementation can be found in the https://github.com/cookiecutter/cookiecutter/pull/1901 linked.
See this from the fastapi community. Some arguments I see there:
I get it: plugins depending on kedro
would need to migrate to kedro-core
if they want to drop the unwanted dependencies. And for projects, the moment there is 1 plugin depending on the full kedro
, they're stuck.
And yet, as @/tiangolo explains here:
So, the idea with having the new setup with
fastapi
apart fromfastapi-slim
is to make the simplest developer experience for the majority of cases, in particular for newcomers, as it's much easier to installpip install fastapi
and have the default recommended packages.For a newbie,
pip install fastapi[standard]
means explaining what the[]
means, what is thisstandard
keyword, what is the difference betweenpip install fastapi
andpip install fastapi[standard]
, and in many cases that command won't even work, because people might be using Zsh (e.g. in Mac by default) where it needs quotes aroundpip install "fastapi[standard]"
, so, what should be the simplest use case for the newcomers becomes the most complex one, more difficult to explain and use.
He got quite a bit of opposition, but I completely agree with his stance (and Sebastián's intuition for DX is what brought FastAPI this far anyway).
One insightful comment by the way:
in my humble opinion cli is not the thing that is needed for web server.
(back to #3659 and #143)
I actually prefer option 1 of @noklam's proposals. We already have the concept of pip install ...[...]
in the Kedro ecosystem with kedro-datasets
so I feel it's more consistent with other things we have done.
I can see why people like option 2, but with that one we need to think more about whether we would then break up Kedro in a modular way, where you can "add-on" parts by doing pip install kedro-...
or if it's similar to fastapi where the packages are duplicates. I'm also still concerned if this is worth the engineering effort as well as the future maintenance.
What I'd like to know: are Kedro's core dependencies a reason for people not to adopt Kedro?
We will be closing this issue for now, based on this discussion on Kedro's dependencies. We have decided on a longer term approach to deal with Cookiecutter, and by extension Rich, that will take some time to design and implement. This issue will be addressed once we're ongoing with this process.
Description
cookiecutter
is a mandatory dependency ofkedro
, because it's what powerskedro new
andkedro pipeline create
.However, it's a heavy dependency. From #3659:
Currently we are forcing Kedro users to install dependencies that are needed for development (
kedro new
,kedro pipeline create
) also in their production environments.One of those problematic transitive dependencies is
rich
. As of #3838,kedro
can work even ifrich
is not installed, but the current method to do that is very brittle and difficult for users. As a result, we are effectively blocked from making progress with #2928.For backwards compatibility reasons,
pip install "kedro==0.19.*"
will need to keep carryingcookiecutter
as a dependency. The task then is to come up with a roadmap in whichpip install "kedro>=0.20"
does not.