Open DimedS opened 23 hours ago
Adding a bit of context - deep integration with Kedro-Viz was the first attempt to drive adoption and improve explainability:
We have spent nearly 5 years trying to explain this to users in various ways - We must pivot strategy.
Thanks @DimedS for opening this issue.
First, I would like to agree that tags do not guarantee non-overlapping pipeline partitioning. This has been said time and time again.
But I am going to push back against the idea that namespaces are the right solution for that problem. The main reason is that they were probably never designed to solve it in the first place!
Namespaces were born as "prefixes" and were introduced in Kedro 0.15.4 in October, 2019:
3c0f097991119cce5f42de8844686de104604bf4
(https://github.com/McK-Private/private-kedro/pull/286, private link)
And then in 0.16.0 the modern concept of "modular pipelines" with namespaces was introduced in March 2020:
af046ca6c738a89e19d6e31ab432a13b0b184190
Therefore a bit less or a bit more than 5 years have passed, depending on how you look at it.
The original context and discussion have forever been lost in time https://jira.quantumblack.com/browse/KED-1105 (broken internal link) but we can get a glimpse of what the intent of the feature was from this comment:
Nikos pointed me to this and having thought a bunch about vertical pipeline development and pipeline re-use, recently
(https://github.com/McK-Private/private-kedro/pull/286#issuecomment-542717548, private link)
In addition, this is what the documentation of prefixes, and later namespaces, looked like:
The docs have always described namespaces (prefixes) as a way to reuse pipelines. There were zero review comments in those two PRs raising concerns about that.
To note, nobody from the current team participated in the original 0.15 discussion.
Therefore, I can only conclude that namespaces were always designed with pipeline reuse in mind.
Implying that namespaces have always been the solution for pipeline non-overlapping partitioning is, in my view, a big unqualified opinion that has no backing in historical written evidence. And as such, saying that "the docs are wrong" is a misrepresentation of what those docs were supposed to describe.
If anything, we're now retrofitting namespaces to solve a problem they weren't intended to solve in the first place.
I am going to push back against making incremental improvements to a feature that nobody has dared to touch in 5 years, that's difficult to understand even for Kedro engineers, let alone for our users (regardless of their intended use case), and that we're probably retrofitting to solve a problem it wasn't designed for.
My recommendation is that we look at the problem of non-overlapping pipeline partitioning with fresh eyes, go back to the drawing board, and prototype.
I would also say from users
Thank you for your comments, @datajoely and @astrojuanlu. I see that there isn’t a consensus within the team about the future of namespaces, so I’ve updated the header of this issue to reflect your perspectives.
I propose that we continue the discussion about deployment node grouping in the next Tech Design meeting with an open mind to all grouping possibilities - not limited to namespaces. If, during that discussion, we determine that namespaces are essential for deployment, we can revisit this conversation and make a decision on their future.
Great - I'll also link to this write up from last year: https://github.com/kedro-org/kedro/wiki/Synthesis-of-research-related-to-deployment-of-Kedro-to-modern-MLOps-platforms
Kedro namespaces are currently not widely used. The team is divided on the reasons for this:
This parent issue aims to facilitate an agreed-upon decision regarding the points above and address these concerns. It is also tied to the goal of improving deployment functionality, where namespaces should play a pivotal role in node grouping.
History
Improving docs
[ ] The current documentation focuses primarily on how namespaces enhance pipeline reusability (see docs). However, this ticket proposes updating the docs to include a clear definition of namespaces, highlighting that they are similar to node tagging but do not allow overlaps. This makes namespaces an excellent choice for creating groups of nodes that can be executed together without conflicts. Suggested docs example:
- Create pipelines without namespaces: Show how to build basic pipelines.
- Create namespaced pipelines: Use the initial pipelines to create namespaced versions.
- Combine pipelines: Build a final pipeline by combining the namespaced ones.
- Visualise: Include a visualisation using Kedro-Viz (link to ticket in progress).
[ ] #4016
[ ] Clarifying Modularity. The term "modularity" currently appears to relate to creating pipelines in separate folders, not namespaces. If this interpretation is correct, we should explicitly clarify this distinction in the docs.
Technical issues
Several technical issues were highlighted by @idanov during the last TD. These will be moved here for tracking (details in progress).
User interface
There is a potential user interface concern affecting namespace adoption, which might benefit from design attention (@stephkaiser, @iamelijahko).
Tagging Example: Tags are added directly during node or pipeline creation:
Alternatively, for pipelines:
Namespace Example: Namespaces are applied at the pipeline creation level and involve multiple steps:
This prefixes all inputs, outputs, and parameters with "part1.", which is most likely not desired. To preserve naming:
Tags are applied directly to nodes, whereas namespaces require changes at the pipeline level. Simplifying the namespace UI or aligning it more closely with tagging might also improve adoption.
A few other UI gaps reported by users:
Namespaces in deployment
We aim to unify and implement node grouping functionality for deployment purposes in #4319. Namespaces appear to be a great fit for this purpose. However, the ongoing work in this ticket to increase namespace adoption must be completed at the same time.