Closed lorenabalan closed 4 years ago
Working on this
I would love to get rid of some of the verbosity from my repos, and not hand name each and every node, and only need to do so in the edge cases in which odd things occur.
I was looking into this issue. @lorenabalan
The current implementation in `kedro-viz` shows `node.name` as the node's display text and, on hover, displays `node.func_name`.
One idea is to change the default behaviour so that instead it returns the underlying function name and a small hash, which is based on the inputs and outputs. This would now be perceived as a unique identifier.
Do you propose setting this unique identifier as the `node.name`?
Extra: Check that hover state on nodes in Kedro-Viz shows you the correct thing to be put into the command line `kedro run --node`.
To implement this behavior, we will either have to change the `kedro-viz` hover option or change the `node.func_name`.
The current naming convention contains a lot of useful information and can be stored as `node.info` if required. A lot of tests will need to be rewritten to accommodate these changes.
Hey @nishnash54, apologies for the late reply. Kedro-Viz currently makes use of `Node.short_name` and `Node._func_name`. We've been struggling with this idea for a while now, and it feels like our API is a bit all over the place in terms of what is a name and what is an identifier, and how we use them in `Node` and `Pipeline` (`unique_key`, `validate_node_duplicates`, etc.). We need to find the time to sit down and think about it as a whole. In the past we discussed that the name/ID (not sure yet if they should be the same thing or different properties) should ideally marry 3 principles: unique, human readable, and reasonably straightforward to reconstruct/deduce in your head, which makes this very hard.
I don't think `Node.func_name` should change. There is also work happening on the Viz side of things, to display more information about the nodes and datasets, so we'll use that to feed into our decisions too.
We've parked this for now to focus on other deliverables on our roadmap.
Description & context
Users can specify names for their nodes to identify them more easily. When a name is not explicitly specified, Kedro auto-generates a default name. You can see this in the `name` property on `Node`. The current auto-generated name for a node looks something like this: `func_name(inputs) -> outputs` (see the implementation of the `__str__` method on the `Node` class).

This is a bit too descriptive and quite hard to type in the CLI to run a particular node (`kedro run --node <node_name>`). Our visualisation plugin, Kedro-Viz, is also no longer displaying this long name in the UI.

We can change it to something like `'-'.join(sorted(outputs))`. Outputs should be unique, which makes these names unique too.
Actually just one output is enough, as they should all be unique.
If the node produces no outputs, we probably need to fall back to what we had before or similar.
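A minimal sketch of the output-based naming described above, written as a standalone function so it does not depend on Kedro itself. The helper name `default_node_name` and the exact shape of the fallback string are assumptions for illustration; they only approximate the `func_name(inputs) -> outputs` form mentioned earlier and are not Kedro's actual implementation.

```python
from typing import Iterable, Sequence


def default_node_name(
    func_name: str, inputs: Sequence[str], outputs: Iterable[str]
) -> str:
    """Hypothetical helper: derive a default node name from the node's outputs."""
    sorted_outputs = sorted(outputs)
    if sorted_outputs:
        # Dataset outputs are expected to be unique across a pipeline,
        # so joining them yields a short, unique name.
        return "-".join(sorted_outputs)
    # No outputs: fall back to a verbose form similar to the current default.
    # The exact format here is an assumption, not Kedro's real __str__ output.
    return f"{func_name}({', '.join(inputs)}) -> None"


print(default_node_name("split_data", ["raw"], ["train", "test"]))
print(default_node_name("report", ["model"], []))
```

With a scheme like this, a node producing `train` and `test` would be addressable by the name `test-train`, while a node with no outputs keeps a longer descriptive name.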
Possible Implementation
One idea is to change the default behaviour so that instead it returns the underlying function name and a small hash, which is based on the inputs and outputs. This would now be perceived as a unique identifier.
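The idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `hashed_node_name`, the choice of SHA-256, the 8-character digest length, and the `_` separator are all hypothetical and not part of Kedro's API.

```python
import hashlib
from typing import Sequence


def hashed_node_name(
    func_name: str,
    inputs: Sequence[str],
    outputs: Sequence[str],
    digest_len: int = 8,
) -> str:
    """Hypothetical helper: function name plus a short hash of inputs and outputs."""
    # Build a deterministic string from the declared inputs and outputs.
    payload = ",".join(inputs) + "->" + ",".join(outputs)
    # Truncate the hex digest so the name stays short enough to type.
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:digest_len]
    return f"{func_name}_{digest}"


print(hashed_node_name("split_data", ["raw"], ["train", "test"]))
```

The trade-off is visible in the result: the name is short and very likely unique, but the hash part cannot be reconstructed in your head, so it would have to be copied from somewhere (for example the Kedro-Viz hover state) before being passed to `kedro run --node`.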
Extra: Check that hover state on nodes in Kedro-Viz shows you the correct thing to be put into the command line `kedro run --node`.