kayjan / bigtree

Tree Implementation and Methods for Python, integrated with list, dictionary, pandas and polars DataFrame.
https://bigtree.readthedocs.io
MIT License
168 stars 13 forks source link

Add structural awareness of moved nodes in `get_tree_diff` #309

Open acxz opened 5 days ago

acxz commented 5 days ago

Is your feature request related to a problem? Please describe.

I'd like to be able to see not only what the newly added and the removed nodes are in the tree, but I'd like the builtin diff method (get_tree_diff) to also let me know if a certain node has moved to a different location in the tree, rather than showing it as a removed node and an added node.

Describe the solution you'd like

In addition to showing the annotated (+) and (-) also add (moved from) and (moved to). Maybe with an option to list out the full or relative path moved from/to, i.e. (moved from: a/b/c) and (moved to: a/b/d).

Describe alternatives you've considered

nutree has a nice diff output shown here.

image

image

image

They even have a really nice way of visualizing the diff here

image

Additional context

I love this repo by the way, easy to use and very clear! You've done a great job here.

Final Check

kayjan commented 4 days ago

Thanks for your support! I have not heard of nutree before, will check that repo out and also this feature! Just to check - in terms of indicating and does it reconcile this 'movement' based on the nodes having the same name?

kayjan commented 3 days ago

This is now implemented in bigtree v0.22.1, do perform pip install --upgrade bigtree.

The usage is as such:

from bigtree import get_tree_diff, list_to_tree

root = list_to_tree([
    "Downloads/Pictures/photo1",
    "Downloads/Pictures/photo2",
    "Downloads/file1",
    "Downloads/photo3",
])
root.show()
# Downloads
# ├── Pictures
# │   ├── photo1
# │   └── photo2
# ├── file1
# └── photo3

root_other = list_to_tree([
    "Downloads/Pictures/photo1",
    "Downloads/Pictures/photo3",
    "Downloads/file1",
    "Downloads/file2",
])
root_other.show()
# Downloads
# ├── Pictures
# │   ├── photo1
# │   └── photo3
# ├── file1
# └── file2

tree_diff = get_tree_diff(root, root_other, detail=True)
tree_diff.show()

This will result in a tree

Downloads
├── Pictures
│   ├── photo2 (removed)
│   └── photo3 (moved to)
├── file2 (added)
└── photo3 (moved from)

This is also reflected in the updated docs. In terms of visualisation, there is nothing changed, you can already indicate different node/edge appearance for different nodes. This way there is more flexibility for your graph appearance. For example,

from bigtree import Node, tree_to_dot

def get_node_attribute(node: Node):
    if node.node_name.endswith(("(moved from)", "(removed)")):
        node.name = str(node.node_name.rstrip(" (moved from)").rstrip(" (removed)"))
        return {"style": "filled", "fillcolor": "red"}
    elif node.node_name.endswith(("(moved to)", "(added)")):
        node.name = node.node_name.rstrip(" (moved to)").rstrip(" (added)")
        return {"style": "filled", "fillcolor": "green"}
    return {}

graph = tree_to_dot(tree_diff, node_attr=get_node_attribute)
graph.write_png("tree_diff_detail.png")

tree_diff_detail

acxz commented 16 hours ago

@kayjan Wow! That was quick! Thanks for the adding that feature in!

I just tested it out on the original example and I get the following output:

Tree Diff
Root
├── A
│   ├── a1
│   │   ├── a11 (removed)
│   │   └── a12
│   └── a2
│       └── a21 (added)
├── B
│   └── b1 (moved from)
│       └── b11 (moved from)
└── C (added)
    └── b1 (moved to)
        └── b11 (moved to)

image

I used the following get_node_attribute function to better visualize the various kinds of differences (add, remove, moved to/from): Note I color the edges green/red as well to show new/old parents for the various nodes.

def get_node_attribute(node: bigtree.Node):
    if node.node_name.endswith("(removed)"):
        return {"style": "solid", "color": "red"}
    elif node.node_name.endswith("(added)"):
        return {"style": "solid", "color": "green"}
    return {}

def get_edge_attribute(node: bigtree.Node):
    if node.node_name.endswith(("(removed)", "(moved from)")):
        node.name = str(node.node_name.rstrip(" (moved from)").rstrip(" (removed)"))
        return {"style": "filled", "color": "red"}
    elif node.node_name.endswith(("(added)", "(moved to)")):
        node.name = node.node_name.rstrip(" (moved to)").rstrip(" (added)")
        return {"style": "filled", "color": "green"}
    return {}

I don't think node b11 should be shown as being moved, since even tho the full path has changed for it, it's path relative to the parent (or superparent) which has moved, has not changed. This causes some redunduncies and clutters the diff output, as with moved to/moved from we only want to show parents being changed (i.e. the edges being colored appropriately). b11 remains intact. I can imagine for diffs with larger trees, a majority of the tree could be flagged green/red and duplicated in the output making the diff hard to read.