Node ordering nightmare when doing cluster diagrams

HariSekhon commented 1 year ago

It turns out that the jumbled node ordering in diagrams becomes a nightmare when diagramming clusters with more than a couple of nodes.

When there are only a couple of nodes, you can just declare node 2 first and then node 1, but the ordering becomes much more complicated once you have a handful or two of nodes, as in distributed computing clusters like most Big Data or NoSQL clusters.

I'd been working around the node order with some ugly hacks like this:

        opentsdb = {}
        opentsdb_range = range(1, 16, 1)
        # crude instead of algo positioning but quick
        ordering = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 2]
        ordered_opentsdb_range = [opentsdb_range[ordering.index(i)] for i in opentsdb_range]
        for _ in ordered_opentsdb_range:
            opentsdb[_] = Custom(f"OpenTSDB pod {_}", opentsdb_icon)
            opentsdb_service >> opentsdb[_]

and

        hbase = {}
        hdfs = {}
        node_range = range(1, 13, 1)
        # crude instead of algo positioning but quick
        ordering = [6, 9, 3, 12, 2, 10, 5, 7, 8, 4, 11, 1]
        ordered_node_range = [node_range[ordering.index(i)] for i in node_range]
        for _ in ordered_node_range:
            with Cluster(f"Hadoop node {_}"):
                hbase[_] = HBase("HBase")
                hdfs[_] = Hadoop("Hadoop HDFS")
                hbase[_] >> hdfs[_]
                for i in opentsdb_range:
                    hbase[_] << opentsdb[i]

which can be seen in their entirety in this code:

https://github.com/HariSekhon/Diagrams-as-Code/blob/master/opentsdb_kubernetes_hbase.py

Notice that the ordering hacks differ between the two cases because the node counts differ (12 vs 15), so this isn't even universal, and I haven't taken the time to write a generic algo to straighten out the ordering.
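For anyone puzzling over the list comprehension in both hacks: it computes the inverse permutation of `ordering`, i.e. for each node number, the 1-based position at which that number appears in the list. The sketch below spells that out. `inverse_permutation` is a hypothetical helper name, not part of diagrams, and it only reproduces the hack; it does not work out which ordering a given layout needs.

```python
def inverse_permutation(ordering):
    """Return the 1-based inverse of a permutation of 1..n.

    Gives the same result as the comprehension in the hacks above:
    [rng[ordering.index(i)] for i in rng] with rng = range(1, n + 1).
    """
    n = len(ordering)
    if sorted(ordering) != list(range(1, n + 1)):
        raise ValueError("ordering must be a permutation of 1..n")
    inverse = [0] * n
    for position, value in enumerate(ordering, start=1):
        # `value` sits at `position`, so the inverse maps value -> position
        inverse[value - 1] = position
    return inverse


opentsdb_ordering = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 2]
opentsdb_declaration_order = inverse_permutation(opentsdb_ordering)
print(opentsdb_declaration_order)
```

So the 15-pod hack actually declares the pods starting at 14 and 15 and then 1 through 13, which is just a rotation of the natural order.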

I've just recreated this diagram in D2, where this problem doesn't exist, which made me realize that it should be fixed in Python diagrams.

So I'm raising this ticket to make node ordering behave linearly, in the same fashion as D2 and as a client programmer would expect, without having to hack the order in which nodes are declared in code.

HariSekhon commented 1 year ago

Notice also the messed-up cluster positions in the diagram: the clusters are placed side by side instead of following direction='TB' as set in the code:

[image: opentsdb_kubernetes_hbase]

which is an open issue here:

https://github.com/mingrammer/diagrams/issues/44#issuecomment-1532072561

clayms commented 1 year ago

Changing this line:

hbase[_] << opentsdb[i]

to this:

opentsdb[i] >> hbase[_]

gives:

HariSekhon commented 1 year ago

Nice, thanks. I realized the same thing with D2 an hour ago: the direction of the arrows changes how the layout engine infers where to place things. I expect this works in conjunction with the diagram direction...
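A possible explanation for why swapping the operator changes the layout, sketched as raw DOT text rather than with the diagrams API: in Graphviz's dot engine an edge `a -> b` ranks `a` before `b`, and the `dir` attribute only changes which end gets the arrowhead. The assumption here (not confirmed in this thread) is that `hbase << opentsdb` ends up in the dot file as an hbase-to-opentsdb edge drawn with `dir=back`, whereas `opentsdb >> hbase` is a plain opentsdb-to-hbase edge. `dot_edge` is a hypothetical helper for illustration only.

```python
def dot_edge(tail, head, reverse=False):
    """Render one DOT edge statement.

    reverse=True keeps the same tail and head (so the same ranking)
    and only flips the drawn arrowhead via dir=back.
    """
    attrs = " [dir=back]" if reverse else ""
    return f'"{tail}" -> "{head}"{attrs};'


# Assumed output for `hbase << opentsdb`: hbase is the tail, so dot ranks it first
before = dot_edge("hbase", "opentsdb", reverse=True)
# Assumed output for `opentsdb >> hbase`: opentsdb is the tail, so dot ranks it first
after = dot_edge("opentsdb", "hbase")

print(before)
print(after)
```

Both statements draw an arrow pointing at hbase, but they give the layout engine opposite hints about which node comes first.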

Any ideas about the node ordering?

clayms commented 1 year ago

The problem is you are trying to connect 15 nodes with 12 nodes. So you'll have to change the order of the 15 nodes to start at 12.

Replace:

opentsdb = {}
opentsdb_range = range(1, 16, 1)
# crude instead of algo positioning but quick
ordering = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 2]
ordered_opentsdb_range = [opentsdb_range[ordering.index(i)] for i in opentsdb_range]
for _ in ordering:
    opentsdb[_] = Custom(f"OpenTSDB pod {_}", "./img/opentsdb.png")

opentsdb_service >> opentsdb[_]

with:

tsdb_ordering = [12, 13, 14, 15, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ]
tsdb_nodes = [Custom(f"OpenTSDB pod {_}", "./img/opentsdb.png") for _ in tsdb_ordering]
opentsdb_service >> tsdb_nodes

and replace:

node_range = range(1, 13, 1)
# crude instead of algo positioning but quick
ordering = [6, 9, 3, 12, 2, 10, 5, 7, 8, 4, 11, 1]
ordered_node_range = [node_range[ordering.index(i)] for i in node_range]
for _ in ordered_node_range:
    with Cluster(f"Hadoop node {_}"):
        hbase[_] = HBase("HBase")
        hdfs[_] = Hadoop("Hadoop HDFS")
        hbase[_] >> hdfs[_]
        for i in opentsdb_range:
            hbase[_] << opentsdb[i]

with:

node_ordering = [1,2,3,4,5,6,7,8,9,10,11,12][::-1]
for _ in node_ordering:
    with Cluster(f"Hadoop node {_}"):
        hbase[_] = HBase("HBase")
        hdfs[_] = Hadoop("Hadoop HDFS")
        hbase[_] >> hdfs[_]
    tsdb_nodes >> hbase[_]
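The hand-written `tsdb_ordering` list above is the natural order rotated to start at 12. If the pod count changes, it could be generated instead of typed out. A minimal sketch, where `rotated_ids` is a hypothetical helper and the choice of start value is still something you have to find by looking at the rendered diagram:

```python
def rotated_ids(count, start):
    """Return 1..count rotated so that the list begins at `start`."""
    if not 1 <= start <= count:
        raise ValueError("start must be between 1 and count")
    ids = list(range(1, count + 1))
    # Everything from `start` onwards first, then the ids that came before it
    return ids[start - 1:] + ids[:start - 1]


tsdb_ordering = rotated_ids(15, 12)
print(tsdb_ordering)
```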

clayms commented 1 year ago

Alt solution, which I think looks better and avoids any ordering issues:

from diagrams import Diagram, Cluster, Edge, Node

#...
#...

graph_attr = {
    "splines": "spline",
    "concentrate":"true",
}

#...
#...

    tsdb_ordering = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ]
    tsdb_nodes = [Custom(f"OpenTSDB pod {_}", "./img/opentsdb.png") for _ in tsdb_ordering]
    opentsdb_service >> tsdb_nodes

blank1 = Node("", shape="plaintext", height="0.0", width="0.0")

with Cluster("Hadoop cluster on-prem") as hadoop:
    hbase = {}
    hdfs = {}    
    node_ordering = [1,2,3,4,5,6,7,8,9,10,11,12][::-1]
    for _ in node_ordering:
        with Cluster(f"Hadoop node {_}"):
            hbase[_] = HBase("HBase")
            hdfs[_] = Hadoop("Hadoop HDFS")
            hbase[_] >> hdfs[_]

    tsdb_nodes - \
        Edge( tailport="s", headport="n", minlen="2") - \
        blank1 >> \
        list(hbase.values())

HariSekhon commented 1 year ago

The reversed node order is one thing that could probably be solved in core diagrams code instead of client programmers having to do this:

tsdb_ordering = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ]

or

range(12, 0, -1)

for every diagram one writes which has more than a couple of nodes in it.
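Until something like that exists in the library, the reversal could at least be hidden in one place on the client side. This is only a sketch: `declare_reversed` is a hypothetical helper, not a diagrams API, and it assumes (as observed in this thread, not guaranteed in general) that declaring nodes in reverse gives the desired visual order. `make_node` stands in for whatever creates the node, e.g. a lambda wrapping `Custom(...)`.

```python
def declare_reversed(ids, make_node):
    """Create one node per id in reverse order, return them keyed in the given order."""
    ids = list(ids)
    # Declaration happens back to front ...
    created = {i: make_node(i) for i in reversed(ids)}
    # ... but callers get a dict that iterates in the order they asked for
    return {i: created[i] for i in ids}


# Stand-in factory that records the call order instead of building real nodes
calls = []

def fake_node(i):
    calls.append(i)
    return f"OpenTSDB pod {i}"

pods = declare_reversed(range(1, 4), fake_node)
print(calls)
print(list(pods))
```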

I don't understand this bit:

Edge( tailport="s", headport="n", minlen="2")

I went back to look at the docs but can't find it explained there either:

https://diagrams.mingrammer.com/docs/guides/edge

That blank1 trick seems quite novel though - to get all the arrows through a central point to avoid messy criss-crossing?

clayms commented 1 year ago

This library is just a wrapper around graphviz. The Graphviz documentation has a lot more detail for Nodes, Clusters, and Edges, and much more.

https://graphviz.org/docs/edges/
https://graphviz.org/docs/attr-types/portPos/
https://graphviz.org/docs/attrs/headport/
https://graphviz.org/docs/attrs/tailport/
https://graphviz.org/docs/attrs/minlen/
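In short, going by those pages: `tailport="s"` attaches the edge to the south (bottom) side of the node it leaves, `headport="n"` attaches it to the north (top) side of the node it enters, and `minlen="2"` asks for at least two ranks between the two ends. The blank node then acts as a hub, and the effect on edge count can be sketched in plain Python. The node names below are made up for illustration, and `hub_edges` and `full_mesh_edges` are hypothetical helpers.

```python
def full_mesh_edges(sources, targets):
    """Every source connected directly to every target."""
    return [(s, t) for s in sources for t in targets]


def hub_edges(sources, targets, hub="blank1"):
    """Every source and every target connected only to one shared hub node."""
    return [(s, hub) for s in sources] + [(hub, t) for t in targets]


pods = [f"opentsdb{i}" for i in range(1, 16)]
hbase_nodes = [f"hbase{i}" for i in range(1, 13)]

mesh_count = len(full_mesh_edges(pods, hbase_nodes))
hub_count = len(hub_edges(pods, hbase_nodes))
print(mesh_count, hub_count)
```

With 15 pods and 12 HBase nodes that is 180 edges for the full mesh against 27 through the hub, which is why the criss-crossing disappears.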

HariSekhon commented 1 year ago

Thanks for the links.

Yes, I know Python diagrams is a wrapper around Graphviz, as I have had a poke at the intermediate generated dot file before, although 50 lines of Python generating 1800 lines of dot made me think twice about that!

My main questions are:

Whether the Python diagrams library can account for the node ordering weirdness itself, so that the dot it generates gives better node ordering?
Whether there is much we can do in terms of placement control at the Python diagrams level? I will certainly dig deeper into the Graphviz documentation to see if there is something I can pass through a dictionary to Python diagrams.
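As a starting point for that dictionary, here is a sketch of a few documented Graphviz graph attributes that relate to ordering and spacing, merged with the `graph_attr` settings suggested earlier in this thread. Whether any of them improves this particular diagram is untested, and the values are placeholders. `as_dot_statements` is a hypothetical helper that just shows what the attributes would look like in the dot file.

```python
# Settings suggested earlier in the thread
base_attr = {
    "splines": "spline",
    "concentrate": "true",
}

# Candidate additions; effect on this diagram is an untested assumption
layout_attr = {
    "ordering": "out",  # keep each node's out-edges in the order they were declared
    "nodesep": "0.6",   # minimum space between adjacent nodes in the same rank (inches)
    "ranksep": "1.0",   # minimum space between ranks (inches)
}

graph_attr = {**base_attr, **layout_attr}


def as_dot_statements(attrs):
    """Render a dict of graph attributes as DOT attribute statements."""
    return [f'{key}="{value}";' for key, value in sorted(attrs.items())]


# The merged dict would be passed as Diagram(..., graph_attr=graph_attr)
for line in as_dot_statements(graph_attr):
    print(line)
```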

mingrammer / diagrams

Node ordering nightmare when doing cluster diagrams #891