ricklupton / floweaver

View flow data as Sankey diagrams
https://floweaver.readthedocs.io
MIT License
449 stars 89 forks source link

Sorting nodes to minimise intersections #76

Open clbarnes opened 5 years ago

clbarnes commented 5 years ago

It would be nice if there were a way to automatically sort the classes at each stage in order to minimise the number of times the flows crossed over each other.

I'm pretty sure this is a hard problem, but one which would be pretty valuable to sankey diagram usage!

ricklupton commented 5 years ago

Definitely! With floWeaver I wanted to first prioritise being able to control details when needed over making things automatically work -- but there's definitely scope to add some tools building on top of the low-level "Sankey diagram definition" to make it easier to get started and get the diagram looking good.

d3-sankey-diagram includes this kind of automatic layout. You might like to have a go using that if you want automatic sorting now before floweaver supports it.

clbarnes commented 5 years ago

Here's an implementation, for anyone who comes across this in future: https://gist.github.com/clbarnes/2a2b448f878ad4b2c78b7a8b01c9cae0

Given a df: pandas.DataFrame with the information from https://github.com/ricklupton/floweaver/blob/master/docs/demo_table.png , you'd do

from collections import frozenset
columns = [df["source"], df["target"]]
weights = {frozenset(src_tgt): weight for src_tgt, weight in zip(zip(*columns), df["value"])}
column_ordering = MultiSankeySorter(columns, weights, 0).sort()

This can be done with any number of columns: you just pick which one has a fixed order and it works outwards pairwise from there.

ricklupton commented 5 years ago

Amazing! Interesting paper too. Thanks, look forward to giving it a go.

ricklupton commented 5 years ago

Hi @clbarnes, sorry for the delay in looking at this. I've just been playing around with it and it looks like it works well with the simple example. I had to change your sample code a tiny bit: https://nbviewer.jupyter.org/gist/ricklupton/59cb27b841d2e9beca1a5476539dd054

To work within floWeaver, one way you might want to use this would be to keep the overall ordering fixed -- because there's some kind of logic to having things in a certain order -- but sort the items within the partitions (within ProcessGroups and Waypoints) to minimise crossings. Do you have any idea whether this algorithm could be extended to minimise crossings by sorting the nodes within each group, without changing the order of groups?

e.g. if abc were in one group, and de were in the other, it would be ok to change from

[ a b c ] [ d e ]

to

[ c a b ] [ e d ]

but not

[ a b !!! e d c
clbarnes commented 5 years ago

I'm not sure - this algorithm doesn't think about items within the classes. It just decomposes the problem into a bipartite graph whose nodes are classes and whose edges are weighted by the number of items shared between any two classes.

I could be wrong, but with classes ordered the same, wouldn't you just need to order the items inside each right-hand class according it its left-hand class? When ordering items within a right-hand class, you can't change the number of overlaps with flows which target a right-hand class other than the one you're currently operating on, so all you can do is make sure that the items coming from the bottom left-hand class aren't at the top of their right-hand class.

My gist effectively does this by ignoring the internal items and just treating their counts as proportions, and making sure flows entering the same class don't overlap with each other.