plotly / plotly.py

The interactive graphing library for Python :sparkles: This project now includes Plotly Express!
https://plotly.com/python/
MIT License

Sankey node x and y positions ignored if node is empty #3003

Open banderlog opened 3 years ago

banderlog commented 3 years ago

Consider the following example:

import numpy as np
import plotly.graph_objects as go

def plot_sankey(source, target, value, labels):
    "From <https://plotly.com/python/sankey-diagram/>"
    fig = go.Figure(data=[go.Sankey(
        arrangement = "perpendicular",
        node = dict(
          pad = 15,
          thickness = 20,
          line = dict(color = "black", width = 0.5),
          label = labels,
          color = "red",
        ),
        link = dict(
          source = source, # indices correspond to labels, eg A1, A2, A1, B1, ...
          target = target,
          value = value
        ))])

    return fig

source = np.expand_dims(range(0, 25), axis=1).repeat(5, axis=1).flatten()
target = np.array(range(5, 30)).reshape(5, -1).repeat(5, axis=0).flatten()

value_not_so_sparse = [0, 7, 1, 2, 0, 5, 241, 5, 255, 6, 0, 11, 2, 8, 0, 10, 221, 9, 203, 2, 0, 7, 0, 5, 0, 0, 9, 0, 6, 0, 6, 251, 9, 217, 4, 0, 7, 1, 9, 0, 6, 220, 7, 239, 1, 0, 5, 1, 2, 0, 0, 6, 0, 6, 0, 2, 231, 14, 240, 5, 0, 9, 0, 9, 0, 6, 214, 9, 242, 2, 0, 3, 0, 2, 0, 0, 1, 1, 4, 2, 5, 241, 5, 211, 1, 1, 12, 0, 9, 1, 3, 230, 19, 243, 4, 0, 2, 0, 5, 0, 0, 4, 0, 5, 0, 8, 217, 13, 245, 3, 0, 10, 0, 15, 0, 8, 222, 8, 231, 3, 0, 4, 0, 4, 0]

value_sparse = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 274, 460, 266, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 142, 129, 3, 0, 0, 0, 103, 242, 115, 0, 0, 0, 18, 126, 122, 0, 0, 0, 0, 0, 140, 2, 0, 0, 0, 106, 116, 10, 0, 0, 0, 64, 124, 75, 0, 0, 0, 6, 118, 117, 0, 0, 0, 1, 121, 242, 4, 0, 0, 0, 96, 82, 4, 0, 0, 0, 38, 65, 37, 0, 0, 0, 9, 87, 98, 0, 0, 0, 4, 234, 333, 5, 0, 0, 0, 58, 62, 4, 0, 0, 0, 15, 36, 27, 0, 0, 0, 3, 75, 50, 0, 0, 0, 13, 319]

labels = []
# labels must not repeat across time points, so suffix each one with its time index
for i in range(6):
    labels.extend([f'{j}_{i}' for j in [1, 2, 3, 4, 5]])

# True
len(source) == len(target) == len(value_not_so_sparse) == len(value_sparse)

Plot a Sankey diagram where all nodes are present:

fig = plot_sankey(source, target, value_not_so_sparse, labels)
fig.show()

image

Now try to rearrange the nodes:

fig.data[0].node.x = np.linspace(1e-09, 1, 6).repeat(5)
fig.data[0].node.y = np.expand_dims(np.linspace(1e-09, 1, 5), axis=1).repeat(6, axis=1).T.flatten()
fig.show()

image
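For readers following along, the two position arrays above lay the 30 nodes out on a 6 x 5 grid: one x value per time point, and the same five y values within each time point. A minimal sketch of that, with an equivalent and perhaps more readable construction (the names `n_columns`, `n_rows`, `x_grid`, and `y_grid` are only illustrative and do not appear in the original report):

```python
import numpy as np

n_columns, n_rows = 6, 5  # 6 time points with 5 nodes each, matching the labels

# Positions exactly as written in the report
x_issue = np.linspace(1e-09, 1, n_columns).repeat(n_rows)
y_issue = (
    np.expand_dims(np.linspace(1e-09, 1, n_rows), axis=1)
    .repeat(n_columns, axis=1)
    .T.flatten()
)

# Equivalent construction: repeat each column's x, tile the row y values
x_grid = np.repeat(np.linspace(1e-09, 1, n_columns), n_rows)
y_grid = np.tile(np.linspace(1e-09, 1, n_rows), n_columns)
```

Both versions produce arrays of length 30, one entry per label, in the same order as `labels`.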

Everything works. Not without workarounds (https://github.com/plotly/plotly.py/issues/3002, https://github.com/plotly/plotly.py/issues/1732), but it works.

Now try a plot where some nodes are empty, but we still want to arrange them:

fig = plot_sankey(source, target, value_sparse, labels)
fig.show()

image

The arrangement should still work, because we have the same target, source, labels, and len(value):

fig.data[0].node.x = np.linspace(1e-09, 1, 6).repeat(5)
fig.data[0].node.y = np.expand_dims(np.linspace(1e-09, 1, 5), axis=1).repeat(6, axis=1).T.flatten()
fig.show()

image

It seems to count only the non-empty nodes when applying the x and y positions.
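To see which nodes are affected, one can list the nodes that have no link with a non-zero value. This is a minimal sketch on toy data, not the data from the report; `find_empty_nodes` is a hypothetical helper, and "empty" is assumed here to mean "no incoming or outgoing link with value > 0":

```python
import numpy as np

def find_empty_nodes(source, target, value, n_nodes):
    """Return indices of nodes that have no link with a non-zero value.

    Hypothetical helper: assumes a node is "empty" when every link
    touching it has value 0.
    """
    source = np.asarray(source)
    target = np.asarray(target)
    value = np.asarray(value)

    used = np.zeros(n_nodes, dtype=bool)
    nonzero = value > 0
    used[source[nonzero]] = True  # nodes with a non-zero outgoing link
    used[target[nonzero]] = True  # nodes with a non-zero incoming link
    return np.flatnonzero(~used)

# Toy data: 4 nodes; node 1 only appears in a zero-valued link
toy_source = [0, 1, 0]
toy_target = [2, 3, 3]
toy_value = [5, 0, 2]
empty = find_empty_nodes(toy_source, toy_target, toy_value, n_nodes=4)
```

With the arrays from the report, the call would be `find_empty_nodes(source, target, value_sparse, len(labels))`.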

I am not sure whether this is actually a plotly.js problem.

1kastner commented 3 years ago

I also stumbled over this strange behavior. It would be great if I did not need to check manually whether a node is empty before plotting, because that adds quite some complexity to code that should be generic. I would like to create some visuals that update with different data (think of it like a dashboard). I feel that this feature, where we do not "need" to position empty nodes, creates more problems than it solves.
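The kind of manual check described here could look roughly like the sketch below. It drops zero-valued links, keeps only the nodes that still have a link, and re-indexes everything so that labels and positions line up with the remaining nodes. `drop_empty_nodes` is a hypothetical helper, the "value > 0" rule is an assumption, and this thread does not confirm that passing the filtered arrays makes Plotly honor the positions:

```python
import numpy as np

def drop_empty_nodes(source, target, value, labels, x, y):
    """Keep only nodes with a non-zero link and re-index the links.

    Hypothetical helper: an assumption about what "checking manually"
    involves, not an official Plotly workaround.
    """
    source, target, value = map(np.asarray, (source, target, value))

    # Drop links that carry no flow
    keep_link = value > 0
    source, target, value = source[keep_link], target[keep_link], value[keep_link]

    # Sorted original indices of nodes that still have a link
    kept = np.unique(np.concatenate([source, target]))

    # Map original node indices to their positions in the filtered list
    new_index = np.full(len(labels), -1)
    new_index[kept] = np.arange(len(kept))

    return {
        "source": new_index[source],
        "target": new_index[target],
        "value": value,
        "labels": [labels[i] for i in kept],
        "x": np.asarray(x)[kept],
        "y": np.asarray(y)[kept],
    }

# Toy data: node "b" only appears in a zero-valued link
filtered = drop_empty_nodes(
    source=[0, 1, 0],
    target=[2, 3, 3],
    value=[5, 0, 2],
    labels=["a", "b", "c", "d"],
    x=[0.1, 0.1, 0.9, 0.9],
    y=[0.2, 0.8, 0.2, 0.8],
)
```

The returned arrays could then be passed to `go.Sankey` in place of the originals, which is exactly the extra bookkeeping that generic dashboard code would rather avoid.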

UFO-101 commented 1 year ago

Bump. This is still broken.