Open even-of-the-hour opened 2 years ago
I would like to thank @even-of-the-hour for sharing this. This is a great solution for sorting the nodes exactly how you want them. I extended the code to handle 3 levels and wanted to share it in case it helps anyone else.
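# Packages used below: plotly for the figure, dplyr for %>% and the grouping verbs
library(plotly)
library(dplyr)
# Creating dummy data; 3 levels, each with 4 nodes.
# Every combination of level-1, level-2 and level-3 node appears once (4 * 4 * 4 = 64 rows).
d1 <- rep(0:3, each = 16)
d2 <- rep(rep(4:7, each = 4), times = 4)
d3 <- rep(8:11, times = 16)
varying_flows <- rpois(64, 0.25)
my_labels <- paste0("Node ", 1:12)
my_varying_flows <- data.frame(d1, d2, d3, varying_flows)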
# Convert the data to the format required by sankey function
for (i in 1:2) {
  # Pass 1 links level 1 to level 2; pass 2 links level 2 to level 3
  group1 <- c("d1", "d2")[i]
  group2 <- c("d2", "d3")[i]
  # Total flow for each source-target pair
  my_varying_flows_thick <- my_varying_flows %>%
    group_by(!!as.name(group1), !!as.name(group2)) %>%
    summarise(thickness = sum(varying_flows), .groups = "drop")
  colnames(my_varying_flows_thick) <- c("source", "target", "thickness")
  # Midpoint of each node within its column, as a fraction of the column total
  source.label.pos <- my_varying_flows_thick %>%
    group_by(source) %>%
    summarize(n = sum(thickness)) %>%
    mutate(source.label.pos = 1 - (cumsum(n) - n / 2) / sum(n))
  target.label.pos <- my_varying_flows_thick %>%
    group_by(target) %>%
    summarize(n = sum(thickness)) %>%
    mutate(target.label.pos = 1 - (cumsum(n) - n / 2) / sum(n))
  my_varying_flows_thick$source.label.pos <- source.label.pos$source.label.pos[match(my_varying_flows_thick$source, source.label.pos$source)]
  my_varying_flows_thick$target.label.pos <- target.label.pos$target.label.pos[match(my_varying_flows_thick$target, target.label.pos$target)]
  # Keep the first pass as is; append later passes to it
  if (i == 1) {
    my_varying_flows_data <- my_varying_flows_thick
    next
  }
  my_varying_flows_data <- rbind(my_varying_flows_data, my_varying_flows_thick)
}
# Calculate x,y position
# x: one column per level at 0, 0.5 and 1; the first column is nudged off 0
node_x <- sort(rep(0:2, 4)) / 2 + c(rep(0.001, 4), rep(0, 8))
# y for every node that acts as a source (levels 1 and 2)
node_y <- my_varying_flows_data[, c("source", "source.label.pos")] %>%
  ungroup() %>%
  unique() %>%
  select(source.label.pos) %>%
  unlist() %>%
  as.numeric()
# y for nodes that only ever act as a target (level 3)
node_y <- c(
  node_y,
  my_varying_flows_data[, c("target", "target.label.pos")] %>%
    ungroup() %>%
    filter(!target %in% my_varying_flows_data$source) %>%
    unique() %>%
    select(target.label.pos) %>%
    unlist() %>%
    as.numeric()
)
# Flip the positions, round them, and replace exact zeros
node_y <- node_y * -1 + max(node_y)
node_y <- node_y %>% round(3)
node_y[node_y == 0] <- 0.001
node_y
# Plot
fig4 <- plot_ly(
  type = "sankey",
  arrangement = "snap",
  node = list(
    label = my_labels,
    # Avoiding 0 values seemed to help
    x = node_x,
    # Not clear to me why these didn't work and we instead need their
    # complements (e.g., 1 - original value) for correct placement, as if the
    # node.y positions were the distance from the top, not the bottom?
    y = node_y
  ),
  link = list(
    source = my_varying_flows_data$source,
    target = my_varying_flows_data$target,
    value = my_varying_flows_data$thickness
  )
)
fig4 <- fig4 %>%
  layout(
    title = list(
      text = "fig4 - varying flows in intended order with odd workaround; 3 levels"
    )
  )
# Nodes appear in intended order.
fig4
Main problem: Nodes appear in order of data frame under some conditions (such as symmetric flows) but under unknown conditions (some asymmetric flows, but not all), they appear out of order according to other, unknown rules. Manual positioning using node.x and node.y also has unclear rules. I'm trying to work around the lack of a sorting feature but hitting snags all over the place.
Forgive me, I'm rather new to plotly and don't understand how plotly.R interacts with python or js plotly. In trying to solve this problem, I see Issue #4373 for plotly.js describes lack of a sort feature and Issue #3002 for plotly.py states that node.x and node.y cannot be 0.
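To make the "cannot be 0" constraint concrete, here is a minimal sketch of a helper that nudges positions away from the edges before they are passed as node.x or node.y. The name nudge_from_edges and the eps value of 0.001 are my own placeholders, and clamping the upper end to 1 - eps is an assumption on my part, since the issue cited above only mentions 0.

```r
# Hypothetical helper (not part of plotly): keep positions strictly inside (0, 1).
# eps = 0.001 is an arbitrary small offset; the upper clamp is an assumption.
nudge_from_edges <- function(pos, eps = 0.001) {
  pmin(pmax(pos, eps), 1 - eps)
}

nudge_from_edges(c(0, 0.5, 1))
```

Values that are already inside the interval pass through unchanged, so this can be applied to every position vector without checking it first.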
My use case is that I want to produce a large set of sankey graphs for flows between 5 specific nodes at Time1 and 5 specific nodes at Time2. For this reason, I would like my nodes to be drawn in the same order every time, no matter the size of the nodes or flows. I wrote a script to dynamically find the correct node.y positions for nodes based on their order and size. Even this workaround is running into problems as noted in the code below.
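The core of that calculation can be sketched roughly as follows: stack the nodes of one column in the intended order and take the midpoint of each node as a fraction of the column total. The function name node_midpoints is hypothetical, and the sketch assumes that node.y is measured from the top of the plot, which is what I observed but have not seen confirmed in the reference page.

```r
# Hypothetical helper: midpoint of each node when the nodes of one column are
# stacked in the given order. `sizes` is the total flow through each node.
# Assumption: plotly measures node.y from the top of the plot.
node_midpoints <- function(sizes) {
  (cumsum(sizes) - sizes / 2) / sum(sizes)
}

node_midpoints(c(1, 1, 2))
# 0.125 0.375 0.750
```

With sizes 1, 1 and 2 the column total is 4, so the first node spans 0 to 0.25 and its midpoint is 0.125, and so on down the column.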
Minimally, I guess I'm looking for more detailed documentation about node.x and node.y compared to what is currently in the reference page.
More broadly, why is the data frame order of the nodes being overridden, such as in the uneven_flows example below?
Created on 2022-01-29 by the reprex package (v2.0.1)