thomasp85 / ggraph

Grammar of Graph Graphics
https://ggraph.data-imaginist.com

circlepack layout struggles with large number of nodes #345

Closed jhjlee closed 10 months ago

jhjlee commented 1 year ago

Hi Thomas,

Thanks for a great tool! I have a question about the circlepack layout, which I'm trying to use on data with a single level of hierarchy (hundreds of groups, ~15 groups containing 100 or more nodes). When trying to plot, it runs for a while and eventually throws an "enc3 error", which I believe is related to the size of the enclosing circle? Is there an upper limit to the number of nodes that can be included in a circlepack plot? Thank you very much.

schochastics commented 1 year ago

Can you post a reproducible example? That would make it easier to check what is going on

dn-ra commented 1 year ago

Hi David,

I'm working with @jhjlee on this data.

Here is a reproducible example that approximates the data we have.

Unfortunately it isn't consistently reproducible. I can run it maybe 5 times, and twice it will execute properly, once it will produce an empty plot, and twice it will result in the problem Hyun Jae describes above. It will hang for a very long time and then produce an enc3 error, or a nonsensical plot. I haven't been able to get it to spit out the enc3 error with this toy data yet, but the hanging will happen after a few tries.

I note that it's normal for it to take some time to actually print out the plot, but creating the plot with ggraph(graph, layout = 'circlepack', weight = size) + geom_node_circle(aes(fill = label, color = factor(depth))) is where the issue comes from.

It seems there is some randomness to the results as it figures out the plot coordinates. Does this have something to do with the inefficiencies of the layout algorithms as per #234? And are there any solutions you can suggest?

Thanks, Dan

#n of groupings is 50
#n of items is 3204
#n of labels is 10

library(igraph)
library(ggraph)
library(dplyr)
library(magrittr)
library(uuid)
library(RColorBrewer)

items <- uuid::UUIDgenerate(n = 3204) #identifiers of individual items
labels <- c('boston','brisbane','london','paris','newyork','tokyo','moscow','bogota','beijing','johannesburg') #categories for colouring individual items
groupings <- replicate(paste(sample(LETTERS, 4, TRUE), collapse = ''), n = 50) #grouping variable to pack items in

data <- data.frame(items = items, label = sample(labels, 3204, replace = T), group = sample(groupings, 3204, replace = T)) %>% group_by(group)

#root edges
root_edges =  data.frame('a' = 'root', 'b' = group_keys(data)$group)

#item edges
item_edges = mapply(FUN = function(x, y) {
  cbind('a' = x, 'b' = y$items) } ,
  group_keys(data)$group, group_split(data), USE.NAMES = F
) %>% with(do.call(rbind,.)) %>% as.data.frame()

#this object contains all edges for circlepack plot. items within groupings within 'root'
edge_list = as.matrix(rbind(root_edges, item_edges))

#need to add entries in the original data to accommodate `grouping` and `root` as nodes, so get list of missing node names. This is just for code functionality
missing_node_names = data.frame('items' = do.call(c, root_edges) %>% unique())

#this object contains all the metadata required for the graph
vertex_meta <- data %>% mutate(size = 1) %>% full_join(missing_node_names)  %>% 
  mutate(label = replace(as.character(label), is.na(label), 'A-None'), group = replace(as.character(group), is.na(group), 'A-None')) %>% 
  mutate(size = replace(size, is.na(size), 0),
         label = factor(label, exclude = NULL) )

#init graph
graph = igraph::graph_from_data_frame(edge_list, vertices = vertex_meta)

#colors of labels
label_colors = RColorBrewer::brewer.pal(n = 10, name = 'BrBG')
#colors need to be a named vector for plotting to work
names(label_colors) = levels(as.factor(data$label))
#Add background color of white for all `A-None` data points
colors_for_labels = c("A-None" = 'white', label_colors)

#now create plot
graph_by_label = ggraph(graph, layout = 'circlepack', weight = size) + 
  geom_node_circle(aes(fill = label, color = factor(depth))) +
  theme_void() + coord_fixed() +
  scale_fill_manual(values=colors_for_labels, breaks = names(label_colors)) +
  scale_color_manual(values = c("0" = "white", "1" = "black", "2" ="black"), breaks = c()) + guides(fill = guide_legend('Labels'))

#and show
graph_by_label
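
Because the toy data above is drawn at random on every run, a failing case is hard to pin down. One possible way to make the input repeatable is sketched below. It assumes that at least part of the run-to-run variation comes from the generated data and cannot rule out nondeterminism in the layout itself. `make_toy_data` is a hypothetical helper, not part of the original example, and sequential IDs stand in for the UUIDs so that the identifiers are fixed too.

```r
# Hypothetical helper (not in the original example): builds the toy data
# from a fixed seed so that every run sees the same groups and labels.
make_toy_data <- function(seed, n_items = 3204, n_groups = 50) {
  set.seed(seed)
  labels <- c('boston', 'brisbane', 'london', 'paris', 'newyork',
              'tokyo', 'moscow', 'bogota', 'beijing', 'johannesburg')
  # 4-letter group names, as in the original example
  groupings <- replicate(n_groups,
                         paste(sample(LETTERS, 4, TRUE), collapse = ''))
  data.frame(
    items = sprintf('item_%04d', seq_len(n_items)),  # fixed IDs instead of UUIDs
    label = sample(labels, n_items, replace = TRUE),
    group = sample(groupings, n_items, replace = TRUE)
  )
}

first  <- make_toy_data(seed = 1)
second <- make_toy_data(seed = 1)

# The same seed gives the same data frame, so a seed that triggers the hang
# can be reported alongside the code.
identical(first, second)
```

Trying a handful of seeds and noting which ones hang would give a single fixed input to attach to the report.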

thomasp85 commented 10 months ago

Thanks - I can reproduce the behaviour based on the provided code. Will investigate

dn-ra commented 10 months ago

Hi Thomas, thanks for fixing. Out of interest could you let me know what caused it?

Thanks, Dan

thomasp85 commented 10 months ago

It was due to rounding errors that led to catastrophic failures, as is so often the case with computational geometry. It was fixed in the D3 version some time ago, but I hadn't kept up to date with it.
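
To illustrate the kind of rounding problem described here, the sketch below tests whether one circle encloses another. It is not the actual ggraph or D3 code: `encloses_exact` and `encloses_tol` are hypothetical helpers, and the relative tolerance of `1e-9` is an assumption chosen for the example. The inner circle is internally tangent to the outer one, so geometrically it is enclosed, but `0.2 + 0.1` is slightly larger than `0.3` in floating point.

```r
# Hypothetical helpers for illustration only; not ggraph's or D3's code.

# Strict test: distance between centres plus inner radius must not exceed
# the outer radius.
encloses_exact <- function(outer, inner) {
  d <- sqrt((inner$x - outer$x)^2 + (inner$y - outer$y)^2)
  d + inner$r <= outer$r
}

# Same test with a small relative tolerance (the value is an assumption).
encloses_tol <- function(outer, inner, eps = 1e-9) {
  d <- sqrt((inner$x - outer$x)^2 + (inner$y - outer$y)^2)
  d + inner$r <= outer$r + max(outer$r, inner$r, 1) * eps
}

outer <- list(x = 0,   y = 0, r = 0.3)
inner <- list(x = 0.2, y = 0, r = 0.1)  # tangent to `outer` from the inside

0.1 + 0.2 > 0.3               # TRUE: the sum rounds up past 0.3
encloses_exact(outer, inner)  # FALSE: the strict test rejects a valid enclosure
encloses_tol(outer, inner)    # TRUE: the tolerance absorbs the rounding error
```

An algorithm that searches for an enclosing circle using the strict test can reject every candidate in a case like this, which is one plausible route to a failure of the kind reported above.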