Open clpippel opened 1 year ago
Probably a consequence of R's vector recycling behaviour?
The graph attribute named layout
is special in the sense that you can assign a matrix there and igraph's layout_nicely()
layout algorithm (which is the default) uses that matrix to decide where the vertices should be. We should adjust the plot.igraph
method to make sure that only the first few rows of the matrix are used if the matrix has more rows than the number of vertices in the graph.
@krlmlr Feel free to re-assign it to one of your teammates if needed.
The problem is that the layout matrix of the subgraph is inherited from the original graph. To avoid this, the graph$layout attribute should not be copied to the subgraph. Or else the coordinates of the missing vertices should be removed (nicer solution).
graph$layout
can contain different types of values. Some of them are safe to copy (layout algorithm names) and some aren't (coordinate matrix). IMO storing the coordinates as a graph attribute (and not as vertex attributes) was a mistake. This is what I tried to describe in the wiki page that summarizes potential improvements to visualization in igraph 2.
@krlmlr @vtraag These sorts of issues are why I think that visualization can't (or shouldn't) be handed off wholesale to another package in igraph 2. Part of it will always be igraph's business. The storage of coordinates or visualization methods, and the expected behaviour with various stored visualization information is just one example.
What are action points here then according to you @szhorvat? A big overhaul of how the coordinates are stored?
A big overhaul of how the coordinates are stored?
That's the only thing I think we can do ... will there ever be enough resources for this? I don't know.
See the old igraph 2 planning document: https://github.com/igraph/rigraph/wiki/igraph-2-planning Ambitions got deflated quite a bit since then, as it became here how much all of this would take.
Should this be closed as non planned then, as the idea is tracked in that document?
That document does not replace the issue tracker, and is in need of an update. It's up to Kirill to judge how big a project we can take on. There were several ideas floating around, including offloading visualization to other packages entirely (like ggraph). Personally I don't favour that, but I admit there are good reasons to go that route.
I wouldn't close this issue because it describes a very real problem that occurs frequently when working with igraph.
Layouting belongs in igraph, not so sure about plotting.
Reprex:
options(conflicts.policy = list(warn = FALSE))
library(igraph)
g <- make_ring(10)
g$layout <- layout_in_circle(g)
sg <- subgraph(g, c(1, 2))
gs <- make_star(2)
gs$layout <- layout_in_circle(gs)
waldo::compare(sg$layout[, 1], gs$layout[, 1])
#> old | new
#> [1] 1 | 1 [1]
#> [2] 0.809016994374947 - -1 [2]
#> [3] 0.309016994374947 -
#> [4] -0.309016994374947 -
#> [5] -0.809016994374947 -
#> [6] -1 -
#> [7] -0.809016994374947 -
#> [8] -0.309016994374948 -
#> [9] 0.309016994374947 -
#> [10] 0.809016994374947 -
waldo::compare(sg$layout[, 2], gs$layout[, 2])
#> old | new
#> [1] 0 | 0 [1]
#> [2] 0.587785252292473 - 1.22464679914735e-16 [2]
#> [3] 0.951056516295154 -
#> [4] 0.951056516295154 -
#> [5] 0.587785252292473 -
#> [6] 1.22464679914735e-16 -
#> [7] -0.587785252292473 -
#> [8] -0.951056516295154 -
#> [9] -0.951056516295154 -
#> [10] -0.587785252292473 -
Created on 2024-03-03 with reprex v2.1.0
The problem is that layout
is stored as a graph attribute where it should probably be a vertex attribute instead. Matrices with m rows and n columns are treated as vectors of length m by the vctrs framework, it would be a good fit. We now only need to teach functions to preferentially look for a vertex attribute "layout"
, and perhaps to warn when "layout"
is assigned as a graph attribute.
Vertex attributes x
and y
are already treated this way if a graph-level layout
attribute is not present. See the source code of layout_nicely
-- if there is no graph attribute called layout
, cbind(V(graph)$x, V(graph)$y)
is used to create a layout matrix.
A vertex attribute layout
could be added to that logic then. We'd have to see if matrices can be used as vectors in Arrow though, for moving attributes to the core. Currently:
m <- matrix(1:6, ncol = 3)
data <- tibble::tibble(id = 1:2, m)
data
#> # A tibble: 2 × 2
#> id m[,1] [,2] [,3]
#> <int> <int> <int> <int>
#> 1 1 1 3 5
#> 2 2 2 4 6
arrow::as_arrow_table(data)
#> Error: Invalid: All columns must have the same length
arrow::as_arrow_array(data)
#> Error: Invalid: All arrays must have the same length
Created on 2024-03-04 with reprex v2.1.0
@paleolimbot: Does Arrow support matrix columns in principle? If yes, what could the support in R look like? What about Python and other systems?
Can't we simply keep on using $x
and $y
instead of adding support for $layout
on the vertex level? (Note that there's also an optional $z
vertex attribute that is used for 3D layouts).
Does Arrow support matrix columns in principle? If yes, what could the support in R look like? What about Python and other systems?
Yes, there are "fixed-size tensor" and "variable-size tensor" extension types: https://arrow.apache.org/docs/format/CanonicalExtensions.html#fixed-shape-tensor , although neither are supported in the R bindings or nanoarrow (or: there is no built-in conversion to R objects). I prototyped it here when the discussion was ongoing: https://gist.github.com/paleolimbot/c42f068c2b8b98255dbfbe379d905607 .
I am pretty sure that in Python/pyarrow it has built-in/zero-copy convert to numpy. It could in theory be wired up in nanoarrow...I'm in the middle of a refactor to make the conversion process understandable here: https://github.com/apache/arrow-nanoarrow/pull/392 , and after that's done it should be more straightforward how to contribute that.
Thanks, @paleolimbot. Looks like we'll stay with plain vector attributes for now.
Note that vertices 1 and 2 are plotted multiple times. This is because the layout vector of the original graph is copied to the sub graph.
Version information