Disturbed layout in subgraph

clpippel commented 1 year ago

Consider the code below.

## bug / undesired / unexpected behaviour.
g        <- make_ring(10)
g$layout <- layout_in_circle(g)
sg       <- subgraph(g, c(1,2))
## delete_graph_attr(sg, "layout")
plot(sg)

Note that vertices 1 and 2 are plotted multiple times. This is because the layout vector of the original graph is copied to the sub graph.

Version information

R/igraph version: [1] igraph_1.4.1

subgraph-layout-bug

ntamas commented 1 year ago

Probably a consequence of R's vector recycling behaviour?

The graph attribute named layout is special in the sense that you can assign a matrix there and igraph's layout_nicely() layout algorithm (which is the default) uses that matrix to decide where the vertices should be. We should adjust the plot.igraph method to make sure that only the first few rows of the matrix are used if the matrix has more rows than the number of vertices in the graph.

ntamas commented 1 year ago

@krlmlr Feel free to re-assign it to one of your teammates if needed.

szhorvat commented 1 year ago

clpippel commented 1 year ago

The problem is that the layout matrix of the subgraph is inherited from the original graph. To avoid this, the graph$layout attribute should not be copied to the subgraph. Or else the coordinates of the missing vertices should be removed (nicer solution).

szhorvat commented 1 year ago

graph$layout can contain different types of values. Some of them are safe to copy (layout algorithm names) and some aren't (coordinate matrix). IMO storing the coordinates as a graph attribute (and not as vertex attributes) was a mistake. This is what I tried to describe in the wiki page that summarizes potential improvements to visualization in igraph 2.

@krlmlr @vtraag These sorts of issues are why I think that visualization can't (or shouldn't) be handed off wholesale to another package in igraph 2. Part of it will always be igraph's business. The storage of coordinates or visualization methods, and the expected behaviour with various stored visualization information is just one example.

maelle commented 6 months ago

What are action points here then according to you @szhorvat? A big overhaul of how the coordinates are stored?

szhorvat commented 6 months ago

A big overhaul of how the coordinates are stored?

That's the only thing I think we can do ... will there ever be enough resources for this? I don't know.

See the old igraph 2 planning document: https://github.com/igraph/rigraph/wiki/igraph-2-planning Ambitions got deflated quite a bit since then, as it became here how much all of this would take.

maelle commented 6 months ago

Should this be closed as non planned then, as the idea is tracked in that document?

szhorvat commented 6 months ago

That document does not replace the issue tracker, and is in need of an update. It's up to Kirill to judge how big a project we can take on. There were several ideas floating around, including offloading visualization to other packages entirely (like ggraph). Personally I don't favour that, but I admit there are good reasons to go that route.

I wouldn't close this issue because it describes a very real problem that occurs frequently when working with igraph.

krlmlr commented 6 months ago

Layouting belongs in igraph, not so sure about plotting.

Reprex:

options(conflicts.policy = list(warn = FALSE))
library(igraph)

g <- make_ring(10)
g$layout <- layout_in_circle(g)
sg <- subgraph(g, c(1, 2))

gs <- make_star(2)
gs$layout <- layout_in_circle(gs)

waldo::compare(sg$layout[, 1], gs$layout[, 1])
#>      old                | new    
#>  [1] 1                  | 1   [1]
#>  [2] 0.809016994374947  - -1  [2]
#>  [3] 0.309016994374947  -        
#>  [4] -0.309016994374947 -        
#>  [5] -0.809016994374947 -        
#>  [6] -1                 -        
#>  [7] -0.809016994374947 -        
#>  [8] -0.309016994374948 -        
#>  [9] 0.309016994374947  -        
#> [10] 0.809016994374947  -
waldo::compare(sg$layout[, 2], gs$layout[, 2])
#>      old                  | new                     
#>  [1] 0                    | 0                    [1]
#>  [2] 0.587785252292473    - 1.22464679914735e-16 [2]
#>  [3] 0.951056516295154    -                         
#>  [4] 0.951056516295154    -                         
#>  [5] 0.587785252292473    -                         
#>  [6] 1.22464679914735e-16 -                         
#>  [7] -0.587785252292473   -                         
#>  [8] -0.951056516295154   -                         
#>  [9] -0.951056516295154   -                         
#> [10] -0.587785252292473   -

^{Created on 2024-03-03 with reprex v2.1.0}

The problem is that layout is stored as a graph attribute where it should probably be a vertex attribute instead. Matrices with m rows and n columns are treated as vectors of length m by the vctrs framework, it would be a good fit. We now only need to teach functions to preferentially look for a vertex attribute "layout", and perhaps to warn when "layout" is assigned as a graph attribute.

ntamas commented 6 months ago

Vertex attributes x and y are already treated this way if a graph-level layout attribute is not present. See the source code of layout_nicely -- if there is no graph attribute called layout, cbind(V(graph)$x, V(graph)$y) is used to create a layout matrix.

krlmlr commented 6 months ago

A vertex attribute layout could be added to that logic then. We'd have to see if matrices can be used as vectors in Arrow though, for moving attributes to the core. Currently:

m <- matrix(1:6, ncol = 3)

data <- tibble::tibble(id = 1:2, m)
data
#> # A tibble: 2 × 2
#>      id m[,1]  [,2]  [,3]
#>   <int> <int> <int> <int>
#> 1     1     1     3     5
#> 2     2     2     4     6

arrow::as_arrow_table(data)
#> Error: Invalid: All columns must have the same length
arrow::as_arrow_array(data)
#> Error: Invalid: All arrays must have the same length

^{Created on 2024-03-04 with reprex v2.1.0}

@paleolimbot: Does Arrow support matrix columns in principle? If yes, what could the support in R look like? What about Python and other systems?

ntamas commented 6 months ago

Can't we simply keep on using $x and $y instead of adding support for $layout on the vertex level? (Note that there's also an optional $z vertex attribute that is used for 3D layouts).

paleolimbot commented 6 months ago

Does Arrow support matrix columns in principle? If yes, what could the support in R look like? What about Python and other systems?

Yes, there are "fixed-size tensor" and "variable-size tensor" extension types: https://arrow.apache.org/docs/format/CanonicalExtensions.html#fixed-shape-tensor , although neither are supported in the R bindings or nanoarrow (or: there is no built-in conversion to R objects). I prototyped it here when the discussion was ongoing: https://gist.github.com/paleolimbot/c42f068c2b8b98255dbfbe379d905607 .

I am pretty sure that in Python/pyarrow it has built-in/zero-copy convert to numpy. It could in theory be wired up in nanoarrow...I'm in the middle of a refactor to make the conversion process understandable here: https://github.com/apache/arrow-nanoarrow/pull/392 , and after that's done it should be more straightforward how to contribute that.

krlmlr commented 6 months ago

Thanks, @paleolimbot. Looks like we'll stay with plain vector attributes for now.

igraph / rigraph

Disturbed layout in subgraph #775