thomasp85 / tidygraph

A tidy API for graph manipulation
https://tidygraph.data-imaginist.com

Creating a tbl_graph from data frames with sticky columns #184

Closed luukvdmeer closed 10 months ago

luukvdmeer commented 10 months ago

A tbl_graph can be created by providing two data frames, one for the nodes and one for the edges. However, this fails when the edges data frame has a "sticky column", that is, a column that survives column-subsetting operations such as select.

Sticky columns exist, for example, in the sf package for spatial data, where the geometry column always remains after column subsetting. I know other packages use this logic as well, such as tsibble for time series data. Currently we cannot create tbl_graph objects directly from such data frames.

library(tidygraph)
library(sf)
library(tidyverse)

# Two point nodes at (0, 0) and (1, 1)
points = c(st_sfc(st_point(c(0,0))), st_sfc(st_point(c(1,1))))
nodes = st_as_sf(tibble(id = c(1,2), geometry = points))
# Two line edges; byrow = TRUE so that each matrix row is one (x, y) vertex
lines = c(st_sfc(st_linestring(matrix(c(0,0,1,1), 2, 2, byrow = TRUE))), st_sfc(st_linestring(matrix(c(1,1,0,0), 2, 2, byrow = TRUE))))
edges = st_as_sf(tibble(from = c(1,2), to = c(2,1), geometry = lines))

tbl_graph(nodes, edges)
Error in graph_from_edgelist(as.matrix(edges[, 1:2]), directed = directed) : 
  graph_from_edgelist expects a matrix with two columns

The problem is in as.matrix(edges[, 1:2]). Normally this yields a two-column matrix, but when a sticky column is present the result has three columns, because the sticky column survives the subsetting operation.
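To make the sticky behaviour concrete, here is a minimal sketch using a single illustrative edge (the object names `edge` and `sub` are made up for this example). It shows that column subsetting an sf object retains the geometry column, while the same subsetting after `st_drop_geometry()` behaves like a regular data frame.

```r
library(sf)
library(tibble)

# A single illustrative edge with a line geometry
edge <- st_as_sf(tibble(
  from = 1, to = 2,
  geometry = st_sfc(st_linestring(rbind(c(0, 0), c(1, 1))))
))

# Column subsetting an sf object keeps the geometry column
sub <- edge[, 1:2]
names(sub)
#> [1] "from"     "to"       "geometry"
ncol(sub)
#> [1] 3

# Without the sf class, the same subsetting returns only two columns
plain <- st_drop_geometry(edge)[, 1:2]
ncol(plain)
#> [1] 2
```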

https://github.com/thomasp85/tidygraph/blob/d88b167110a35b552212be53d18340d4b03f31a5/R/list.R#L89

An easy fix would be to subset the matrix again, after subsetting the data frame:

as.matrix(edges[, 1:2])[, 1:2]

This would cover all cases where a sticky column is present without affecting regular cases. But maybe there are more elegant solutions.
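One possible alternative, offered only as a sketch and not as what tidygraph does, is to avoid `[` subsetting altogether and extract the first two columns with `[[`. This assumes the data frame class does not also override `[[`. The helper name `edge_matrix` is hypothetical and not part of tidygraph.

```r
library(sf)
library(tibble)

# Same two edges as in the example above, with a sticky geometry column
lines <- st_sfc(
  st_linestring(rbind(c(0, 0), c(1, 1))),
  st_linestring(rbind(c(1, 1), c(0, 0)))
)
edges <- st_as_sf(tibble(from = c(1, 2), to = c(2, 1), geometry = lines))

# Hypothetical helper: build the edge list from the first two columns.
# `[[` extracts a single column as a vector, so no sticky column can tag along.
edge_matrix <- function(edges) {
  cbind(from = edges[[1]], to = edges[[2]])
}

el <- edge_matrix(edges)
dim(el)
#> [1] 2 2
```

The resulting `el` is a plain numeric two-column matrix, which is the shape graph_from_edgelist asks for in the error message above.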