thomasp85 / tidygraph

A tidy API for graph manipulation
https://tidygraph.data-imaginist.com
Other
546 stars 61 forks source link

node_distance_to/from returns incorrect #112

Closed mkirzon closed 11 months ago

mkirzon commented 4 years ago

I'm finding that the tidygraph node_distance_to is returning incorrect results, even though it should be just a wrapper around igraph::distances. Here's an example:

library(tidyverse)
library(tidygraph)
library(igraph)

df_nodes = tribble(
  ~name, ~MN,
  "a", 1, 
  "b", 2, 
  "c", 3, 
  "d", 1, 
  "e", 2, 
  "f", 3, 
  "g", 4,
  "h", 5, 
  "i", 6,
  "z", 7)

df_edges = tribble(
  ~from, ~to, 
  "b", "a", 
  "c", "a", 
  "z", "a", 
  "e", "d", 
  "f", "d", 
  "z", "e", 
  "h", "g",
  "i", "g")

# igraph type
gi = graph_from_data_frame(vertices = df_nodes, d = df_edges, directed = TRUE)

# tidygraph type
gt = as_tbl_graph(gi)

# Plot 
plot(gt)

We want to calculate the distance from all nodes to 'a' and 'd' (which will be ids 1 and 4, respectively)

Here's the result from igraph

distances(gi, to = c("a", "d"))
#>     a   d
#> a   0   3
#> b   1   4
#> c   1   4
#> d   3   0
#> e   2   1
#> f   4   1
#> g Inf Inf
#> h Inf Inf
#> i Inf Inf
#> z   1   2

Here's the result from tidygraph

gt %>%
  activate(nodes) %>% 
  mutate(distTo = node_distance_to(nodes = c(1,4), mode = "all", algorithm = "automatic")) %>% 
  as_tibble() %>% 
  print()
#> # A tibble: 10 x 3
#>    name     MN distTo
#>    <chr> <dbl>  <dbl>
#>  1 a         1      0
#>  2 b         2      4
#>  3 c         3      1
#>  4 d         1      0
#>  5 e         2      2
#>  6 f         3      1
#>  7 g         4    Inf
#>  8 h         5    Inf
#>  9 i         6    Inf
#> 10 z         7      2

A clear example of the issue is in looking at the distance from nodes 'b' and 'c'. These should really be identical but tidygraph gives distances 4 and 1, respectively.

mbojan commented 4 years ago

@mkirzon I believe supplying more than one node to nodes argument of node_distance_to() does not do what you think. These "target" nodes get recycled, so the first entry in the result is the distance to "a", the second to "d", the third to "a" and so on. You can compare to igraph results and see that this is the case. You probably need:

gt %>%
  activate(nodes) %>% 
  mutate(
    distToA = node_distance_to(nodes = 1, mode = "all", algorithm = "automatic"),
    distToD = node_distance_to(nodes = 4, mode = "all", algorithm = "automatic")
  ) %>% 
  as_tibble()
mkirzon commented 4 years ago

@mbojan Wow, thank you. I entirely missed the middle column for igraph. Is there a more efficient tidygraph way then to generate the distance matrix that's returned by igraph::distances? I'm building out a set of tools in which we would often expect 100's or 1000's of target nodes.

I was using node_distance_* to help filter the graph to the subset of all paths that contain at least one of the N target nodes. Open to other ideas :) Though sorry if this should be taken to stackoverflow

mbojan commented 4 years ago

What's wrong with using igraph::distances() directly? No wrapper, however tidy, will be more efficient. If you need a dataframe output you may e.g.:

gt %>%  # or `gi`, does not matter
  distances(to = c("a", "d")) %>%
  as.data.frame(stringsAsFactors=FALSE) %>%
  rownames_to_column(var = "from")
mbojan commented 4 years ago

Sorry, missed this:

I was using node_distance_* to help filter the graph to the subset of all paths that contain at least one of the N target nodes. Open to other ideas :) Though sorry if this should be taken to stackoverflow

Does that mean that you basically want a union of two minimal spanning trees rooted at "a" and "d" respectively?

Steviey commented 1 year ago

+1 Invalid vertex name(s)