thomasp85 / tidygraph

A tidy API for graph manipulation
https://tidygraph.data-imaginist.com

`as_tbl_graph`: Unexpected mismatch between names and internal ID #147

Closed psychNerdJae closed 11 months ago

psychNerdJae commented 3 years ago

When `as_tbl_graph` is given an edge list whose `from` and `to` columns are integers, it creates a `tbl_graph` with a `name` node column. The issue is that a node's internal ID (its position in the node table) can unexpectedly differ from its assigned/inferred name.

I discovered this behavior when trying to figure out why `igraph::random_walk` was giving me sequences that included transitions between nodes that don't actually share an edge.

I'm using tidygraph 1.2.0 and igraph 1.2.6, in case that helps triangulate the underlying issue.

`%>%` <- magrittr::`%>%`

# Zachary's karate club
karate_club <- tibble::tibble(
  from = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
           1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 
           4, 5, 5, 6, 6, 6, 7, 9, 9, 9, 10, 14, 15, 15, 16, 16, 19, 19, 
           20, 21, 21, 23, 23, 24, 24, 24, 24, 24, 25, 25, 25, 26, 27, 27, 
           28, 29, 29, 30, 30, 31, 31, 32, 32, 33),
  to = c(2, 3, 4, 5, 6, 
         7, 8, 9, 11, 12, 13, 14, 18, 20, 22, 32, 3, 4, 8, 14, 18, 20, 
         22, 31, 4, 8, 9, 10, 14, 28, 29, 33, 8, 13, 14, 7, 11, 7, 11, 
         17, 17, 31, 33, 34, 34, 34, 33, 34, 33, 34, 33, 34, 34, 33, 34, 
         33, 34, 26, 28, 30, 33, 34, 26, 28, 32, 32, 30, 34, 34, 32, 34, 
         33, 34, 33, 34, 33, 34, 34)
)

# Unexpected output, where the "name" of the node does not match its internal ID
tidygraph::as_tbl_graph(karate_club) %>%
  igraph::V()

# my output from ^
# + 34/34 vertices, named, from f3fcd80:
# [1] 1  2  3  4  5  6  7  9  10 14 15 16 19 20 21 23 24 25 26 27 28 29 30 31 32 33 8  11 12 13 18 22 17 34

# Namespaced verbs so the snippet runs without attaching dplyr/tibble
tidygraph::as_tbl_graph(karate_club) %>%
  dplyr::mutate(internal_id = dplyr::row_number()) %>%
  tibble::as_tibble() %>%
  dplyr::filter(name != internal_id)

# my output from ^
# A tibble: 26 × 2
#   name  internal_id
#   <chr>       <int>
# 1 9               8
# 2 10              9
# 3 14             10
# 4 15             11
# 5 16             12
# 6 19             13
# 7 20             14
# 8 21             15
# 9 23             16
#10 24             17
# … with 16 more rows

# Output is as expected, because names were not assigned
tidygraph::tbl_graph(edges = karate_club) %>%
  igraph::V()

# my output from ^
# + 34/34 vertices, from df507e8:
# [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
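
The vertex order printed above (values from `from` first, then values that only appear in `to`) is consistent with how `igraph::graph_from_data_frame` assigns vertices when no vertex table is supplied: in order of first appearance across the two columns. The sketch below uses a made-up three-edge list (not the karate club data) to show that order, and how referring to a vertex by number versus by name can select different vertices. Whether this is what happened in the `random_walk` call is an assumption, since that call is not shown.

```r
# Hypothetical toy edge list: node 3 first appears in `to`, after node 4 in `from`.
edges <- data.frame(from = c(1, 2, 4), to = c(3, 4, 1))

g <- igraph::graph_from_data_frame(edges, directed = FALSE)

# Vertex names in internal-ID order: unique `from` values, then new `to` values.
vertex_names <- igraph::V(g)$name
print(vertex_names)
# [1] "1" "2" "4" "3"

# The vertex *named* "3" sits at internal ID 4 ...
id_of_3 <- match("3", vertex_names)
# ... while internal ID 3 belongs to the vertex named "4".
name_at_3 <- vertex_names[3]

# A character argument is looked up by name; a number is taken as an internal ID.
by_name <- igraph::as_ids(igraph::neighbors(g, "3"))  # neighbors of the vertex named "3"
by_id   <- igraph::as_ids(igraph::neighbors(g, 3))    # neighbors of the vertex named "4"
```

Here `by_name` contains only `"1"`, whereas `by_id` contains the neighbors of the vertex named `"4"`. Passing names as character strings, or converting results with `igraph::as_ids()`, keeps everything in terms of names.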

mkoohafkan commented 2 years ago

Does this still happen if you construct karate_club "from" and "to" columns with integers (rather than numerics)?

psychNerdJae commented 2 years ago

Seemingly yes. I'm not well-versed in how/whether dataframe construction leads to implicit type changes, but this is my best go at it.

`%>%` <- magrittr::`%>%`

from <- c(1L, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
          1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 
          4, 5, 5, 6, 6, 6, 7, 9, 9, 9, 10, 14, 15, 15, 16, 16, 19, 19, 
          20, 21, 21, 23, 23, 24, 24, 24, 24, 24, 25, 25, 25, 26, 27, 27, 
          28, 29, 29, 30, 30, 31, 31, 32, 32, 33) %>%
  as.integer()

to <- c(2L, 3, 4, 5, 6, 
        7, 8, 9, 11, 12, 13, 14, 18, 20, 22, 32, 3, 4, 8, 14, 18, 20, 
        22, 31, 4, 8, 9, 10, 14, 28, 29, 33, 8, 13, 14, 7, 11, 7, 11, 
        17, 17, 31, 33, 34, 34, 34, 33, 34, 33, 34, 33, 34, 34, 33, 34, 
        33, 34, 26, 28, 30, 33, 34, 26, 28, 32, 32, 30, 34, 34, 32, 34, 
        33, 34, 33, 34, 33, 34, 34) %>%
  as.integer()

# `from` and `to` are already integer; the mutate() is a belt-and-braces re-cast
karate_club <- data.frame(from, to) %>%
  dplyr::mutate(dplyr::across(dplyr::everything(), as.integer))

tidygraph::as_tbl_graph(karate_club) %>%
  igraph::V()

# output from ^
# + 34/34 vertices, named, from 82fd2ab:
# [1] 1  2  3  4  5  6  7  9  10 14 15 16 19 20 21 23 24 25 26 27 28 29 30 31 32 33 8  11 12 13 18 22 17 34
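
Since the column type does not change the result, one possible workaround is to fix the vertex order explicitly. This is a minimal sketch, not a fix confirmed in this thread: it passes a node table through the `vertices` argument of `igraph::graph_from_data_frame` and then converts the result with `as_tbl_graph`. The toy edge list and the `nodes` table are made up for illustration, and names only equal row positions because the IDs here run from 1 to n without gaps.

```r
# Hypothetical toy edge list with integer IDs 1..4.
edges <- data.frame(from = c(1L, 2L, 4L), to = c(3L, 4L, 1L))

# Node table listing every ID once, in the order the internal IDs should follow.
nodes <- data.frame(name = sort(unique(c(edges$from, edges$to))))

# With `vertices` supplied, igraph takes the vertex order from that table.
g <- igraph::graph_from_data_frame(edges, directed = FALSE, vertices = nodes)
graph <- tidygraph::as_tbl_graph(g)

# Node data as a tibble; `name` is stored as character.
node_data <- tibble::as_tibble(graph)
names_match_ids <- all(node_data$name == as.character(seq_len(nrow(node_data))))
print(names_match_ids)
```

If the IDs had gaps (for example 1, 2, 5), the names would be sorted but would still not equal the row positions, so referring to nodes by name remains the safer habit.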