thomasp85 / tidygraph

A tidy API for graph manipulation
https://tidygraph.data-imaginist.com

Regression in edgelist import with as_tbl_graph #192

Closed: trifle closed this issue 6 months ago

trifle commented 6 months ago

Hi, thanks for the wonderful tidygraph package.

I recently re-ran some code that was about a year old, and it failed because the current tidygraph release (1.3.1) changed the behavior of as_tbl_graph() so that it no longer reads name attributes from the to and from columns of edgelists.

What seemed odd to me is that tbl_graph() behaves completely differently: it reads the edgelist columns as node attributes.

```
# No names
> edges %>% as_tbl_graph()
# A tbl_graph: 290 nodes and 41905 edges
#
# A directed acyclic simple graph with 1 component
#
# Node Data: 290 × 0 (active)
#
# Edge Data: 41,905 × 3
   from    to  weight
  <int> <int>   <dbl>
1     1     2 0.00968

> edges %>% tbl_graph()
# A tbl_graph: 41905 nodes and 0 edges
#
# A rooted forest with 41905 trees
#
# Node Data: 41,905 × 3 (active)
     from     to  weight
    <dbl>  <dbl>   <dbl>
1  112358 853211 0.00968
2  112358 989898 0.00565
```

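A likely explanation for the "41905 nodes and 0 edges" result: the first argument of tbl_graph() is nodes (its signature is tbl_graph(nodes = NULL, edges = NULL, directed = TRUE, node_key = "name")), so an edge data frame piped in positionally becomes the node table. The sketch below is only an illustration on a two-row toy edgelist built from the IDs and weights printed above; the node table and its `id` column are hypothetical. It contrasts the positional call with a call that passes nodes and edges by name, where numeric from/to values are interpreted as row positions in nodes.

```r
library(tidygraph)

# Toy edgelist reusing the IDs shown in the output above. The ID columns are
# numeric, as they would be after reading a CSV without specifying types.
edges <- data.frame(
  from   = c(112358, 112358),
  to     = c(853211, 989898),
  weight = c(0.00968, 0.00565)
)

# Positional call: the data frame is taken as `nodes`, so every row becomes
# a node and the resulting graph has no edges.
g_positional <- tbl_graph(edges)

# Named call with a hypothetical node table. Integer from/to values refer to
# row positions in `nodes`, not to the IDs stored in it.
nodes <- data.frame(id = c("112358", "853211", "989898"), stringsAsFactors = FALSE)
edges_idx <- data.frame(
  from   = c(1L, 1L),  # row 1 of `nodes` ("112358")
  to     = c(2L, 3L),  # rows 2 and 3 ("853211", "989898")
  weight = edges$weight
)
g_named <- tbl_graph(nodes = nodes, edges = edges_idx)

print(g_positional)
print(g_named)
```

The positional-index form is shown because it makes the distinction explicit: a numeric 112358 in from is a row number, not a node name.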
If this is intentional, apologies for the superfluous report; the changelog read to me as though there had been some recent fixes for similar issues.

trifle commented 6 months ago

So this was just a quirk of my edgelist, which invalidated a join. Sorry for the noise!

The issue was that for vertex attributes to be attached from an edgelist, the to and from columns need to be of character type, not dbl. This is documented in https://tidygraph.data-imaginist.com/reference/tbl_graph.html:

> A data.frame containing information about the nodes in the graph. If edges$to and/or edges$from are characters then they will be matched to the column named according to node_key in nodes, if it exists. If not, they will be matched to the first column.
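To make the quoted matching rule concrete, here is a minimal sketch with character IDs. The node table, its `id` column and the two edges are illustrative assumptions, not data from this issue. Because from and to are character, each value is looked up in the node_key column of nodes and replaced by the matching row number.

```r
library(tidygraph)

# Hypothetical node table keyed by a character ID column
nodes <- data.frame(id = c("112358", "853211", "989898"), stringsAsFactors = FALSE)

# Character from/to values are matched against nodes$id (node_key = "id").
# Without node_key, the default key "name" does not exist here, so the first
# column would be used instead, per the documentation quoted above.
edges_chr <- data.frame(
  from   = c("112358", "112358"),
  to     = c("853211", "989898"),
  weight = c(0.00968, 0.00565),
  stringsAsFactors = FALSE
)

g <- tbl_graph(nodes = nodes, edges = edges_chr, node_key = "id")
print(g)
```

After matching, the edges point at node rows 1 -> 2 and 1 -> 3, and the id column remains available as a node attribute.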

Because I had saved my edgelist to CSV and my node names are numerical IDs, the to and from columns were automatically parsed as dbl when the file was read back in.
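One way to avoid the automatic numeric parsing is to declare the column types when reading the CSV. The sketch below assumes base R's read.csv() with an inline CSV string standing in for the saved file (the original reader is not stated; base R yields integer rather than dbl here, but both are numeric). Forcing from and to to character lets as_tbl_graph() treat the IDs as node names.

```r
library(tidygraph)

# Stand-in for an edgelist saved to CSV; the IDs look like numbers
csv_text <- "from,to,weight
112358,853211,0.00968
112358,989898,0.00565"

# Default parsing guesses numeric types for the ID columns
edges_default <- read.csv(text = csv_text)

# Declaring the ID columns as character keeps them usable as node names
edges <- read.csv(
  text = csv_text,
  colClasses = c("character", "character", "numeric")
)

g <- as_tbl_graph(edges)
print(g)
```

With readr, the equivalent would be to specify col_types for the from and to columns.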