thomasp85 / tidygraph

A tidy API for graph manipulation
https://tidygraph.data-imaginist.com

tbl_graph doesn't work as expected #89

Closed fruchtblase closed 4 years ago

fruchtblase commented 5 years ago

Hey,

I ran into an issue when creating graphs from dataframes

Creating a graph with tbl_graph results in an error: tbl_graph(nodes = nodes, edges = edgelist)

Error in (function (edges, n = max(edges), directed = TRUE) : At type_indexededgelist.c:117 : cannot create empty graph with negative number of vertices, Invalid value

but doing the same with igraph directly works flawlessly: igraph::graph_from_data_frame(edgelist, vertices = nodes) %>% as_tbl_graph()

R version 3.5.2 (2018-12-20) Platform: x86_64-w64-mingw32/x64 (64-bit) tidygraph_1.1.2

Somebody on Stack Overflow had a similar problem.

harryprince commented 5 years ago

Please provide a reproducible example using the reprex package.

fruchtblase commented 5 years ago

Here is the reprex! Btw, the error doesn't show up if you drop the name column in the node dataframe.

library(tidyverse)
library(tidygraph)
#> 
#> Attaching package: 'tidygraph'
#> The following object is masked from 'package:stats':
#> 
#>     filter

data <- jsonlite::fromJSON(url("https://api.semanticscholar.org/v1/author/1741101"))
nodes <- tibble(id = data$papers$paperId,
                name = data$papers$title)
n <- nrow(nodes)

edges <- tibble(from = nodes$id[sample.int(n, size = 100)],
                to = nodes$id[sample.int(n, size = 100)])

graph <- tbl_graph(nodes = nodes, edges = edges)
#> Error in (function (edges, n = max(edges), directed = TRUE) : At type_indexededgelist.c:117 : cannot create empty graph with negative number of vertices, Invalid value

graph <- igraph::graph_from_data_frame(edges, vertices = nodes) %>% as_tbl_graph()

Created on 2019-03-22 by the reprex package (v0.2.1)

harryprince commented 5 years ago

@fruchtblase this code works well on my Mac:

> graph
# A tbl_graph: 270 nodes and 100 edges
#
# A directed acyclic simple graph with 170 components
#
# Node Data: 270 x 2 (active)
  id                          name                                              
  <chr>                       <chr>                                             
1 501428daffd5d70d1305582dde… Self-supervised Relation Extraction from the Web  
2 869412b3f3b6b1bd40c014cf9d… A search engine for large - corpus language appli…
3 01d711358705c09656c4deb5cd… Adaptive Web Sites: Conceptual Cluster Mining     
4 3acbc5ee8fcd7c559458ac6318… KnowItNow: Fast, Scalable Information Extraction …
5 95a7442af05b03187dddba2430… Method and apparatus for accessing on-line stores 
6 a4b465f0d837cf9bbe64f0c1e7… Learning to Understand Information on the Interne…
# … with 264 more rows
#
# Edge Data: 100 x 2
   from    to
  <int> <int>
1     1     2
2     3     4
3     5     6
# … with 97 more rows

Here is my package information:

Package: tidygraph
Type: Package
Title: A Tidy API for Graph Manipulation
Version: 1.0.0
Date: 2017-07-06
Author: Thomas Lin Pedersen
Maintainer: Thomas Lin Pedersen <thomasp85@gmail.com>
Description: A graph, while not "tidy" in itself, can be thought of as
        two tidy data frames describing node and edge data
        respectively. 'tidygraph' provides an approach to manipulate
        these two virtual data frames using the API defined in the
        'dplyr' package, as well as provides tidy interfaces to a lot
        of common graph algorithms.
License: GPL (>= 2)
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
Imports: tibble, dplyr (>= 0.7), igraph, magrittr, utils, rlang, R6,
        Rcpp, tools, stats, tidyr
URL: https://github.com/thomasp85/tidygraph
BugReports: https://github.com/thomasp85/tidygraph/issues
LinkingTo: Rcpp
Suggests: network, data.tree, ape, graph, methods, testthat, covr
NeedsCompilation: yes
Packaged: 2017-07-06 11:40:22 UTC; thomas
Repository: CRAN
Date/Publication: 2017-07-07 05:01:15 UTC
Built: R 3.4.1; x86_64-apple-darwin15.6.0; 2017-07-25 17:51:07 UTC;
        unix

fruchtblase commented 5 years ago

Interesting: when I downgraded to tidygraph 1.0.0, the code worked on my machine as well! It looks like this bug is only present in versions >= 1.1.0.

harryprince commented 5 years ago

@thomasp85 might be aware of this issue.

aterhorst commented 5 years ago

I experienced the same issue. I renamed the 'name' column in my node list to something else and, hey presto, all is good. It seems tbl_graph does not like node lists that have a column labeled 'name'. BTW, my original node list has two columns: id (integer) and name (character). I am using tidygraph version .1.1.12 on a Mac.
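A minimal sketch of the renaming workaround described above. The toy node and edge tables and the replacement column name `title` are made up for illustration; the sketch assumes that, with no 'name' column present, a character edge list is matched against the first node column.

```r
library(tidygraph)

# Hypothetical toy data: edges refer to the 'id' column, 'name' is only a label.
nodes <- data.frame(
  id   = c("p1", "p2", "p3"),
  name = c("Paper one", "Paper two", "Paper three"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = c("p1", "p2"),
  to   = c("p2", "p3"),
  stringsAsFactors = FALSE
)

# Rename the 'name' column so it is no longer picked up for edge matching.
# 'title' is an arbitrary replacement chosen for this example.
names(nodes)[names(nodes) == "name"] <- "title"

# Assumption: without a 'name' column, the first column ('id') is used
# to resolve the character 'from'/'to' values.
graph <- tbl_graph(nodes = nodes, edges = edges)
graph
```

The label is still available afterwards as the `title` node attribute.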

aterhorst commented 5 years ago

BTW, I was the person who reported the issue on https://stackoverflow.com/questions/50457926/tidygraph-and-igraph-build-graph-from-dataframe-discrepancy.

This is a different issue.

thomasp85 commented 4 years ago

The issue is that igraph treats a node attribute named 'name' specially and will try to match edges to that column if the edges are given as characters. The change was introduced because of reports that tidygraph did not construct the same graphs as igraph, but it seems that it was never documented. I'll add documentation.
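To illustrate the matching behaviour described here, a small sketch with made-up data, assuming tidygraph >= 1.1.0. It only shows which column character edges are looked up in; it does not reproduce the exact error from the report.

```r
library(tidygraph)

# Hypothetical toy node table with both an 'id' and a 'name' column.
nodes <- data.frame(
  id   = c("p1", "p2", "p3"),
  name = c("Paper one", "Paper two", "Paper three"),
  stringsAsFactors = FALSE
)

# Edges written in terms of the 'name' column resolve as expected.
edges_by_name <- data.frame(
  from = c("Paper one", "Paper two"),
  to   = c("Paper two", "Paper three"),
  stringsAsFactors = FALSE
)
graph <- tbl_graph(nodes = nodes, edges = edges_by_name)

# Edges written in terms of 'id' find nothing when looked up in 'name',
# which is the situation in the original reprex.
edges_by_id <- data.frame(
  from = c("p1", "p2"),
  to   = c("p2", "p3"),
  stringsAsFactors = FALSE
)
lookup <- match(edges_by_id$from, nodes$name)
lookup
```

Every entry of `lookup` is `NA` because none of the ids appear in the `name` column, so the edge list cannot be resolved to node positions.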

thomasp85 commented 4 years ago

In addition to better documentation, you can now also control which column is used for matching a character edge list to nodes. It defaults to the old behaviour (i.e. using the name column).
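A hedged sketch of how that option might be used. The comment above does not name the argument; in the tidygraph releases I am aware of (1.2.0 onward) it is called `node_key`, so check `?tbl_graph` for your installed version. The data below is made up for illustration.

```r
library(tidygraph)

# Hypothetical toy data: edges refer to 'id', while 'name' holds a label.
nodes <- data.frame(
  id   = c("p1", "p2", "p3"),
  name = c("Paper one", "Paper two", "Paper three"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = c("p1", "p2"),
  to   = c("p2", "p3"),
  stringsAsFactors = FALSE
)

# Assumption: the matching column is selected with 'node_key'
# (default "name"). Pointing it at 'id' keeps both columns intact.
graph <- tbl_graph(nodes = nodes, edges = edges, node_key = "id")
graph
```

Compared with renaming the column, this keeps `name` available as a node attribute while the edges are resolved through `id`.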