igraph / rigraph

igraph R package
https://r.igraph.org
544 stars 200 forks source link

as_adj_list Inefficient Performance #276

Closed DarioS closed 1 year ago

DarioS commented 6 years ago

I have a simulated two-column edge matrix with 100000 rows. It's intended to be representative of the Human Reference Interactome in its size. The task I'm attempting to do is find all of the interactors of vertices with a degree above a threshold (e.g. at least 5 vertices).

Calculating the degree of each vertex is fast.

interactor <- sapply(1:100000, function(index)
                       paste(c(sample(LETTERS, 3), sample(1:9, 1)), collapse = ''))
otherInteractor <- sapply(1:100000, function(index)
                       paste(c(sample(LETTERS, 3), sample(1:9, 1)), collapse = ''))
edges <- matrix(c(interactor, otherInteractor), ncol = 2)
aGraph <- graph_from_edgelist(edges, directed = FALSE)
> aGraph
IGRAPH b6eed2e UN-- 106530 1e+05 -- 
+ attr: name (v/c)
> system.time(degrees <- degree(aGraph))
   user  system elapsed 
  0.019   0.000   0.018

However, as_adj_list takes a long time.

> system.time(interactors <- as_adj_list(aGraph))
    user   system  elapsed 
1291.481   16.608 1309.061 

I wrote a function that does the same task which takes 2.5 seconds.

edgesToHubNetworks <- function(edges, minCardinality = 5)
{
  allFeatures <- unique(as.vector(edges))
  featuresByFirstEdge <- split(edges[, 2], factor(edges[, 1], levels = allFeatures))
  featuresBySecondEdge <- split(edges[, 1], factor(edges[, 2], levels = allFeatures))
  featuresByHub <- mapply(c, featuresByFirstEdge, featuresBySecondEdge, SIMPLIFY = FALSE)
  featuresByHub <- lapply(featuresByHub, unique)
  featuresByHub <- featuresByHub[sapply(featuresByHub, length) >= minCardinality]
  featuresByHub
}

> system.time(subNetworks <- edgesToHubNetworks(edges))
   user  system elapsed 
  2.404   0.012   2.417 
> head(subNetworks)

Are there performance optimisations which could be made to as_adj_list ? Also, the function returns "a list of numeric vectors" whereas some other functions, such as groups returns "A named list of numeric or character vectors". Could they have a consistent return type?

ntamas commented 1 year ago

This issue is now solved, although I don't know which specific change we've made in the past solved this. I remember there were performance problems with creating igraph vertex sequence objects.

This is what I get on my machine now:

> system.time(interactors <- as_adj_list(aGraph))
   user  system elapsed
  1.316   0.008   1.324