igraph / rigraph

igraph R package
https://r.igraph.org
549 stars 200 forks source link

Finding just one of the largest ivs? #471

Closed privefl closed 3 years ago

privefl commented 3 years ago

I have an undirected graph with 15,000 vertices, but only 21,000 edges between them. I want to find just one of the largest IVS.

I tried using igraph::largest_ivs() but ran out of memory (> 128 GB). I guess there are too many equal solutions and it tries to return them all, but I'm interested in getting only one.

Is there a way to get just one solution efficiently?

Thanks

privefl commented 3 years ago

I'm now trying some much smaller graph: 121 vertices with 92 edges. igraph::independence.number() is taking a very long time; is that normal?

ntamas commented 3 years ago

Yes, I think that's normal The largest independent vertex set problem is NP-hard so it's unlikely that you will be able to find an exact solution for a graph with 15K vertices. Finding the independence number on an Erdos-Renyi random graph with 121 vertices and only 50 edges takes 12.8s on my machine, and it is going to scale exponentially with the number of vertices. For 70 edges, you need 108.6s on the same machine.

You are probably better off with a simple greedy approximation algorithm; e.g., iteratively take the vertex with the smallest degree, add it to the independent set, remove its neighbors from consideration, and proceed until there are no more vertices left. This achieves an approximation ratio of (maxdegree + 2) / 3 (see here).

privefl commented 3 years ago

Thanks for the answer. I know this is a very hard problem. But like other problems of this type, there usually exists some solution using efficient branch-and-bound or dynamic programming to get a solution more efficiently. Given that it is easy to find a good-enough greedy solution, I was expecting a branch-and-bound solution to be efficient.

ntamas commented 3 years ago

I looked into the implementation of independent vertex sets in the C core of igraph and it seems like there is definitely some room for improvement there. The current implementation simply takes the complementer of the graph and looks for the largest cliques in the complementer using a backtracking algorithm. This is clearly infeasible for large graphs due to the memory requirement of the complementer. There is also an implementation of maximal independent vertex sets, which does not take the complementer, but it is prohibitive due to the large amount of maximal independent vertex sets in large graphs, and the current implementation does not provide a way for the user to either use a size limit or a callback function that is invoked for every maximal independent vertex set that was found.

There's also a promising algorithm on arXiv that is probably worth looking into for potential inclusion into the C core, but as the C core is going through a significant refactoring now (transitioning to 64-bit integers for vertex and edge IDs everywhere), it will have to wait until we are done with the refactoring. Nevertheless, feel free to submit a feature request in the repo of the C core.

privefl commented 3 years ago

Thanks for looking at this.

I have moved to some greedy solution, as I need a working solution immediately.