igraph / rigraph

igraph R package
https://r.igraph.org
554 stars 200 forks source link

Add a convenience function to return the largest connected component of a graph #785

Closed ngmaclaren closed 1 year ago

ngmaclaren commented 1 year ago

What is the feature or improvement you would like to see? A function that takes a graph with possibly many components and returns an induced subgraph that is the largest connected component of the original graph.

Use cases for the feature I routinely need to extract the largest connected component of a graph (or largest weakly connected component of a directed graph) for further analysis. This use case comes up both in sampling graphs from an ensemble (e.g., when the sampling algorithm has generated isolated nodes that I want to ignore) and in working with empirical networks (e.g., when a data set contains information on microscopic clusters of nodes that are not critical for understanding the primary network). I have a function written that does this (modified from a StackOverflow post, link and function below), but an official version in igraph would be better if there's a general interest.

References I wrote the below function modified from this StackOverflow post. It seems to work well enough, but it probably isn't as complete as it could be. I'd be happy try to improve it if there are some things it needs to be up to standard.

get_gcc <- function(g) {
    if(is_directed(g)) {
        comps <- components(g, mode = "weak")
    } else comps <- components(g)

    gcc_id <- which.max(comps$csize)
    vids <- V(g)[comps$membership == gcc_id]
    g <- induced_subgraph(g, vids)
    return(g)
}
szhorvat commented 1 year ago

This request makes a lot of sense. Both the Python and Mathematica interfaces of igraph have such a function, as this is indeed a very common task.

Since this is relatively simple to implement, would you like to open a pull request? I would suggest including a mode argument as well, and writing up some documentation.

ngmaclaren commented 1 year ago

Sure, I will give it a shot. Thanks!

ngmaclaren commented 1 year ago

Sorry to bug you with trivial things, but I'm having some trouble going the steps outlined in CONTRIBUTING.md.

I have adjusted the function to include the mode argument and the roxygen2-style documentation. However, I don't seem to be able to run testthat::test_local() without error. I think the problem is with the testing, not the changes I made. Details are below, but the bottom line question is: do you need me to run these error messages down, or shall I open the pull request? Is there something I'm missing in the build/test process?

Details

Here's the function, which I put in my forked version of ./rigraph/R/components.R:

#' Find the largest connected component of a graph
#'
#' This function returns the largest connected component of a graph. In case of
#' a tie, the first component by vertex order is returned. Vertex IDs from
#' the original graph are not retained in the returned graph.
#' 
## #' @rdname components
## #' @family components
#' @param graph The original graph.
#' @param mode Passed to `components()`. Ignored if `graph` is undirected.
#' @returns The largest connected component of the graph.
#' @seealso [components()], [induced_subgraph()]
#' @export
#'
#' @examples
#' g <- sample_gnp(8, .2, directed = TRUE)
#' V(g)$color <- 1:N
#' plot(g)
#' plot(largest_component(g))
#' plot(largest_component(g, mode = "strong"))
largest_component <- function(graph, mode = c("weak", "strong")) {
  if (!is_igraph(graph)) {
    stop("Not a graph object")
  }

  if (is_directed(graph)) {
    comps <- components(graph, mode = mode)
  } else comps <- components(graph)

  lcc_id <- which.max(comps$csize)
  vids <- V(graph)[comps$membership == lcc_id]

  induced_subgraph(graph, vids)
}

I ran make igraph in my local ./rigraph directory:

tools/builddocs.sh
ℹ Updating igraph documentation
ℹ Loading igraph
Warning: Failed to load at least one DLL.
Caused by error in `getDLLRegisteredRoutines.DLLInfo()`:
! must specify DLL via a “DLLInfo” object. See getLoadedDLLs()
Writing NAMESPACE
Writing largest_component.Rd
Rscript -e 'devtools::build(path = ".")'
── R CMD build ─────────────────────────────────────────────────────────────────
✔  checking for file ‘/home/neil/Documents/testing/rigraph/DESCRIPTION’ ...
─  preparing ‘igraph’:
✔  checking DESCRIPTION meta-information ...
─  cleaning src
─  running ‘cleanup’
─  installing the package to build vignettes (670ms)
✔  creating vignettes (4m 32.5s)
─  cleaning src
─  running ‘cleanup’
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories (404ms)
─  building ‘igraph_1.4.99.9005.tar.gz’

[1] "./igraph_1.4.99.9005.tar.gz"

The documentation in ./rigraph/man/ appears to be correct, so I think the package built. However, in a clean R session I get this:

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> setwd('/home/neil/Documents/testing/rigraph/')
> testthat::test_local()
Starting 2 test processes
✔ | F W S  OK | Context
⠸ [ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ] Starting up...                          

Error in `private$handle_error()`:
! testthat subprocess failed to start, stderr:
Error in `(function (command = NULL, args = character(), error_on_status = TRUE, …`:
! System command 'R' failed
---
Exit status: 1
Stdout & stderr:
* installing *source* package ‘igraph’ ...
** using staged installation
rm: missing operand
Try 'rm --help' for more information.
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... configure: error: in `/home/neil/Documents/testing/rigraph':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
ERROR: configuration failed for package ‘igraph’
* removing ‘/tmp/Rtmp0K6ohS/devtools_install_182ed56bbaea7/igraph’
---
Backtrace:
 1. global callr_startup_hook()
 2. pkgload::load_all("/home/neil/Documents/testing/rigraph/tests/testthat", …
 3. pkgbuild::compile_dll(path, quiet = quiet)
 4. pkgbuild:::install_min(path, dest = install_dir, components = "libs", args = if (need…
 5. pkgbuild::rcmd_build_tools("INSTALL", c(path, paste("--library=", dest, …
 6. pkgbuild::with_build_tools({ …
 7. base::withCallingHandlers(callr::rcmd_safe(..., env = env, spinner = FALSE, …
 8. callr::rcmd_safe(..., env = env, spinner = FALSE, show = FALSE, …
 9. callr:::run_r(options)
10. base::with(options, with_envvar(env, do.call(processx::run, c(list(bin, …
11. base::with.default(options, with_envvar(env, do.call(processx::run, …
12. base::eval(substitute(expr), data, enclos = parent.frame())
13. base::eval(substitute(expr), data, enclos = parent.frame())
14. callr:::with_envvar(env, do.call(processx::run, c(list(bin, args = real_cmdargs, …
15. base::force(code)
16. base::do.call(processx::run, c(list(bin, args = real_cmdargs, stdout_line_callba…
17. (function (command = NULL, args = character(), error_on_status = TRUE, …
18. base::throw(new_process_error(res, call = sys.call(), echo = echo, …
19. | base::signalCondition(cond)
20. (function (e) …
21. asNamespace("callr")$err$throw(e)
Caused by error:
! R session crashed with exit code 1
Run `rlang::last_trace()` to see where the error occurred.
> 

I have gcc-12.2.1-2 installed.

krlmlr commented 1 year ago

Thanks. CONTRIBUTING.md is slightly out of date, that's on me.

The following steps should work:

Can you confirm?

ngmaclaren commented 1 year ago

That did the trick: I ran testthat::test_local() without errors. Here are the results:

── Skipped tests  ───────────────────────────────────────────────
• getRversion() < 4.3 is TRUE (1)
• nested igraph call handling not implemented yet (1)
• No graph package (2)

[ FAIL 0 | WARN 0 | SKIP 4 | PASS 4090 ]

Now with a fresh R session I can use my function, but I can't load the man page. I've never used roxygen2 before, so maybe I did something wrong there?

Here's the session:


> pkgload::load_all()
ℹ Loading igraph
> g <- sample_gnp(100, .025)
> h <- largest_component(g)
> g
IGRAPH 0f6b634 U--- 100 113 -- Erdos-Renyi (gnp) graph
+ attr: name (g/c), type (g/c), loops (g/l), p (g/n)
+ edges from 0f6b634:
 [1]  1--12  6--14 13--22 10--24 19--29  2--31 32--34  5--35 22--36  6--37
[11]  7--39 12--39 21--39  2--42 13--42 10--45 12--47 20--47 27--47 37--49
[21] 38--50 40--51 43--51 35--53  1--55 10--55 28--55 25--56  7--57 28--57
[31] 10--58 35--58 47--59 44--60 11--61 43--61 13--62  4--63 53--63 61--64
[41] 23--66 47--66 65--67 66--67 30--68 34--68  8--69 49--69 60--69 22--70
[51] 28--70 49--70 64--70  3--72  6--74 14--75 34--75 50--75 64--75 14--76
[61] 20--77 60--77 28--78 68--78 64--79 69--80 70--80 12--81  7--82 28--82
[71] 39--83 63--83 40--84 54--84 58--84 67--84 81--85 37--86 64--86  6--87
+ ... omitted several edges
> h
IGRAPH 573f58d U--- 82 109 -- Erdos-Renyi (gnp) graph
+ attr: name (g/c), type (g/c), loops (g/l), p (g/n)
+ edges from 573f58d:
 [1]  1--10  5--12 11--15  8--17  2--22 23--24  4--25 15--26  5--27  6--29
[11] 10--29 14--29  2--31 11--31  8--34 10--36 13--36 19--36 27--37 28--38
[21] 30--39 32--39 25--40  1--42  8--42 20--42 18--43  6--44 20--44  8--45
[31] 25--45 36--46 33--47  9--48 32--48 11--49  3--50 40--50 48--51 16--53
[41] 36--53 52--54 53--54 21--55 24--55  7--56 37--56 47--56 15--57 20--57
[51] 37--57 51--57  5--59 12--60 24--60 38--60 51--60 12--61 13--62 47--62
[61] 20--63 55--63 51--64 56--65 57--65 10--66  6--67 20--67 29--68 50--68
[71] 30--69 41--69 45--69 54--69 66--70 27--71 51--71  5--72 27--72 59--72
+ ... omitted several edges
> ?largest_component
No documentation for ‘largest_component’ in specified packages and libraries:
you could try ‘??largest_component’
krlmlr commented 1 year ago

You need devtools::document() .

Would you like to improve CONTRIBUTING.md while we're at it? Perhaps @maelle can help too.

ngmaclaren commented 1 year ago

There it is! Worked.

Sure, I can give it a try. Shall I send this pull request, then do a separate issue and pull request for CONTRIBUTING.md? I don't really know how pull requests work. I guess I send the draft changes, then you all review it on your end before accepting the changes?

krlmlr commented 1 year ago

Yes, please. Focused pull requests are best. We review and merge when good (or good enough). We're experimenting with optimistic merging, and are also happy to provide feedback if it helps you.

https://dmerej.info/blog/post/optimistic-merging/

krlmlr commented 1 year ago
  1. If not on a branch yet, please switch to a branch and bring in the changes
  2. Ensure the branch is based on the latest main
  3. Push the branch to your fork
  4. Open a PR
  5. Create a second branch from the main branch, switch to it
  6. Edit CONTRIBUTING.md
  7. Open a second PR

Does that make sense?

ngmaclaren commented 1 year ago

Ha! Nope! But I'll figure it out. I think I skipped the branch part earlier, because locally git says I'm working on main. Can I just name my branch dev or should I name it something unique to me?

ngmaclaren commented 1 year ago

Never mind: reading the blog post and clicking around on GitHub I think I'm tracking now. Let me try to submit this PR on my own branch like you say and then move on to the CONTRIBUTING.md issue. This will take me a few minutes at least.

krlmlr commented 1 year ago

Take your time, thanks for pushing through!

ngmaclaren commented 1 year ago

Definitely! I use igraph pretty much every day, so hopefully this helps, even if only a little. I definitely appreciate all the time you all put into igraph.

ngmaclaren commented 1 year ago

Ok, I think I've submitted the pull request. I made it a draft pull request to hopefully minimize any trouble if I've done something wrong. The branch should be called neil-lccfunc and the only changes should be to ./rigraph/R/components.R. To make the pull request, I wound up deleting and recreating my fork, then cloned the fresh copy. I made my changes without compiling, then committed the changes to a new branch. I pushed that branch to GitHub. GitHub prompted me to make the pull request, and that pull request was already set up to go to igraph/rigraph instead of ngmaclaren/rigraph. So that's where we're at. If this works, I'll start working on CONTRIBUTING.md.

maelle commented 1 year ago

If this works, I'll start working on CONTRIBUTING.md.

Thanks so much! Your feedback as a first-time contributor is extremely valuable.

Regarding pull requests, maybe https://usethis.r-lib.org/articles/pr-functions.html can be useful if you use usethis (and if you don't use usethis, maybe it can suit your workflow preferences).

ngmaclaren commented 1 year ago

Thanks! I will check it out.

github-actions[bot] commented 5 months ago

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.