uptake / pkgnet

R package for analyzing other R packages via graph representations of their dependencies
https://uptake.github.io/pkgnet/

FunctionReporter not reporting graphInCloseness and graphOutCloseness correctly #297

Closed. jameslamb closed this issue 5 months ago.

jameslamb commented 1 year ago

Noticed this while trying to diagnose failing unit tests in #296.

Rscript -e "remove.packages(c('pkgnet', 'baseballstats'))"
R CMD INSTALL .
R CMD INSTALL ./inst/baseballstats

Then, in R

library(pkgnet)

reporter <- pkgnet::FunctionReporter$new()$set_package("baseballstats")

reporter$pkg_graph$graph_measures(
    reporter$pkg_graph$available_graph_measures
)
INFO [2022-10-30 21:46:42] Calculating graphOutDegree
INFO [2022-10-30 21:46:42] Calculating graphInDegree
INFO [2022-10-30 21:46:42] Calculating graphInCloseness
INFO [2022-10-30 21:46:42] Calculating graphBetweenness
$graphOutDegree
[1] 0.3

$graphInDegree
[1] 0.3

$graphOutCloseness
[1] NaN

$graphInCloseness
[1] NaN

$graphBetweenness
[1] 0.03125

Output of sessionInfo():

```text
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 12.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] baseballstats_0.1

loaded via a namespace (and not attached):
 [1] igraph_1.3.5 rex_1.2.1 rstudioapi_0.13 knitr_1.40 magrittr_2.0.3
 [6] covr_3.6.1 pkgnet_0.4.2.9999 R6_2.5.1 rlang_1.0.6 fastmap_1.1.0
[11] visNetwork_2.1.2 tools_4.1.0 DT_0.26 data.table_1.14.4 xfun_0.31
[16] cli_3.3.0 lambda.r_1.2.4 futile.logger_1.4.3 htmltools_0.5.3 lazyeval_0.2.2
[21] assertthat_0.2.1 digest_0.6.29 formatR_1.12 htmlwidgets_1.5.4 futile.options_1.0.1
[26] evaluate_0.15 glue_1.6.2 rmarkdown_2.17 compiler_4.1.0 jsonlite_1.8.0
[31] pkgconfig_2.0.3
```

jameslamb commented 1 year ago

I think this item in https://cran.r-project.org/web/packages/igraph/news/news.html looks relevant.

igraph 1.3.0

closeness() now only considers reachable vertices during the calculation; in other words, closeness centrality is now calculated on a per-component basis for disconnected graphs. Earlier versions considered all vertices.

igraph 1.3.0 came out in April 2022.

szhorvat commented 1 year ago

You will find a detailed explanation of how closeness calculation works in the documentation of the igraph C library:

https://igraph.org/c/html/latest/igraph-Structural.html#igraph_closeness

Earlier versions of igraph made a dubious choice for disconnected graphs: the distance to unreachable vertices was considered to be numerically equal to the number of vertices. This was a very problematic ad-hoc choice, especially when a distance cutoff was given, so it was changed. Now only reachable vertices are considered. If no vertices are reachable, the closeness is taken to be NaN.

As you can see from the docs, the C function returns some additional information that allows one to easily implement many different generalizations of closeness for disconnected graphs. This information is not yet exposed in R, but if you need it, you are welcome to open a feature request.
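
To make the change in convention concrete, here is a minimal base-R sketch of the two definitions described above, on a tiny directed graph with one isolated vertex. It is not igraph's implementation: the helper names (`bfs_distances`, `closeness_old`, `closeness_new`) and the adjacency-list input are assumptions made for illustration, and the scores are the unnormalized variant (inverse of the sum of out-distances).

```r
# Breadth-first search distances from `source` in a directed graph stored as
# an adjacency list (one integer vector of out-neighbours per vertex).
# Vertices that cannot be reached keep the distance NA.
bfs_distances <- function(adj, source) {
    dist <- rep(NA_integer_, length(adj))
    dist[source] <- 0L
    queue <- source
    while (length(queue) > 0) {
        v <- queue[1]
        queue <- queue[-1]
        for (w in adj[[v]]) {
            if (is.na(dist[w])) {
                dist[w] <- dist[v] + 1L
                queue <- c(queue, w)
            }
        }
    }
    dist
}

# Older convention: an unreachable vertex is treated as if it were at a
# distance equal to the number of vertices.
closeness_old <- function(adj) {
    n <- length(adj)
    sapply(seq_len(n), function(v) {
        d <- bfs_distances(adj, v)
        d[is.na(d)] <- n
        1 / sum(d[-v])
    })
}

# Newer convention: only reachable vertices are considered, and the score
# is NaN when nothing is reachable.
closeness_new <- function(adj) {
    sapply(seq_along(adj), function(v) {
        d <- bfs_distances(adj, v)[-v]
        reached <- d[!is.na(d)]
        if (length(reached) == 0) NaN else 1 / sum(reached)
    })
}

# Hypothetical example graph: edge 1 -> 2, vertex 3 isolated.
adj <- list(2L, integer(0), integer(0))

old_scores <- closeness_old(adj)  # finite for every vertex
new_scores <- closeness_new(adj)  # NaN for vertices 2 and 3
```

Any graph-level summary that aggregates per-vertex scores without handling `NaN` will itself be `NaN`, since `sum(c(1, NaN))` is `NaN` in R. That would be consistent with the `graphOutCloseness` and `graphInCloseness` values reported at the top of this issue, although the sketch does not reproduce pkgnet's actual calculation.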

jameslamb commented 1 year ago

Thanks for the extra context, @szhorvat, much appreciated!