igraph / rigraph

igraph R package
https://r.igraph.org
543 stars 201 forks source link

Documentation unclear on normalization of betweenness when a cutoff is set #1489

Open DOSull opened 2 weeks ago

DOSull commented 2 weeks ago

What happens, and what did you expect instead?

The betweenness documentation states under the normalized argument that

If TRUE, then the results are normalized by the number of ordered or unordered vertex pairs in directed and undirected graphs, respectively. In an undirected graph,

B^n=\frac{2B}{(n-1)(n-2)}

where $B^n$ is the normalized, $B$ the raw betweenness, and $n$ is the number of vertices in the graph.

I assume this is a true statement of what happens, but I am unsure that it is correct behaviour.

First the number of paths in a directed graph is twice that in an undirected graph so that the factor of 2 should probably be dropped. Second, and the issue that got me thinking about this is that it is entirely unclear what the normalization factor should be if there is a cutoff in the betweenness calculation, since the number of paths considered in that case is not trivially predictable from the size of the graph.

I originally (wrongly) posted in this issue on the igraph c repo, where @szhorvat suggests.

In any case, this is not a bug, more a request for more explicit information in the documentation.

To reproduce

NA

System information

NA

szhorvat commented 1 week ago

Thanks for bringing this up! It's an important point.

I improved the documentation a little bit, and clarified that the same normalization factor is used, even with a cutoff. An alternative would have been to disallow normalization with a cutoff completely. This would be a breaking change, so it's more trouble to deal with. There's also the argument to be made that for some applications (e.g. https://doi.org/10.1103/PhysRevLett.105.038701) one would want to keep the cutoff version "comparable" (in the sense of being scaled the same way as) the non-cutoff version.

First the number of paths in a directed graph is twice that in an undirected graph so that the factor of 2 should probably be dropped.

The text does say that this formula is for undirected graphs, and does explain where it comes from (number of unordered pairs). I found it difficult to squeeze both directed and undirected into the same text, but improvements are very welcome (just open a PR).

I'll leave the decision on whether to make further changes here, or close the issue, to @krlmlr

DOSull commented 1 week ago

The documentation could perhaps add something like

For directed graphs the normalization factor is changed to $2B^n$.

szhorvat commented 1 week ago

The normalization factor is $\frac{(n-1)(n-2)}{2}$ (the number of unordered pairs) for undirected graphs and $(n-1)(n-2)$ (the number of ordered pairs) for directed ones.

$B^n$ refers to the normalized betweenness, not the normalization factor, in the current docs.

If you'd like to make improvements, they are really very welcome. We realize that the docs do need a lot of fixes, but we are short on hands. If you are interested in rephrasing / improving some documentation, that would be much appreciated. I'm happy to help you get started with opening PRs, or show you how to transform some text you give me to a ready-to-merge PR. 🙂

DOSull commented 1 week ago

My mistake on the suggested edit - don't know what I was thinking!

Unfortunately, I am a frequent user of many different FOSS packages but by no means an expert in many (if any) of them. As such, I think my best contribution is reporting any issues I run into not contributing directly - if I start contributing to one, then I'll feel obliged to contribute to many, and that just isn't a realistic proposition for me right now, unfortunately :-(