sbromberger / LightGraphs.jl

An optimized graphs package for the Julia programming language
Other
670 stars 184 forks source link

Ambiguity in Barabási–Albert possibly worth investigating #1534

Open mlhetland opened 3 years ago

mlhetland commented 3 years ago

This is not an issue with LightGraphs in itself, but possibly worth looking at, nonetheless. A recent preprint by Stamatelatos and Efraimidis (About Weighted Random Sampling in Preferential Attachment Models) discusses an ambiguity that seems pervasive in descriptions of the Barabási–Albert model. For example, they say:

In particular, concerning the common definitions of the BA model, the expression “select node i with probability pᵢ” may be interpreted in at least 3 different ways:

  1. Each individual node i gets a link with an independent probability pᵢ.
  2. Each node i has a selection probability pᵢ in a scheme where the m nodes are selected one by one.
  3. The inclusion probability of i appearing in the collection of m nodes is pᵢ.

These interpretations are just 3 examples that are not equivalent to each other but all abide by the “probability proportional to degree” requirement imposed by the BA model.

[…]

[T]he majority of the BA algorithms do not follow the BA model strictly as imprinted in the analytical formulation of the model; most of the theoretical algorithms in the literature as well as the implementations in major open source frameworks consider the draw-by-draw scheme.

They argue that one cannot say that there is a single correct sampling scheme. Rather, in their conclusion, they say:

In this paper we have identified a subtle ambiguity in the definition of the Barabasi-Albert model by taking into consideration the field of unequal probability random sampling. We showed that the definition of the model, in particular the part where m nodes are selected based on their degrees, is open to interpretation and can lead to graphs with unexpected properties. For this reason, we proposed an extension to the definition in the form of a new parameter that dictates the exact weighted random sampling scheme.

They refer to previous publications that describe “[s]everal unequal probability sampling designs”, but they limit themselves to three (conditional Poisson, draw-by-draw and strπps).

It may not be a high priority to implement multiple sampling schemes, but to begin with it might be useful to at least document the choice being made (possibly even referring to Stamatelatos and Efraimidis for a discussion about the ambiguity in the definition), describing explicitly how the random sampling is done.

sbromberger commented 3 years ago

This sounds like a good approach (clarifying in the documentation which choice was made).