Closed pitsianis closed 3 years ago
`nv(g)` has to be less than or equal to `n` because we intend to grow the graph to `n` vertices (from `nv(g)` vertices). `k` has to be less than `nv(g)` since we're adding `k` distinct edges from each new node*.
*the question is whether a new node that has been previously added can be the destination of a new edge. The idea of using an existing graph and extending it via B-A is novel to me, but the other constructors are traditional.
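To make the two constraints concrete, here is a minimal Python sketch of growing an existing graph to `n` vertices by attaching each new vertex to `k` distinct earlier vertices. It samples targets uniformly rather than by degree, purely to show why the checks are needed; the function name and structure are illustrative, not the LightGraphs implementation.

```python
import random

def grow_graph(initial_nodes, n, k):
    """Illustrative sketch (not the LightGraphs code): grow a graph from
    `initial_nodes` existing vertices to `n` vertices, giving each new
    vertex k distinct neighbors among the vertices that already exist.
    Targets are drawn uniformly here for simplicity; the real model
    draws them proportionally to degree."""
    assert initial_nodes <= n   # we only ever add vertices, never remove
    assert k <= initial_nodes   # the first new vertex needs k distinct targets
    edges = []
    for new in range(initial_nodes, n):
        # range(new) includes previously added new vertices, so they can
        # become targets of later edges too
        targets = random.sample(range(new), k)
        edges.extend((new, t) for t in targets)
    return edges
```

If `k` exceeded the number of existing vertices, the very first new vertex could not find `k` distinct endpoints, which is exactly what the `k < nv(g)` check guards against.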
cc @devmotion - this is 4-year-old code, but do you recall any details?
I guess I have misunderstood the meaning of `n`. The resulting graph has `n` nodes and not `nv(g) + n`.

You do raise an interesting point. Is the resulting graph bipartite? In other words, do the nodes with ids `nv(g) + 1 : n` only connect to the vertices of the initial graph `g`?
Perhaps we can drop my original comment and rename the issue with only this more interesting question?
I was just about to comment that there is not necessarily a doubling of nodes :slightly_smiling_face:
I am pretty sure that the algorithm does not generate a bipartite graph, but just performs preferential attachment according to the Barabasi-Albert model until `n` nodes are present. I'll recheck the code.
`offset` (https://github.com/JuliaGraphs/LightGraphs.jl/blob/aa7fc2cc94fe269c866cf17b3a950bd3d760c55b/src/SimpleGraphs/Generators/randgraphs.jl#L281) and its update in https://github.com/JuliaGraphs/LightGraphs.jl/blob/aa7fc2cc94fe269c866cf17b3a950bd3d760c55b/src/SimpleGraphs/Generators/randgraphs.jl#L309-L310 ensure that, when performing preferential attachment in https://github.com/JuliaGraphs/LightGraphs.jl/blob/aa7fc2cc94fe269c866cf17b3a950bd3d760c55b/src/SimpleGraphs/Generators/randgraphs.jl#L298, we sample not only from the initial vertices but also from the newly added ones.
Looking at the code again, I'm wondering if the implementation could be made more efficient by not keeping replicates of each vertex in `weightedVs`, but rather keeping a vector of size `n` of unnormalized weights and sampling according to these.
This is how I would do it:

```matlab
function r = preferentialAttachment(d)
% Given n nodes with degrees d(1:n), select a node r at random
% with probability d(r) / sum(d).
s = cumsum([0; d]);   % cumulative degree boundaries, s(1) = 0
v = randi(s(end));    % uniform integer in 1..sum(d)
r = sum(v > s);       % index of the interval (s(r), s(r+1)] containing v
end
```
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
go away bot
this still seems like an issue
Maybe I don't remember correctly but I thought it was explained why the check mentioned in the OP is part of the implementation?
Also the sampling is correct (as far as it can be, since it was not clearly defined in the original publication), using basically the same algorithm as suggested above. It would be reasonable to use the existing algorithms in StatsBase for weighted sampling without replacement here, but in my simple benchmarks a while ago they were always outperformed by the current implementation. I assume this is at least partly due to the inefficiency of `PriorityQueue` (see also https://github.com/JuliaCollections/DataStructures.jl/issues/690), and I guess for other configurations of `n` and `k` they might be more performant.
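For readers unfamiliar with weighted sampling without replacement, one standard algorithm in this family is Efraimidis-Spirakis: assign item `i` the key `u_i ** (1 / w_i)` with `u_i ~ Uniform(0, 1)` and keep the `k` largest keys. A rough Python sketch, shown only to illustrate the trade-off being benchmarked here, not the actual StatsBase or LightGraphs code:

```python
import heapq
import random

def weighted_sample_no_replacement(weights, k):
    """Efraimidis-Spirakis weighted sampling without replacement:
    item i gets key u**(1/w_i) with u ~ Uniform(0,1); the k items
    with the largest keys form the sample.  A heap keeps this at
    O(n log k) via heapq.nlargest."""
    keyed = ((random.random() ** (1.0 / w), i)
             for i, w in enumerate(weights) if w > 0)
    return [i for _, i in heapq.nlargest(k, keyed)]
```

Whether a heap-based scheme like this beats the replicated-vertex list presumably depends on `n`, `k`, and how expensive the priority-queue operations are, which matches the benchmark observations above.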
Since the original question has been answered and we're debating possible performance enhancements, would it be ok to close this out? If there's an implementation that is faster/better, we can always open a new PR/issue to discuss.
Why require `nv(g) <= n` and `k < nv(g)`?

After all, the function internally adds one node at a time, so why impose at least a doubling of the nodes of the initial graph? Is there any literature reference that this assertion is based upon?