Closed pitsianis closed 3 years ago
`nv(g)` has to be less than or equal to `n` because we intend to grow the graph to `n` vertices (from `nv(g)` vertices). `k` has to be less than `nv(g)` since we're adding `k` distinct edges from each new node*.
*the question is whether a new node that has been previously added can be the destination of a new edge. The idea of using an existing graph and extending it via B-A is novel to me, but the other constructors are traditional.
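To make the two constraints concrete, here is a minimal Python sketch of growing an existing graph to `n` vertices by attaching each new vertex to `k` distinct earlier vertices. It samples targets uniformly rather than by degree, purely to show why the checks are needed; the function name and structure are illustrative, not the LightGraphs implementation.

```python
import random

def grow_graph(initial_nodes, n, k):
    """Illustrative sketch (not the LightGraphs code): grow a graph from
    `initial_nodes` existing vertices to `n` vertices, giving each new
    vertex k distinct neighbors among the vertices that already exist.
    Targets are drawn uniformly here for simplicity; the real model
    draws them proportionally to degree."""
    assert initial_nodes <= n   # we only ever add vertices, never remove
    assert k <= initial_nodes   # the first new vertex needs k distinct targets
    edges = []
    for new in range(initial_nodes, n):
        # range(new) includes previously added new vertices, so they can
        # become targets of later edges too
        targets = random.sample(range(new), k)
        edges.extend((new, t) for t in targets)
    return edges
```

If `k` exceeded the number of existing vertices, the very first new vertex could not find `k` distinct endpoints, which is exactly what the `k < nv(g)` check guards against.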
cc @devmotion - this is 4-year-old code, but do you recall any details?
I guess I have misunderstood the meaning of `n`. The resulting graph has `n` nodes and not `nv(g) + n`.

You do raise an interesting point. Is the resulting graph bipartite? In other words, do the nodes with ids `nv(g) + 1 : n` only connect to the vertices of the initial graph `g`?
Perhaps we can drop my original comment and rename the issue with only this more interesting question?
I was just about to comment that there is not necessarily a doubling of nodes :slightly_smiling_face:
I am pretty sure that the algorithm does not generate a bipartite graph, but just performs preferential attachment according to the Barabasi-Albert model until `n` nodes are present. I'll recheck the code.
`offset` (https://github.com/JuliaGraphs/LightGraphs.jl/blob/aa7fc2cc94fe269c866cf17b3a950bd3d760c55b/src/SimpleGraphs/Generators/randgraphs.jl#L281) and its update in https://github.com/JuliaGraphs/LightGraphs.jl/blob/aa7fc2cc94fe269c866cf17b3a950bd3d760c55b/src/SimpleGraphs/Generators/randgraphs.jl#L309-L310 ensure that, when performing preferential attachment in https://github.com/JuliaGraphs/LightGraphs.jl/blob/aa7fc2cc94fe269c866cf17b3a950bd3d760c55b/src/SimpleGraphs/Generators/randgraphs.jl#L298, we sample not only from the initial vertices but also from the newly added ones.
Looking at the code again, I'm wondering if the implementation could be made more efficient by not keeping replicates of each vertex in `weightedVs`, but rather keeping a vector of size `n` of unnormalized weights and sampling according to these.
This is how I would do it:

```matlab
function r = preferentialAttachment(d)
% Given n nodes with degrees d(1:n), select a node r at random
% with probability d(r) / sum(d).
s = cumsum([0; d]);   % cumulative degree boundaries, s(1) = 0
v = randi(s(end));    % uniform integer in 1..sum(d)
r = sum(v > s);       % index of the interval (s(r), s(r+1)] containing v
end
```
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
go away bot
this still seems like an issue
Maybe I don't remember correctly but I thought it was explained why the check mentioned in the OP is part of the implementation?
Also the sampling is correct (as far as it can be, since it was not clearly defined in the original publication), using basically the same algorithm as suggested above. It would be reasonable to use the existing algorithms in StatsBase for weighted sampling without replacement here, but in my simple benchmarks a while ago they were always outperformed by the current implementation. I assume this is at least partly due to the inefficiency of `PriorityQueue` (see also https://github.com/JuliaCollections/DataStructures.jl/issues/690), and I guess for other configurations of `n` and `k` they might be more performant.
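For readers unfamiliar with weighted sampling without replacement, one standard algorithm in this family is Efraimidis-Spirakis: assign item `i` the key `u_i ** (1 / w_i)` with `u_i ~ Uniform(0, 1)` and keep the `k` largest keys. A rough Python sketch, shown only to illustrate the trade-off being benchmarked here, not the actual StatsBase or LightGraphs code:

```python
import heapq
import random

def weighted_sample_no_replacement(weights, k):
    """Efraimidis-Spirakis weighted sampling without replacement:
    item i gets key u**(1/w_i) with u ~ Uniform(0,1); the k items
    with the largest keys form the sample.  A heap keeps this at
    O(n log k) via heapq.nlargest."""
    keyed = ((random.random() ** (1.0 / w), i)
             for i, w in enumerate(weights) if w > 0)
    return [i for _, i in heapq.nlargest(k, keyed)]
```

Whether a heap-based scheme like this beats the replicated-vertex list presumably depends on `n`, `k`, and how expensive the priority-queue operations are, which matches the benchmark observations above.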
Since the original question has been answered and we're debating possible performance enhancements, would it be ok to close this out? If there's an implementation that is faster/better, we can always open a new PR/issue to discuss.
Why require `nv(g) <= n` and `k < nv(g)`?

After all, the function internally adds one node at a time, so why impose at least a doubling of the nodes of the initial graph? Is there any literature reference that this assertion is based upon?