Closed · szhorvat closed this issue 5 years ago
I think you are right about how the algorithm works: it creates stubs for all vertices and then randomly tries to match them. Whenever a self-loop or multi-edge would be created, we skip that edge. The process is then repeated with all the remaining stubs. If a point is reached where a graph with the desired sequence cannot be completed, the whole process starts anew.
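A minimal sketch of that process as I read it (this is not the actual LightGraphs code, and in particular the stuck-detection rule below is my own assumption):

```julia
using Random

# Sketch of the described process: pair stubs at random, skip any pair
# that would form a self-loop or multi-edge, retry the leftovers, and
# restart from scratch when stuck. "Stuck" is detected here as a full
# pass over the remaining stubs that places no edge (a simplification).
function stub_match_with_skips(degrees::Vector{Int})
    stubs = [v for (v, d) in enumerate(degrees) for _ in 1:d]
    @assert iseven(length(stubs)) "degree sum must be even"
    edges = Set{Tuple{Int,Int}}()
    while true                          # restart anew when stuck
        empty!(edges)
        pool = shuffle(stubs)
        stuck = false
        while !isempty(pool)
            leftovers = Int[]
            progressed = false
            for i in 1:2:length(pool) - 1
                e = minmax(pool[i], pool[i + 1])
                if e[1] == e[2] || e in edges
                    # would be a self-loop or multi-edge: skip this pair
                    push!(leftovers, pool[i], pool[i + 1])
                else
                    push!(edges, e)
                    progressed = true
                end
            end
            if !progressed              # no remaining pair can be placed
                stuck = true
                break
            end
            pool = shuffle(leftovers)
        end
        stuck || return collect(edges)
    end
end
```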
So we should either find an argument for why this process creates uniformly distributed graphs, or note in the documentation that the process is not uniform. Then we might implement some algorithms that are indeed uniform.
Some notes:
The function is called `random_configuration_model`. Since it does not implement the configuration model, this is very misleading. It would still be misleading if the documentation were updated to note that the sampling is not uniform.

You are right, the process is not uniform. It is possible, though, that the process described here yields the same distribution. Unfortunately, the author did not provide any sources, and I haven't found a description of this process anywhere else. Do you have an intuitive argument at hand for why the process here should not result in the same distribution?
> it appears to me that when a double edge or a self-loop would be created, the algorithm tries to recover and generate a different edge. However, this would make the sampling non-uniform.
Can you give a proof of bias in edge selection? This assertion is not intuitive to me.
Regardless, from the referenced paper:
> Instead, self-loops and multi-edges are allowed. But, these represent a tiny fraction of all edges, and we typically lose little by just discarding or collapsing them.
I think the confusion here comes from the name "configuration model" and exactly what people mean when they say "sample uniformly." This is not a configuration model (which produces a graph with an exact degree sequence); it is an Erdős–Rényi model with no collisions or self-loops. The degree distribution is not uniform, but the algorithm is sampling possible edges uniformly at random, as an Erdős–Rényi generator should.
The algorithm should indeed be significantly more expensive to run for denser graphs than for sparse ones due to the re-sampling, so if that is not the case, there might be a bug in the sampling process. Benchmarking would show whether this is so.
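For example, something along these lines would make the cost difference visible (the degree choices here are arbitrary):

```julia
using LightGraphs

# A correct rejection sampler should get dramatically slower as the
# graph gets denser; comparable timings would suggest it is not
# restarting as often as it should.
@time random_configuration_model(100, fill(3, 100))   # sparse
@time random_configuration_model(100, fill(50, 100))  # dense
```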
Here is a small experiment showing that the two models do not behave identically. Connectivity might not be the best test, though, as the two models might behave asymptotically similarly.
I will have to have a look at the algorithm again, but if it truly samples from Erdős–Rényi conditioned on the degree sequence, then that might be even better. We could rename it then, and implement a slow configuration model using rejection sampling until someone takes the time to implement something better.
```julia
using LightGraphs, Random, Base.Iterators, Statistics

function rand_reg_shouldbe(n, k)
    stubs = collect(flatten(repeated(1:n, k)))
    m = div(n * k, 2)
    edges = Vector{Tuple{Int, Int}}(undef, m)
    @inbounds while true
        shuffle!(stubs)
        no_loop = true
        for i in 1:m
            src = stubs[2*i - 1]
            dst = stubs[2*i]
            if src == dst
                no_loop = false
                break
            end
            edges[i] = minmax(src, dst)
        end
        no_loop || continue
        sort!(edges)
        allunique(edges) || continue
        return SimpleGraphFromIterator(Edge.(edges))
    end
end

function rand_reg_is(n, k)
    return random_configuration_model(n, collect(repeated(k, n)))
end

# Probability that a k-regular graph with n vertices is connected
# in the random configuration model.
p_shouldbe(n, k, nsamples) = mean(is_connected(rand_reg_shouldbe(n, k)) for _ in 1:nsamples)

# Probability that a k-regular graph with n vertices is connected
# using our version of the random configuration model.
p_is(n, k, nsamples) = mean(is_connected(rand_reg_is(n, k)) for _ in 1:nsamples)
```

```julia
julia> p_shouldbe(1000, 2, 100_000)
0.05971

julia> p_is(1000, 2, 100_000)
0.04722
```
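For scale: with 100,000 samples, the binomial standard error at p ≈ 0.05 is about 0.0007, so the gap of roughly 0.012 between the two estimates is far larger than sampling noise.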
I think we're getting a bit far afield here. I am reluctant to change this function given that we're worried about very small differences in probability (that might be eclipsed by the nonuniformity of the PRNG used) for large graphs.
One thing that I did note is that we should be asserting `isgraphical` on the degree sequence before we start; otherwise we can have a nonterminating condition:
```julia
a = [8, 8, 3, 3, 3, 3, 0, 0, 8]
g = random_configuration_model(9, a)
```
will not converge, because
```julia
julia> isgraphical(a)
false
```
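A fail-fast guard along these lines at the top of the generator would turn the hang into an error (a sketch only; `ks` is a placeholder for whatever the degree-sequence argument is called inside the function):

```julia
# Sketch: reject impossible degree sequences instead of looping forever.
isgraphical(ks) || throw(ArgumentError("degree sequence is not graphical"))
```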
I am not sure how to respond ... there seems to be too much confusion about concepts and misuse of terminology above to respond point by point. I would suggest familiarizing yourself with the topic before proceeding. E.g., M.E.J. Newman, *Networks*, 2nd ed., has a good description of the configuration model.
I'll just note that the current situation severely diminishes my trust in this package.
> I'll just note that the current situation severely diminishes my trust in this package.
Sorry to hear that. I hope you find a package that works better for your needs.
Note: New Julia user here. I am not at all comfortable with the language, just exploring Julia for use with graphs.
The configuration model appears to be implemented incorrectly. A giveaway is that a call like the one sketched below finishes fast. This should never finish in reasonable time with a correct implementation, due to a very high rejection rate.
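A dense example of the kind being described (this specific call is my own illustration, not the original snippet):

```julia
using LightGraphs

# Hypothetical dense case: the degree sequence of the complete graph on
# 1,000 vertices. True rejection sampling should essentially never
# accept here, yet, per the report above, such calls return quickly.
g = random_configuration_model(1000, fill(999, 1000))
```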
Looking at the implementation,
https://github.com/JuliaGraphs/LightGraphs.jl/blob/216f5ffa77860d4c39b8d05fe2197d0af75a4241/src/SimpleGraphs/generators/randgraphs.jl#L263
it appears to me that when a double edge or a self-loop would be created, the algorithm tries to recover and generate a different edge. However, this would make the sampling non-uniform.
To ensure uniform sampling, the algorithm should re-start from the beginning (throw away all edges that were created so far) whenever it would create a non-simple graph.
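For concreteness, a restart-from-scratch sampler for an arbitrary degree sequence could look something like this (my sketch, not proposed library code; it assumes the sequence is graphical, so that the loop terminates):

```julia
using Random

# Uniform sampling over simple graphs with a given degree sequence:
# pair stubs uniformly at random and reject the whole attempt on any
# self-loop or multi-edge.
function configuration_model_rejection(degrees::Vector{Int})
    stubs = [v for (v, d) in enumerate(degrees) for _ in 1:d]
    @assert iseven(length(stubs)) "degree sum must be even"
    m = length(stubs) ÷ 2
    edges = Set{Tuple{Int,Int}}()
    while true
        shuffle!(stubs)
        empty!(edges)
        ok = true
        for i in 1:m
            e = minmax(stubs[2i - 1], stubs[2i])
            if e[1] == e[2] || e in edges   # non-simple: throw everything away
                ok = false
                break
            end
            push!(edges, e)
        end
        ok && return collect(edges)
    end
end
```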