Closed sbromberger closed 9 years ago
Well, if you have weighted edges, some people use degree to mean the sum of the weights, in which case outdegree(v) != length(outneighbors(v)).
Otherwise, if loops and multiedges are forbidden, outdegree(v) == length(outneighbors(v)).
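A quick sketch of the two conventions (in Python, with an illustrative adjacency-dict representation rather than LightGraphs types):

```python
# Illustrative sketch only: each vertex maps to {neighbor: weight}.
adj = {1: {2: 0.5, 3: 2.0}, 2: {3: 1.0}, 3: {}}

def outdegree_count(adj, v):
    """Unweighted out-degree: the number of out-neighbors."""
    return len(adj[v])

def outdegree_weighted(adj, v):
    """Weighted out-degree: the sum of outgoing edge weights."""
    return sum(adj[v].values())

print(outdegree_count(adj, 1))     # 2
print(outdegree_weighted(adj, 1))  # 2.5
```

With weights the two quantities disagree as soon as any edge weight differs from 1.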
James Fairbanks PhD Student Georgia Tech
On Mon, Mar 30, 2015 at 4:43 PM, Seth Bromberger notifications@github.com wrote:
@jpfairbanks https://github.com/jpfairbanks -
function outneighbors!(outneighborhood, g, j)
    edgecollection = out_edges(g, j)
    degj = outdegree(g, j)
    for i = 1:degj
        outneighborhood[i] = dst(edgecollection[i])
    end
    return sub(outneighborhood, 1:degj)
end
is there ever a case where outdegree(g,j) will NOT equal length(edgecollection)?
I've never heard of degree being subject to weights (I used http://en.wikipedia.org/wiki/Glossary_of_graph_theory as a source as well as my old graph theory textbooks, but I can't check those at the moment).
I'm not doubting you, but do you have a reference I can use? We will want to change things if this is indeed the case.
This turns out to be moot since I've found a really great improvement to adjacency_matrix that results in a two-order-of-magnitude speedup :) (That is, if we assume that adj_mx doesn't depend on weights.)
I am thinking of the definition of the Laplacian as D - A. If A is the binary adjacency matrix, then D is the degree matrix. If A_ij is the weight on edge ij, then D is the sum of the weights. Since we don't make weighted graphs a first-class object, it is not necessary to support them like that. I think you can ignore that definition and have the external store for the weights support the sum-of-weights definition itself.
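The binary-vs-weighted distinction above can be illustrated with a small sketch (Python, plain nested lists; illustrative only):

```python
# Sketch of the L = D - A definition for a small undirected weighted graph.
A = [[0.0, 2.0, 0.0],
     [2.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]          # weighted adjacency matrix

n = len(A)
D = [sum(row) for row in A]    # weighted degrees = row sums
L = [[(D[i] if i == j else 0.0) - A[i][j] for j in range(n)]
     for i in range(n)]

# A defining property of the Laplacian: each row sums to zero.
assert all(abs(sum(row)) < 1e-12 for row in L)
```

If A were 0/1 instead, D would reduce to the ordinary (count) degree matrix, which is the binary case described above.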
OK, cool. I'm going to be pushing a noincidence branch shortly that has a ton of changes, but please take a look specifically at the linalg.jl changes. Here's the speedup on my system on a (100k, 1m) graph:
julia> g = Graph(100000,1000000)
{100000, 1000000} undirected graph
julia> @time z = LightGraphs.adjmx(g); # precompiled
elapsed time: 0.037574206 seconds (31 MB allocated)
julia> @time z = LightGraphs.adjmx(g);
elapsed time: 0.02864311 seconds (31 MB allocated)
vs
julia> @time z = adjacency_matrix(g); # current code
elapsed time: 0.195658578 seconds (104 MB allocated, 36.42% gc time in 1 pauses with 0 full sweep)
julia> @time z = adjacency_matrix(g);
elapsed time: 0.162243224 seconds (104 MB allocated, 43.49% gc time in 4 pauses with 1 full sweep)
(so one order of magnitude, not two).
I found a reference to weights being used in degree calculations! http://networkx.github.io/documentation/latest/reference/generated/networkx.DiGraph.out_degree.html?highlight=out_degree#networkx.DiGraph.out_degree
I could probably add optional edge weights to the degree functions like we do with Dijkstra and others.
So - I've got edge weights in degrees, but it's about 150,000x slower than without (using sparse matrices). Any ideas on how to improve

indegree(
    g::AbstractGraph,
    v::Int,
    edge_dists::AbstractArray{Float64,2}
) = sum(filter(x -> !isinf(x), edge_dists[:,v]))

?
Well, one way is to just store it. If you aren't changing the weights, then the degrees can only change when you insert and remove edges. In fact you probably want to store the regular unweighted degree too.
The way you have it now requires copying all n entries of the column for v before filtering it.
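The store-it idea could look something like this sketch (Python; the class and field names are hypothetical, not LightGraphs API):

```python
# Since degrees only change when edges are inserted or removed, a weighted
# graph wrapper can keep running degree sums and answer indegree in O(1).
class WeightedDigraph:
    def __init__(self):
        self.weights = {}          # (src, dst) -> weight
        self.windeg = {}           # cached weighted in-degrees

    def add_edge(self, u, v, w):
        # Subtract any old weight for (u, v) before adding the new one.
        self.windeg[v] = self.windeg.get(v, 0.0) - self.weights.get((u, v), 0.0) + w
        self.weights[(u, v)] = w

    def rem_edge(self, u, v):
        self.windeg[v] -= self.weights.pop((u, v))

    def indegree(self, v):
        """O(1) lookup instead of an O(n) column scan."""
        return self.windeg.get(v, 0.0)

g = WeightedDigraph()
g.add_edge(1, 3, 0.5)
g.add_edge(2, 3, 1.5)
print(g.indegree(3))   # 2.0
g.rem_edge(1, 3)
print(g.indegree(3))   # 1.5
```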
You should be able to iterate over the distances that you expect to be nonzero using the edge lists.
Storing edge weights goes against the LightGraphs model, though. I did that in an earlier version but took it out.
Let me look at better ways to optimize. Maybe subarrays / views.
Why would you store the weights in a dense 2D matrix if the graph is being stored as a sparse structure?
You used edge_dists::AbstractArray, meaning it is not necessarily dense, but then you access it as edge_dists[:,v]. Are you intending edge_dists to have the same sparsity pattern as adjacency_matrix(g)? Someone might want to store the edge weights in a Dict{Edge,Float64}. Then you will want to access it with a for e in out_edges(g,v); dict[e]; end pattern. We might want to hold off on optimizing until someone starts using weighted graphs.
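The dict-keyed-by-edge access pattern, sketched in Python with illustrative stand-ins for out_edges and the weight store:

```python
# Weights keyed by edge, summed by walking the out-edge list for v rather
# than scanning a matrix column. All names here are illustrative.
out_edges = {1: [(1, 2), (1, 3)], 2: [(2, 3)], 3: []}
weights = {(1, 2): 4.0, (1, 3): 0.5, (2, 3): 1.5}

def weighted_outdegree(out_edges, weights, v):
    """O(outdegree(v)) instead of O(n): touch only edges that exist."""
    return sum(weights[e] for e in out_edges[v])

print(weighted_outdegree(out_edges, weights, 1))  # 4.5
```

The same pattern works for in-degrees given an in-edge list, and it never materializes an Inf-padded column.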
Why would you store the weights in a dense 2D matrix if the graph is being stored as a sparse structure?
You wouldn't except for small graphs where performance is important, since dense matrices are orders of magnitude faster than sparse matrices.
are you intending edge_dists to have the same sparsity pattern as adjacency_matrix(g)?
Not necessarily. I'm not assuming any particular structure here since it may in fact be something back-ended by a database or other datastore.
We might want to hold off on optimizing until someone starts using weighted graphs.
We have our use case: PageRank requires stochastically-balanced graphs :)
Yes, but if we implement weighted graphs for PageRank, we are back to doing sparse matrix-vector multiplication with the graph, and we decided last week that dumping to CSC and using A*x is faster than using the graph directly. I would be happy to be proven wrong about this!
Plus, if we dump to SparseMatrixCSC we can use either direct solvers for small graphs or iterative solvers for large graphs, which is faster than the power-iteration method commonly used.
The LinearOperators.jl package has a framework for doing linear algebra with AbstractMatrix-like objects.
Something like

function LinearOperator(g::Graph, weights)
    n = nv(g)
    return LinearOperator(n, n, Float64, isdirected(g), isdirected(g),
                          x -> spmv(g, weights, x), Nothing(), y -> spmvt(g, weights, y))
end

where spmv is sparse-matrix dense-vector multiplication with edge weights, and spmvt is the same using the backward edges.
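A matrix-free spmv of the kind described here can be sketched like this (Python, with a hypothetical edge-list representation):

```python
# Multiply the weighted adjacency matrix by a dense vector using only edge
# iteration, never materializing the matrix. Edges are (src, dst, weight).
edges = [(0, 1, 2.0), (1, 2, 0.5), (0, 2, 1.0)]

def spmv(edges, x, n):
    """y = A x where A[u][v] = w for each edge (u, v, w)."""
    y = [0.0] * n
    for (u, v, w) in edges:
        y[u] += w * x[v]
    return y

print(spmv(edges, [1.0, 1.0, 1.0], 3))  # [3.0, 0.5, 0.0]
```

The backward (spmvt) version would accumulate y[v] += w * x[u] instead, which is the transpose product.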
That's assuming we use the algebraic method to compute PageRank (which I'm certainly not dismissing), but the NetworkX implementation takes a different approach that might (or might not) be faster.
This Python code from networkx.algorithms.pagerank does sparse matrix-vector multiply for power iteration.
# power iteration: make up to max_iter iterations
for _ in range(max_iter):
    xlast = x
    x = dict.fromkeys(xlast.keys(), 0)
    danglesum = alpha * sum(xlast[n] for n in dangling_nodes)
    for n in x:
        # this matrix multiply looks odd because it is
        # doing a left multiply x^T=xlast^T*W
        for nbr in W[n]:
            x[nbr] += alpha * xlast[n] * W[n][nbr][weight]
        x[n] += danglesum * dangling_weights[n] + (1.0 - alpha) * p[n]
    # check convergence, l1 norm
    err = sum([abs(x[n] - xlast[n]) for n in x])
    if err < N * tol:
        return x
networkx.algorithms.pagerank_scipy dumps the graph to a sparse matrix and then uses scipy.sparse to do power iteration.
# power iteration: make up to max_iter iterations
for _ in range(max_iter):
    xlast = x
    x = alpha * (x * M + sum(x[is_dangling]) * dangling_weights) + \
        (1 - alpha) * p
    # check convergence, l1 norm
    err = scipy.absolute(x - xlast).sum()
    if err < N * tol:
        return dict(zip(nodelist, map(float, x)))
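For comparison, a self-contained toy version of this power iteration (Python; no dangling-node or personalization handling, so it is a simplification of both snippets above, not the networkx or LightGraphs implementation):

```python
# Minimal PageRank power iteration over an out-edge adjacency dict.
# Assumes every vertex has at least one out-edge (no dangling nodes).
def pagerank(out_edges, n, alpha=0.85, tol=1e-10, max_iter=100):
    x = [1.0 / n] * n
    for _ in range(max_iter):
        # start each iteration from the teleportation term (1 - alpha)/n
        xlast, x = x, [(1.0 - alpha) / n] * n
        for u, nbrs in out_edges.items():
            share = alpha * xlast[u] / len(nbrs)
            for v in nbrs:
                x[v] += share
        # l1-norm convergence check, as in the snippets above
        if sum(abs(x[i] - xlast[i]) for i in range(n)) < n * tol:
            break
    return x

# On a 3-cycle every vertex gets equal rank 1/3.
pr = pagerank({0: [1], 1: [2], 2: [0]}, 3)
print(pr)
```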
The networkx package takes the same approach.
Hrm. OK. We're way outside my area of expertise here. I'll defer to you but will continue to explore good implementations of PageRank in the meantime.
Implemented pagerank based on nx.pagerank_scipy via #50.
Just took another look at this, and the implementation in #50 looks right to me, although you can use scale!(S, M') instead of Q = spdiagm(S); M = Q*M'. Probably not necessary unless people complain about the performance of the pagerank.
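The equivalence behind that suggestion - left-multiplying by a diagonal matrix is just row scaling - can be checked with a tiny sketch (Python, dense lists for clarity; not the Julia code in question):

```python
# diag(S) @ M scales row i of M by S[i], so building the diagonal matrix
# explicitly (the spdiagm step) does redundant work.
def diag_times(S, M):
    """Explicit diagonal-matrix product."""
    n = len(S)
    return [[sum((S[i] if i == k else 0.0) * M[k][j] for k in range(n))
             for j in range(len(M[0]))] for i in range(n)]

def scale_rows(S, M):
    """scale!-style shortcut: scale each row directly."""
    return [[S[i] * m for m in row] for i, row in enumerate(M)]

S = [2.0, 0.5]
M = [[1.0, 3.0],
     [4.0, 8.0]]
print(diag_times(S, M))  # [[2.0, 6.0], [2.0, 4.0]]
print(scale_rows(S, M))  # [[2.0, 6.0], [2.0, 4.0]]
```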
If scale is faster, let's do it! Mind submitting a pr?
Oops, I just pushed to origin/master instead of my fork. It passed the unit tests before I pushed, so hopefully that didn't break anything. Should I do something to fix it? It was commit b61d70b44e49a0b366f1371b9584b86e909ecdcd, btw.
Nah, as long as tests pass and it's faster we should be ok :) thanks.
You cut the time by more than half!! Thanks. This is great.