sbromberger / LightGraphs.jl

An optimized graphs package for the Julia programming language

explanation of outneighbors! #33

Closed sbromberger closed 9 years ago

sbromberger commented 9 years ago

@jpfairbanks -

function outneighbors!(outneighborhood, g, j)
    # fill the preallocated buffer with the out-neighbors of vertex j
    edgecollection = out_edges(g, j)
    degj = outdegree(g, j)
    for i = 1:degj
        outneighborhood[i] = dst(edgecollection[i])
    end
    # return a view of just the first degj entries
    return sub(outneighborhood, 1:degj)
end

is there ever a case where outdegree(g,j) will NOT equal length(edgecollection)?

jpfairbanks commented 9 years ago

Well, if you have weighted edges, some people use degree to mean the sum of the weights, in which case outdegree(v) != length(outneighbors(v)). Otherwise, if loops and multiedges are forbidden, outdegree(v) == length(outneighbors(v)).

sbromberger commented 9 years ago

I've never heard of degree being subject to weights (I used http://en.wikipedia.org/wiki/Glossary_of_graph_theory as a source, as well as my old graph theory textbooks, though I can't check the latter at the moment).

I'm not doubting you, but do you have a reference I can use? We will want to change things if this is indeed the case.

sbromberger commented 9 years ago

This turns out to be moot since I've found a really great improvement to adjacency_matrix that results in a two-order-of-magnitude speedup :) (That is, if we assume that adj_mx doesn't depend on weights.)

jpfairbanks commented 9 years ago

I am thinking of the definition of the Laplacian as D - A. If A is the binary adjacency matrix, then D is the degree matrix; if A_ij is the weight on edge ij, then D holds the sums of the weights. Since we don't make weighted graphs a first-class object, it is not necessary to support them like that. I think you can ignore that definition and have the external store for the weights support the sum-of-weights definition itself.
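
To make the two conventions concrete, here's a small sketch (0.3-era Julia syntax; the matrix and names are illustrative):

# Sketch: the two "degree" conventions behind the Laplacian L = D - A,
# using a small weighted adjacency matrix.
A = [0.0 2.5 0.0;
     2.5 0.0 1.0;
     0.0 1.0 0.0]

deg_count  = vec(sum(A .> 0, 2))   # number of neighbors:     [1, 2, 1]
deg_weight = vec(sum(A, 2))        # sum of incident weights: [2.5, 3.5, 1.0]

D = diagm(deg_weight)              # degree matrix under the sum-of-weights convention
L = D - A                          # weighted graph Laplacian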

sbromberger commented 9 years ago

OK, cool. I'm going to be pushing a noincidence branch shortly that has a ton of changes, but please take a look specifically at the linalg.jl changes. Here's the speedup on my system on a (100k, 1m) graph:

julia> g = Graph(100000,1000000)
{100000, 1000000} undirected graph

julia> @time z = LightGraphs.adjmx(g);    # precompiled
elapsed time: 0.037574206 seconds (31 MB allocated)

julia> @time z = LightGraphs.adjmx(g);
elapsed time: 0.02864311 seconds (31 MB allocated)

vs

julia> @time z = adjacency_matrix(g);   # current code
elapsed time: 0.195658578 seconds (104 MB allocated, 36.42% gc time in 1 pauses with 0 full sweep)

julia> @time z = adjacency_matrix(g);
elapsed time: 0.162243224 seconds (104 MB allocated, 43.49% gc time in 4 pauses with 1 full sweep)

(so one order of magnitude, not two).

sbromberger commented 9 years ago

I found a reference to weights being used in degree calculations! http://networkx.github.io/documentation/latest/reference/generated/networkx.DiGraph.out_degree.html?highlight=out_degree#networkx.DiGraph.out_degree

I could probably add optional edge weights to the degree functions like we do with Dijkstra and others.

sbromberger commented 9 years ago

So - I've got edge weights in the degree functions, but it's about 150,000x slower than without them (using sparse matrices). Any ideas on how to improve

indegree(
    g::AbstractGraph,
    v::Int,
    edge_dists::AbstractArray{Float64,2}
) = sum(filter(x -> !isinf(x), edge_dists[:,v]))

?

jpfairbanks commented 9 years ago

Well, one way is to just store it. If you aren't changing the weights, then the degrees can only change when you insert or remove edges. In fact, you probably want to store the regular unweighted degree as well.

The way you have it now requires copying all n entries of column v before filtering them.

You should be able to iterate over just the distances you expect to be nonzero by using the edge lists.
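
In sketch form (assuming an in_edges iterator and src accessor analogous to the out_edges/dst used earlier in this issue):

# Sketch: weighted indegree that reads only entries backed by real edges,
# instead of filtering a full column of edge_dists.
function indegree_sketch(g, v::Int, edge_dists::AbstractArray{Float64,2})
    d = 0.0
    for e in in_edges(g, v)
        d += edge_dists[src(e), v]
    end
    return d
end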


sbromberger commented 9 years ago

Storing edge weights goes against the LightGraphs model, though. I did that in an earlier version but took it out.

Let me look at better ways to optimize. Maybe subarrays / views.

jpfairbanks commented 9 years ago

Why would you store the weights in a dense 2D matrix if the graph is being stored as a sparse structure? You used edge_dists::AbstractArray, meaning it is not necessarily dense, but then you access it as edge_dists[:,v]; are you intending edge_dists to have the same sparsity pattern as adjacency_matrix(g)? Someone might want to store the edge weights in a Dict{Edge,Float64}, in which case you would access it by iterating out_edges, as in the sketch below. We might want to hold off on optimizing until someone starts using weighted graphs.
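
In sketch form, for a vertex v, assuming weights holds an entry for every edge of g:

# Sketch: weighted outdegree from a hypothetical Dict-backed weight store.
wdeg = 0.0
for e in out_edges(g, v)
    wdeg += weights[e]
end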

sbromberger commented 9 years ago

Why would you store the weights in a dense 2D matrix if the graph is being stored as a sparse structure?

You wouldn't, except for small graphs where performance is important, since dense matrices are orders of magnitude faster than sparse ones.

are you intending edge_dists to have the same sparsity pattern as adjacency_matrix(g)?

Not necessarily. I'm not assuming any particular structure here since it may in fact be something back-ended by a database or other datastore.

We might want to hold off on optimizing until someone starts using weighted graphs.

We have our use case: PageRank requires stochastically-balanced graphs :)

jpfairbanks commented 9 years ago

Yes, but if we implement weighted graphs for PageRank, we are back to doing sparse matrix-vector multiplication with the graph, and we decided last week that dumping to CSC and using A*x is faster than using the graph directly. I would be happy to be proven wrong about this!

Plus, if we dump to SparseMatrixCSC we can use either direct solvers for small graphs or iterative solvers for large graphs, which is faster than the commonly used power-iteration method.

The LinearOperators.jl package has a framework for doing linear algebra with AbstractMatrix-like things.

Something like

function LinearOperator(g::Graph, weights)
    n = nv(g)
    return LinearOperator(n, n, Float64, is_directed(g), is_directed(g),
                          x -> spmv(g, weights, x), nothing, y -> spmvt(g, weights, y))
end

where spmv is sparse-matrix dense-vector multiplication using the edge weights, and spmvt is the same multiplication using the backward edges.
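
A rough sketch of such an spmv in terms of the edge iteration used earlier in this thread (all names illustrative; assumes weights maps each edge to its value):

# Sketch: computes y[v] = sum over edges u->v of weights[u->v] * x[u],
# i.e. the multiply step a power iteration needs.
function spmv(g, weights::Dict{Edge,Float64}, x::Vector{Float64})
    y = zeros(length(x))
    for u in vertices(g)
        for e in out_edges(g, u)
            y[dst(e)] += weights[e] * x[u]
        end
    end
    return y
end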

sbromberger commented 9 years ago

That's assuming we use the algebraic method to compute PageRank (which I'm certainly not dismissing), but the NetworkX implementation takes a different approach that might (or might not) be faster.

jpfairbanks commented 9 years ago

This Python code from networkx.algorithms.pagerank does a sparse matrix-vector multiply for power iteration:

# power iteration: make up to max_iter iterations
for _ in range(max_iter):
    xlast = x
    x = dict.fromkeys(xlast.keys(), 0)
    danglesum = alpha * sum(xlast[n] for n in dangling_nodes)
    for n in x:
        # this matrix multiply looks odd because it is
        # doing a left multiply x^T = xlast^T * W
        for nbr in W[n]:
            x[nbr] += alpha * xlast[n] * W[n][nbr][weight]
        x[n] += danglesum * dangling_weights[n] + (1.0 - alpha) * p[n]
    # check convergence, l1 norm
    err = sum([abs(x[n] - xlast[n]) for n in x])
    if err < N * tol:
        return x

networkx.algorithms.pagerank_scipy dumps the graph to a sparse matrix and then uses scipy.sparse for the power iteration:

# power iteration: make up to max_iter iterations
for _ in range(max_iter):
    xlast = x
    x = alpha * (x * M + sum(x[is_dangling]) * dangling_weights) + \
        (1 - alpha) * p
    # check convergence, l1 norm
    err = scipy.absolute(x - xlast).sum()
    if err < N * tol:
        return dict(zip(nodelist, map(float, x)))

Either way, networkx takes the same power-iteration approach.
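
For comparison, the core of that scipy loop in Julia would look roughly like this (a sketch only, ignoring dangling nodes and using a uniform personalization vector; not the code that ended up in #50):

# Sketch: power iteration on a sparse matrix, mirroring pagerank_scipy.
# Assumes M is already scaled so that M * x performs one iteration step.
function pagerank_sketch(M::SparseMatrixCSC, alpha=0.85, max_iter=100, tol=1e-6)
    n = size(M, 1)
    x = fill(1.0 / n, n)
    p = fill(1.0 / n, n)
    for _ in 1:max_iter
        xlast = x
        x = alpha * (M * x) + (1 - alpha) * p
        err = sum(abs(x - xlast))   # l1 norm (0.3-era elementwise abs)
        err < n * tol && return x
    end
    error("pagerank did not converge within $max_iter iterations")
end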

sbromberger commented 9 years ago

Hrm. OK. We're way outside my area of expertise here. I'll defer to you but will continue to explore good implementations of PageRank in the meantime.

sbromberger commented 9 years ago

Implemented pagerank based on nx.pagerank_scipy via #50.

jpfairbanks commented 9 years ago

Just took another look at this, and the implementation in #50 looks right to me, although you can use scale!(S, M)' instead of Q = spdiagm(S); M = Q*M'. Probably not necessary unless people complain about the performance of the pagerank.

sbromberger commented 9 years ago

If scale! is faster, let's do it! Mind submitting a PR?

jpfairbanks commented 9 years ago

Oops - I just pushed to origin/master instead of my fork. It passed the unit tests before I pushed, so hopefully that didn't break anything. Should I do something to fix it? It was commit b61d70b44e49a0b366f1371b9584b86e909ecdcd, btw.

sbromberger commented 9 years ago

Nah - as long as tests pass and it's faster, we should be OK :) Thanks.

sbromberger commented 9 years ago

You cut the time by more than half!! Thanks. This is great.