Support Arbitrary graph data structures

jpfairbanks commented 9 years ago

Currently this only uses CSC matrices and has only been tested on that. But you could have the adjacency matrix stored in a LightGraphs.jl or Graphs.jl Graph data structure.

sbromberger commented 9 years ago

LightGraphs exports CSC matrices using adjacency_matrix() - will that help?

jpfairbanks commented 9 years ago

That means that we can write a bridge easily. If you have a LightGraph and want to make a CombinatorialAdjacency you can just use

using LightGraphs
using GraphMatrices
n = 10
g = Graph(n)
add_edge!(g, 1, 2)
A = CombinatorialAdjacency(adjacency_matrix(g))
L = CombinatorialLaplacian(A)

But this method copies all of the data in g into a CSC matrix. This is worth it if you are going to make a static graph and then do a lot of operations on the GraphMatrices of g. If you are going to repeatedly modify the graph and then update the GraphMatrices, then you are going to pay for a lot of copying.

If we back the GraphMatrix with some structure that provides access to the edges, degrees, a matvec function and the following array functions:

eltype(A) the type of the elements contained in A
length(A) the number of elements in A
ndims(A) the number of dimensions of A
size(A) a tuple containing the dimensions of A
size(A,n) the size of A in a particular dimension
stride(A,k) the stride (linear index distance between adjacent elements) along dimension k
strides(A) a tuple of the strides in each dimension

then we can have a data type for GraphMatrices which will update when the graph changes without copying. I think we just need to write some functions on LightGraph.Graph in order to make it work here. Then abstract out the storage of the adjacency matrix in the CombinatorialAdjacency type.

sbromberger commented 9 years ago

For lightgraphs, it's easy. eltype(A) is whatever you want (you can get adjacency matrices in any type; it defaults to int). length(A) is nv(g), ndims(A) is 2, size(A) is (nv(g), nv(g)), size(A,n) == nv(g) for all n. Not sure what strides are :)

jpfairbanks commented 9 years ago

For 2d arrays, they are the spacing in memory between the columns. They aren't necessary for sparse Matrices to implement. Operations that use the memory layout of a matrix directly won't work but that is OK.

Can you write a matvec routine for lightgraphs? It should take an arbitrary type AbstractVector{T} and use the right element type on each edge so that the output vector is the same type as the input vector. There should also be a form *!(g, y, x) where y is the preallocated output.

The naive way is for each vertex for each neighbor y[i] += x[i]. But you should pick the way that is fastest on the lightgraphs data structure. Having edge weights will be important in the future.

On Thu, Mar 19, 2015, 10:27 AM Seth Bromberger notifications@github.com wrote:

For lightgraphs, it's easy. eltype(A) is whatever you want (you can get adjacency matrices in any type; it defaults to int). length(A) is nv(g), ndims(A) is 2, size(A) is (nv(g), nv(g)), size(A,n) == nv(g) for all n. Not sure what strides are :)

— Reply to this email directly or view it on GitHub https://github.com/jpfairbanks/GraphMatrices.jl/issues/1#issuecomment-83606088 .

sbromberger commented 9 years ago

Help me understand the matvec concept a bit better, please. A vector of what, exactly?

jpfairbanks commented 9 years ago

x is a vector of any type representing one value per vertex. The operation y = A*x where A is the adjacency matrix and x is a vector whose elements are compatible with the edge weights is the action of the graph on vectors. You can think of a vector as a function from vertices to values in some type T.

For example if T is Int and g is a graph

A = CombinatorialAdjacency(g)
x=ones(Int, nv(g))
y=A*x
for i=1:nv(g)
    @test y[i] == degree(g, i)
end

Spectral Graph Theory is the field of thinking about graphs in this way. The graph is a linear operator on the vector space T^nv(g). Then things like var = x'_L_x is the variance of the function x over the graph. In math T is usually one of Integer, Real or Complex. The graph can have edge weights in Integer, Real, or Complex too.

Having access to the matvec function is sufficient to do most linear algebra by applying iterative methods.

sbromberger commented 9 years ago

The easiest (and probably fastest) way to do this, if I understand it correctly, is just to regenerate the adjacency matrices on demand. They're very fast, even with graphs on the order of (1e6,1e7):

julia> @time a = adjacency_matrix(g);
elapsed time: 2.46080318 seconds (515 MB allocated, 39.28% gc time in 11 pauses with 1 full sweep)

julia> size(a)
(1000000,1000000)

jpfairbanks commented 9 years ago

Why is 40% of the time spent in gc? The benefit of using the graph directly to do the matvec instead of converting to SparseCSCMatrix is only going to appear when making small changes to the graph and then using matrix again. That assumes that edge insertion to a graph is faster than into a CSC or going through the CSC -> COO, append new edges, COO -> CSC cycle.

It might be the case with fast sparse matrix routines that reconstructing the adjacency matrix as a CSC is the fastest way. If so, we should just make the user reconstruct the CombinatorialAdjacency object every time the graph changes, because the usage I foresee is change graph -> do 1-100 matvecs -> change graph again. So once you modify the graph, many matvecs will be performed before modifying the graph again.

for j in 1:n_v
        dsts = out_neighbors(g, j) #<-memory allocation
        colpt[j+1] = colpt[j] + length(dsts)
        append!(rowval,sort!(dsts))
    end

These all allocate a new array for their output. Could it be done with subarrays?

neighbors(g::AbstractGraph, v::Int) = [e.dst for e in g.finclist[v]]
in_neighbors(g::AbstractGraph, v::Int) = [e.src for e in g.binclist[v]]
out_neighbors(g::AbstractGraph, v::Int) = [e.dst for e in g.finclist[v]]

If you don't want to complicate the API then at least the memory could be reused in the construction of the CSC by using out_edges and manually reusing the memory.

sbromberger commented 9 years ago

I wonder if we can simply translate finclist/binclist into an adjacency matrix very quickly. I'll try this.

sbromberger commented 9 years ago

I'd like to revisit this if you're still maintaining the package. I think having this functionality available to LightGraphs would be very cool.

jpfairbanks commented 9 years ago

Yes I am maintaining this. I haven't been using it this summer but will be soon. What do you need? On Jul 28, 2015 6:01 PM, "Seth Bromberger" notifications@github.com wrote:

I'd like to revisit this if you're still maintaining the package. I think having this functionality available to LightGraphs would be very cool.

— Reply to this email directly or view it on GitHub https://github.com/jpfairbanks/GraphMatrices.jl/issues/1#issuecomment-125766183 .

sbromberger commented 9 years ago

I'd like to create / close out https://github.com/JuliaGraphs/LightGraphs.jl/issues/23 :)

jpfairbanks commented 9 years ago

I think we just need to make sure that g_v is defined for graph_vector in an efficient way and then change some type constraints in this package and it all will work. Of course adding tests.

jpfairbanks commented 9 years ago

I forgot to quote the '*' operator

sbromberger commented 9 years ago

What should g*v produce?

jpfairbanks commented 9 years ago

A vector y such that y[i]=sum(v[out_neighbors(g, i)]) for all i .

jpfairbanks commented 9 years ago

One way to represent it is

for (i,j) in edges(g)
    y[i] += v[j]
end

sbromberger commented 9 years ago

What's the correct behavior for undirected graphs? Should we count both directions, or just one? This function counts both directions (and therefore the sum(bar(g,ones(Int,nv(g)))) equals 2*ne(g)):

function bar{T<:Real}(g::SimpleGraph, v::Vector{T})
  y = zeros(T, nv(g))
  for i in vertices(g)
    y[i] = sum(v[out_neighbors(g, i)])
  end
  return y
end

but the sum of the vector returned by your function above yields ne(g).

Your function is obviously faster, so if it's correct, awesome.

sbromberger commented 9 years ago

(ping?)

jpfairbanks commented 9 years ago

yeah it should count both directions like you have. did you test them to see which is faster? I haven't tried either.

sbromberger commented 9 years ago

Yours is faster. We can modify it: I think

for (i,j) in edges(g)
    y[i] += v[j]
    y[j] += v[i]
end

for Graphs would work, and we can specialize on DiGraphs without the second accumulator.

Confirmed that this:

function foo{T<:Real}(g::Graph, v::Vector{T})
        y = zeros(T, nv(g))
       for (i,j) in edges(g)
           y[i] += v[j]
           y[j] += v[i]
       end
       return y
       end

works and produces the same results as bar(g,v) above, while

function foo{T<:Real}(g::DiGraph, v::Vector{T})
        y = zeros(T, nv(g))
       for (i,j) in edges(g)
           y[i] += v[j]
       end
       return y
       end

does the same for directed graphs.

sbromberger commented 9 years ago

So, what do I call this function, and where do we put it - in GraphMatrices, or in LightGraphs?

jpfairbanks commented 9 years ago

You call it * and there are two options

Put it in LightGraphs. Then when someone wants to use both packages they do

using LightGraphs
using GraphMatrices
g = graph(nv)
# initialize graph
A = CombinatorialAdjacency(g)
L = CombinatorialLaplacian(A)
evals = eigs(L)

You can use L like a matrix in any iterative method.

Put it in GraphMatrices. Then we have a dependency between GraphMatrices and LightGraphs, so that you can't use GraphMatrices without using LightGraphs, unless we do some conditional using logic or something.

I think it should go in LightGraphs. GraphMatrices should be completely independent of the storage format underlying the graph.

The code above should work (you will need to actually do the initialization). If it doesn't then it is probably because the other functions necessary (size, eltype,...) aren't defined for the graph type. I am back on campus next week so I will be back to working with graphs.

sbromberger commented 9 years ago

OK, I've created a matvec branch in LightGraphs with the necessary code. Unfortunately, I have no way to test that it actually produces correct results with GraphMatrices. Could you test (and write some tests)? Thanks.

sbromberger commented 9 years ago

Also, I error out if the length of the vector does not equal the number of vertices. Should we error if length < nv and merely warn if it's > ?

Also also: should the * be commutative? (That is, should I define *(v::Vector, g::SimpleGraph) as well?)

sbromberger commented 9 years ago

julia> g = PathGraph(10)
{10, 9} undirected graph

julia> A = CombinatorialAdjacency(g)
ERROR: MethodError: `convert` has no method matching convert(::Type{GraphMatrices.CombinatorialAdjacency{T}}, ::LightGraphs.Graph)
This may have arisen from a call to the constructor GraphMatrices.CombinatorialAdjacency{T}(...),
since type constructors fall back to convert methods.

julia> A = CombinatorialAdjacency(adjacency_matrix(g))
GraphMatrices.CombinatorialAdjacency{Int64}(10x10 sparse matrix with 18 Int64 entries:
    [2 ,  1]  =  1
    [1 ,  2]  =  1
    [3 ,  2]  =  1
    [2 ,  3]  =  1
    [4 ,  3]  =  1
    [3 ,  4]  =  1
    [5 ,  4]  =  1
    [4 ,  5]  =  1
    [6 ,  5]  =  1
    [5 ,  6]  =  1
    [7 ,  6]  =  1
    [6 ,  7]  =  1
    [8 ,  7]  =  1
    [7 ,  8]  =  1
    [9 ,  8]  =  1
    [8 ,  9]  =  1
    [10,  9]  =  1
    [9 , 10]  =  1,[1,2,2,2,2,2,2,2,2,1])

jpfairbanks commented 9 years ago

I am looking into it. I'll make a PR I think we have a decision to make.

jpfairbanks commented 9 years ago

Can we close this until somebody else writes a graphs library and wants to integrate?

sbromberger commented 9 years ago

Fine by me. It's working :)

jpfairbanks commented 8 years ago

This works on LightGraphs, open a new issue if you want integration with a different graph library.

jpfairbanks / GraphMatrices.jl

Support Arbitrary graph data structures #1