sbromberger / LightGraphs.jl

An optimized graphs package for the Julia programming language
Other
671 stars 185 forks source link

Possible issue with adjacency matrices #151

Closed sbromberger closed 9 years ago

sbromberger commented 9 years ago

cc @jpfairbanks

I noticed something unusual today:

julia> using LightGraphs

julia> g = PathDiGraph(3)
{3, 2} directed graph

julia> edges(g)
Set([edge 1 - 2,edge 2 - 3])

julia> adjacency_matrix(g)
3x3 sparse matrix with 2 Int64 entries:
    [2, 1]  =  1
    [3, 2]  =  1

julia> adjacency_matrix(g,:out)
3x3 sparse matrix with 2 Int64 entries:
    [2, 1]  =  1
    [3, 2]  =  1

julia> adjacency_matrix(g,:in)
3x3 sparse matrix with 2 Int64 entries:
    [1, 2]  =  1
    [2, 3]  =  1

julia> adjacency_matrix(g)[1,2]
0

This looks backwards. The adjacency matrix A[i,j] should be 1 if there's an edge from i to j.

See https://github.com/JuliaGraphs/LightGraphs.jl/blob/master/src/spectral.jl#L9-L35 for the relevant code.

The problem is, when I change it to what I think is correct code, katz_centrality() breaks.

jpfairbanks commented 9 years ago

It looks like you are building the CSC "by hand" on lines 29 to 34 of adjacency_matrix. If SimpleGraph is being stored as a SparseMatrixCSC, why not just return that matrix to the user?

This function in https://github.com/JuliaLang/julia/blob/master/base/sparse/csparse.jl#L15 builds CSC matrices from coordinate triples. The functions are LGPL so if you don't want to LGPL LightGraphs, don't copy/look at it. Just use it.

sparse{Tv,Ti<:Integer}(I::AbstractVector{Ti},
                                J::AbstractVector{Ti},
                                V::AbstractVector{Tv},
                                nrow::Integer, ncol::Integer,
                                combine::Union{Function,Base.Func})

Since LightGraphs stores the set of edges, you can build the triples from that Set with one pass

I = []
J = []
for (u,v) in g.edges
    push!(I,u)
    push!(J,v)
end
A = sparse(I,J,1)

Because we are storing things in CSC we want A[:,i] to be the out_neighbors(g,i) and that also matches the Linalg definition so that A*e_i = sum(e_j for j in out_neighbors(g,i)). Based on the formula for Katz centrality in the paper by @weijianzhang (I-A)x = ones(n) he is using the definition that A[i,j] = 1 iff j -> i

If you are using CSR then everything is backwards and you define Katz Centrality as the x which solves x(I-A) = 1

sbromberger commented 9 years ago

I was rolling the adjacency matrix by hand. I'm not storing SimpleGraph as a SparseMatrixCSC. I am storing SimpleSparseGraph as one, though (this is the current test).

The issue comes in with SimpleGraph not providing the same results as SimpleSparseGraph, and that's what's concerning me.

ETA: ok, I get that things should be backwards with CSC. That means that "forward adj matrix" needs to be renamed to "backwards adj matrix" and then things should just work. :)

sbromberger commented 9 years ago

There are two issues here: how things are stored internally, and how they're presented to the user.

As far as storage, we should be using whatever's more performant. That is, if we're using CSC sparse matrices, we should represent forward adjacencies as fm[i,j] = 1 iff j->i, and backward adjacencies as bm[i,j] =1 iff i->j.

For presentation, however, this presents a problem. We don't want to expose the internal structures since this may result in cases where we have an adjacency matrix that is column-major for one implementation and row-major for the other. Since graph theorists (unscientific polling n = 3) typically expect adjacency matrices to be A[i,j] = 1 iff i->j, I think adjacency_matrix() should return something such that full(adjacency_matrix(g)) yields the expected matrix, and this should be completely independent of the underlying graph storage type.

I will rename references to :out and :in in adjacency_matrix() to :byrow or :bycol. These should be used by programming types, not graph theory types. :byrow will be default.

Any objections to this plan?

sbromberger commented 9 years ago

(I've implemented this in the sparsemx branch already. Closing out now - please feel free to reopen if you think it needs more discussion.)

jpfairbanks commented 9 years ago

This is a good recap of our discussion. I would just like to add that even full(adjacency_matrix(g)) is not independent of the underlying storage since access to columns of full(adjacency_matrix(g)) is faster than accessing the rows. So while correctness might be independent of storage, performance never is.

sbromberger commented 9 years ago

I've made some interesting progress with the new sparsematrix implementation. I've also implemented setindex!() and getindex() on graphs for a nice easy way to add and remove edges.

jpfairbanks commented 9 years ago

Nice, just make sure that the docs say that setindex! is inefficient compared to bulk insertions

sbromberger commented 9 years ago

We don't have a method yet for bulk insertions :)