sbromberger / SimpleWeightedGraphs.jl

Simple weighted graphs. Requires LightGraphs.jl.
Other
49 stars 22 forks source link

Why G.weights is the transpose of adjacent_matrix(G) #65

Open ChasingZenith opened 3 years ago

ChasingZenith commented 3 years ago

I just wondering why do we design this. when I add an edge

add_edge!(G, 1, 3, 0.9)

G.weights is

5×5 SparseArrays.SparseMatrixCSC{Float64,Int64} with 1 stored entry:
  [3, 1]  =  0.9

Sorry if it is not proper to ask question by raising issues.

sbromberger commented 3 years ago

Hi @ChasingZenith! Thanks for your post. It's perfectly acceptable to ask questions in the issues list.

The reason we use the transpose is because the backing store for SimpleWeightedGraphs is a CSC sparse matrix, and that means that it's column-major. Consider a SimpleWeightedDiGraph. For most graph operations one wants the outneighbors more than one needs the inneighbors, so providing the best possible performance for outneighbors is a priority.

Because the sparse matrix is column-major, this means that the fastest access is when you iterate over columns of the matrix. Therefore, having the outneighbors of a given vertex u be the uth column provides better performance than if they were in the uth row. This results in the backing store being the transpose of the adjacency matrix.

Does this answer your question?

ChasingZenith commented 3 years ago

Your answer is very illustrating.

But please give me some time to digest what you have to say.

What I am not familiar is CSC sparse matrices. I have read the sparse array page on Julia's documentation. It seems that we should at least obey some rule to make use of CSC format, otherwise the code will become slow. And in fact after reading that still I don't know how to use CSC sparse matrices.

I think that the functions and algorithms provided in this packages can make full use of how G.weights is stored.

On user side, should users always take care of the back storing scheme used in this package and optimize the code? What this package suppose users to do?

for example:

  1. Should we use G.weights or adjacency_matrix(G)? ( I have read the source code of adjacency_matrix() and I know that function introduce additional cost for copying variables. )
  2. One of the questions in my application is: Does column-major form make right multiply a sparse matrix A with a vector v slower than left multiply?
    A * v #slower ?
    v'*(B) # faster ? B is computed in advanced

Many question is asked in this comment. If there is any website that can give clues, please tell me.

sbromberger commented 3 years ago

Sparse Matrices are laid out in memory as three vectors: a vector of row indices, a vector of column pointers, and a vector of nonzero values. What this means is that the column pointers point to offsets in the row indices where new columns start. What this further means is that most adjacent values in the vector of row indices are values that are in the same column. From a cache optimization perspective, iterating over columns is much faster as it's two adjacent lookups from the column pointer vector, along with a series of adjacent lookups from the row index vector.

Should we use G.weights or adjacency_matrix(G)? ( I have read the source code of adjacency_matrix() and I know that function introduce additional cost for copying variables. )

You should never access a field of a graph directly. The portable way of accessing the weights is weights(G). Use that if you want the weights, and use adjacency_matrix if you need adjacency information: even if there is no functional difference for this graph type, there may be for others.

One of the questions in my application is: Does column-major form make right multiply a sparse matrix A with a vector v slower than left multiply?

I don't know. That's a question best asked of the SparseArrays folks.

ChasingZenith commented 3 years ago

Thank you.

ChasingZenith commented 3 years ago

You should never access a field of a graph directly. The portable way of accessing the weights is weights(G). Use that if you want the weights, and use adjacency_matrix if you need adjacency information: even if there is no functional difference for this graph type, there may be for others.

I closed this issues several days ago. Today, I notice that even if you say we should obey the philosophy of never accessing a field of graph directly, but the example in README only mention G.weights but not weights(G) which is what cause my confusion here. Is it better to modify the example in README to make it clear about the idea mentioned in this issue?

sbromberger commented 3 years ago

If you'd like to make a PR with your proposed changes, I'd be thrilled to review it.

ChasingZenith commented 3 years ago

I’ll have a try.