Allow Different Sized Graphs

jtmatterer commented 9 years ago

Vince and I talked about this a couple of weeks ago, and he mentioned that when adding dummy vertices in the smaller graph, we can change the nonedges in the adjacency matrix to -1 to penalize nonedge misalignment with dummy vertices. Unfortunately, this makes the adjacency matrix not sparse, so we need to do this reassignment after clustering.

thegreatginski commented 9 years ago

If you view the transformed adjacency as 2A-J (where J is the matrix of all ones), then the additional -1's are not problems, as you have 2A minus a rank one matrix, all computations can still be done quickly. Padding with 0's then still keeps the rank one structure of the transform.

jtmatterer commented 9 years ago

I see that there's no issue with the math or algorithm. However, since we want to be able to handle graphs with a number of vertices n, such that an nxn array is too large to have in memory, until we run SGM on the clustered subgraphs, the adjacency matrices of both graphs needs to be stored in a sparse format. Unless I'm mistaken, the only way to have a sparse matrix in matlab is with lots of zeros.

So the issue is that I don't know how to write 2A-J as a sparse matrix in matlab.

thegreatginski commented 9 years ago

the trick would be to never make J. write the matrix as 2A and have the rank 1 J only be used in computation, as J= ones(n,1)*ones(1,n) the resulting multiplications would be easy and this can be built into each step of the code?

matlab can do sparse matrix times vector very quickly right?

jtmatterer commented 9 years ago

Aside from running SGM on subgraphs (when we can have dense adjacency matrices, I think), we only need the adjacency matrices in the embedding step, which is a SVD, which would be problematic.

I had a (potentially crazy) idea about how to handle this problem and the rearranging cluster hack. What if we embed the original adjacency matrices with unequal sizes and don't adjust the clusters so we allow unequal numbers of vertices from each graph in a cluster. Before matching, we add dummy nodes in the way you described.

When we were interested only in a 1-1 matching, this idea doesn't make sense, but if we want top-k, what do we lose? Also, it has the advantage of possibly being able to say when a vertex doesn't have a match in the other graph, which was mentioned as a desirable feature.

dpmcsuss commented 9 years ago

I think the SVD step can be done without shifting things and then just cluster. For the clusters we then cluster on unequal sizes and they should be small enough that the J term doesn't matter.

On Wed, Jun 3, 2015 at 1:34 PM, Jason Matterer notifications@github.com wrote:

Aside from running SGM on subgraphs (when we can have dense adjacency matrices, I think), we only need the adjacency matrices in the embedding step, which is a SVD, which would be problematic.

I had a (potentially crazy) idea about how to handle this problem and the rearranging cluster hack. What if we embed the original adjacency matrices with unequal sizes and don't adjust the clusters so we allow unequal numbers of vertices from each graph in a cluster. Before matching, we add dummy nodes in the way you described.

When we were interested only in a 1-1 matching, this idea doesn't make sense, but if we want top-k, what do we lose? Also, it has the advantage of possibly being able to say when a vertex doesn't have a match in the other graph, which was mentioned as a desirable feature.

— Reply to this email directly or view it on GitHub https://github.com/jhu-graphstat/LSGMcode/issues/3#issuecomment-108538694 .

jtmatterer commented 9 years ago

Added unequalGraphs branch and support for graphs with different sizes, see 3c68cd156903b44817961e540661f5e1c4b09715

jhu-graphstat / LSGMcode

Allow Different Sized Graphs #3