igraph / xdata-igraph

xdata-igraph has been merged into igraph/igraph
GNU General Public License v2.0

Speed up SGM #22

Closed gaborcsardi closed 9 years ago

gaborcsardi commented 10 years ago

On Tue, Jan 14, 2014 at 1:57 PM, Gábor Csárdi csardi.gabor@gmail.com wrote: Hi all,

On Tue, Jan 14, 2014 at 1:33 PM, Youngser Park youngser@jhu.edu wrote: Gabor,

The current sgm implementation in igraph is Vince's R code that calls clue::solve_LSAP. This seems to be a computational bottleneck which we want to improve, especially for large graphs.

As you can see from below, Vince seems to be hopeful that we can call the Matlab function in R. Is this something you can help us with?

I don't think you can call a Matlab function from R, at least not easily. Maybe with the R.matlab package it is possible, so you might want to look at that. But this is not something we can put in igraph; we cannot require a user to run Matlab.

As for the C++/Pascal/Fortran code of the original authors at http://www.magiclogic.com/assignment.html, we cannot use that, either, because it is only free for non-commercial use, which is not compatible with igraph's license. The python code is just calling this C++ code.

Also, is there room in the current sgm function that you can speed up without too much trouble?

I am not sure what you mean. You are saying that the bottleneck is clue::solve_LSAP, so that needs a speedup, if that is possible. Btw. how much of the total running time is spent in this function? I.e. how big is this bottleneck?

I'll read the original paper and see if it is easy to reimplement their method.

G.

Thanks!

  • Youngser

Begin forwarded message:

From: Carey E Priebe cep@jhu.edu Subject: Re: graph matching Date: January 14, 2014 at 1:12:13 PM EST To: Vince Lyzinski vincelyzinski@gmail.com Cc: Youngser Park youngser@jhu.edu

what is the igraph story for this, YP? ---cep

On Jan 13, 2014, at 8:49 PM, Vince Lyzinski vincelyzinski@gmail.com wrote:

Hi Youngser + Carey,

The LAPJV code is a fast way to solve the linear assignment problem: minimize tr(P^T C) over permutation matrices P. C is called the cost matrix. Jonker and Volgenant have the fastest code for solving this problem.
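
For concreteness, this is the problem clue::solve_LSAP already solves; a minimal sketch (assuming the clue package is installed):

    # Solve the LAP min_P tr(P^T C) for a random 5x5 cost matrix C.
    library(clue)
    set.seed(1)
    C <- matrix(runif(25), 5, 5)
    perm <- solve_LSAP(C)                 # optimal column index for each row
    sum(C[cbind(1:5, as.vector(perm))])   # the minimized cost tr(P^T C)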

I cannot seem to find the original C++ code written by Jonker, but found the Pascal code here (LAPJV)

http://www.assignmentproblems.com/LAPJV.htm

and the code in python here:

https://github.com/hrldcpr/pyLAPJV

The above two codes only work for integer cost matrices, and would need to have some variable types changed to handle real cost matrices. (seems ugly?)

The code is implemented in Matlab here and works for any cost matrix:

http://www.mathworks.com/matlabcentral/fileexchange/26836-lapjv-jonker-volgenant-algorithm-for-linear-assignment-problem-v3-0

The LAPJV code would be faster than the solve_LSAP that I implemented in my sgm code. Would it be possible to have this new linear assignment problem code implemented as an R function I could call in my code? I could then modify my code accordingly.

Cheers Vince

On Mon, Jan 13, 2014 at 3:45 PM, Youngser Park youngser@jhu.edu wrote: On Jan 13, 2014, at 3:40 PM, Carey E Priebe cep@jhu.edu wrote:

Youngser, Vince says graph matching will be much faster if we use a C routine he has for the LAP. can Gabor do that?

In fact, Gabor just put Vince's R code in the package without touching it at all!

Sure! Let's do that!

Vince, Just gimme the stuff with some explanation, then I'll open a new case!

  • Youngser

thanks, carey

jovo commented 10 years ago

here is a link to the best matlab function right now to my knowledge: http://www.mathworks.com/matlabcentral/fileexchange/26836-lapjv-jonker-volgenant-algorithm-for-linear-assignment-problem-v3-0

note that this is an exact solution to LAP. i know that there is a literature on approximate LAP, but i don't know the literature even a little bit.

gaborcsardi commented 10 years ago

This seems to be the same as the one above, which we cannot use.

jovo commented 10 years ago

oh. well, here is another: http://www.mathworks.com/matlabcentral/fileexchange/30838-faster-jonker-volgenant-assignment-algorithm it has a BSD license. not sure if you can use that, but the algorithm is clear enough....

gaborcsardi commented 10 years ago

BSD is fine, but how can this be BSD, if the majority of the code is taken from the other one?

jovo commented 10 years ago

tracing back, each code seems to be filed under BSD: http://www.mathworks.com/matlabcentral/fileexchange/26836-lapjv-jonker-volgenant-algorithm-for-linear-assignment-problem-v3-0

http://www.mathworks.com/matlabcentral/fileexchange/20652-hungarian-algorithm-for-linear-assignment-problems-v2-3

http://www.mathworks.com/matlabcentral/fileexchange/94-assignprob-zip

maybe just the original C code is not?

anyway, would you have to reimplement it in R/C, since these are all real matlab functions?

gaborcsardi commented 10 years ago

OK, I just measured with a 1000x1000 matrix, solve_LSAP is about 5% of the total running time, so making it 10 times faster will speed up the whole algorithm by at most 5%.

In fact 93% of the running time is spent multiplying matrices, see below.

I used Vince's code, but made everything ten times bigger. If this is not the standard use case, then please send me one. Otherwise it is really not worth reimplementing solve_LSAP. Thanks.

Gabor
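
(For reference, a profile like the one below is collected with base R's Rprof; a minimal sketch with a stand-in workload, not the actual sgm call:)

    Rprof("sgm.out")                  # start sampling the call stack
    X <- matrix(rnorm(1e6), 1000)     # stand-in for the real sgm() workload
    Y <- X %*% (X %*% X)
    Rprof(NULL)                       # stop sampling
    summaryRprof("sgm.out")$by.self   # self-time table, as below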

> summaryRprof()
$by.self
              self.time self.pct total.time total.pct
"%*%"            257.46    93.03     257.46     93.03
".C"              14.94     5.40      14.94      5.40
"t.default"        2.62     0.95       2.62      0.95
"c"                0.68     0.25       0.68      0.25
"sgm"              0.52     0.19     276.74    100.00
"-"                0.22     0.08       0.22      0.08
"*"                0.08     0.03       0.08      0.03
"+"                0.08     0.03       0.08      0.03
"<"                0.06     0.02       0.06      0.02
"diag"             0.02     0.01     224.66     81.18
"<Anonymous>"      0.02     0.01      15.28      5.52
"any"              0.02     0.01       0.02      0.01
"max"              0.02     0.01       0.02      0.01

$by.total
              total.time total.pct self.time self.pct
"sgm"             276.74    100.00      0.52     0.19
"%*%"             257.46     93.03    257.46    93.03
"diag"            224.66     81.18      0.02     0.01
"<Anonymous>"      15.28      5.52      0.02     0.01
"matrix"           15.28      5.52      0.00     0.00
".C"               14.94      5.40     14.94     5.40
"t.default"         2.62      0.95      2.62     0.95
"t"                 2.62      0.95      0.00     0.00
"c"                 0.68      0.25      0.68     0.25
"-"                 0.22      0.08      0.22     0.08
"*"                 0.08      0.03      0.08     0.03
"+"                 0.08      0.03      0.08     0.03
"<"                 0.06      0.02      0.06     0.02
"any"               0.02      0.01      0.02     0.01
"max"               0.02      0.01      0.02     0.01

$sample.interval
[1] 0.02

$sampling.time
[1] 276.74
jovo commented 10 years ago

that's so great! and i presume that making the matrix multiplication faster is non-trivial?

can you please explain to me why, under $by.total, matrix multiply takes 93%, but diag takes 81%? is diag part of matrix multiply?

anyway, assuming making matrix multiply faster is hard, i guess we can close this ticket for now, until we come up with a new algorithm that doesn't require such a large dense matrix multiply...


gaborcsardi commented 10 years ago

Because some of the %*% is within diag(), but not all of it.

You cannot realistically make matrix multiplication faster, except by using a faster BLAS. But even with a faster BLAS, it seems unrealistic that solve_LSAP would be the bottleneck.

What is possible, though, is to simplify the matrix multiplications, because it seems that often only the sum of the diagonal of the result is used, so why calculate the full product matrices at all?
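
A minimal sketch of that identity: tr(A^T B) = sum(A * B), so the O(n^3) product is never needed when only the trace of the result is used.

    A <- matrix(rnorm(9), 3, 3)
    B <- matrix(rnorm(9), 3, 3)
    all.equal(sum(diag(t(A) %*% B)), sum(A * B))   # TRUE, computed in O(n^2)
    all.equal(sum(diag(A %*% t(B))), sum(A * B))   # same trick for the tr(M P^T) terms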

dpmcsuss commented 10 years ago

Yes, lines 47-56 could be easily sped up: https://github.com/igraph/xdata-igraph/blob/develop/interfaces/R/igraph/R/sgm.R#L47-L56

    Grad <- 2 * A22 %*% P %*% t(B22) + 2 * A21 %*% t(B21)
    ind <- matrix(clue::solve_LSAP(Grad, maximum = TRUE))
    T <- diag(n)
    T <- T[ind, ]
    c <- sum(diag(t(A22) %*% P %*% B22 %*% t(P)))
    d <- sum(diag(t(A22) %*% T %*% B22 %*% t(P))) + sum(diag(t(A22) %*% P %*% B22 %*% t(T)))
    e <- sum(diag(t(A22) %*% T %*% B22 %*% t(T)))
    u <- 2 * sum(diag(t(P) %*% A21 %*% t(B21)))
    v <- 2 * sum(diag(t(T) %*% A21 %*% t(B21)))

First, as Gabor suggested, the diag means that we should be able to reduce complexity by not doing all the mults.

Second, I think permuting matrices by (non-sparse) matrix multiplication is very slow relative to doing something like A22[ind,] or A22[ind].

Also, there are a bunch of repeated calculations, t(A22) %*% P %*% B22 and t(A22) %*% T %*% B22 for example.

That at least seems like the obvious low hanging fruit.
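
A small sketch of the second point, using the same T = diag(n)[ind, ] construction as the code above: row-permuting by indexing gives the identical result without the O(n^3) multiply.

    n <- 4
    ind <- sample(n)
    T <- diag(n)[ind, ]              # permutation matrix built from ind
    B <- matrix(rnorm(n * n), n, n)
    all.equal(T %*% B, B[ind, ])     # TRUE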

gaborcsardi commented 10 years ago

I am not even sure why we are using (dense) matrices here, actually.

Btw. I am not sure what to do with the adjcorr function, either. https://github.com/igraph/xdata-igraph/blob/develop/interfaces/R/igraph/R/sgm.R#L2

We already have (a better) version for E-R graphs. adjcorr is more general, because it handles an arbitrary P matrix, i.e. the parameter for Bernoulli can be different for different edges. But is this really needed? Isn't it enough to have an E-R, i.e. constant parameter, or at most an SBM?
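
For reference, the extra generality is roughly the following (a hedged sketch, not adjcorr itself; the helper name sample_corr_pair is made up, and symmetry/diagonal handling is ignored):

    # Sample two 0/1 adjacency matrices with entrywise marginal P and
    # edgewise correlation rho; P may vary per edge, unlike constant-p E-R.
    sample_corr_pair <- function(P, rho) {
      n <- nrow(P)
      A <- matrix(rbinom(n * n, 1, P), n, n)
      # conditional probabilities that keep the marginal P and give corr rho
      PB <- ifelse(A == 1, P + rho * (1 - P), P * (1 - rho))
      B <- matrix(rbinom(n * n, 1, PB), n, n)
      list(A = A, B = B)
    }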

youngser commented 10 years ago

I don't think adjcorr is part of the code. It must be a leftover from Vince's other code.

I modified some lines (including Daniel's):

        Grad <- 2 * A22 %*% P %*% t(B22) + 2 * A21 %*% t(B21)
        ind <- matrix(clue::solve_LSAP(Grad, maximum = TRUE))
        T <- diag(n)
        T <- T[ind, ]
        tA22PB22 <- t(A22) %*% P %*% B22   # hoisted: reused in c and d below
        tA22TB22 <- t(A22) %*% T %*% B22   # hoisted: reused in d and e below
        c <- sum(diag(tA22PB22 %*% t(P)))
        d <- sum(diag(tA22TB22 %*% t(P))) + sum(diag(tA22PB22 %*% t(T)))
        e <- sum(diag(tA22TB22 %*% t(T)))
        u <- 2 * sum(diag(t(P) %*% A21 %*% t(B21)))
        v <- 2 * sum(diag(t(T) %*% A21 %*% t(B21)))

and got about a 15% speedup. As you can see, these changes are just the easy ones, but I couldn't go any further. NB: For a fair comparison, I didn't call igraph::sgm, but instead used a local version of sgm.

Here is what Rprof says:

>     summaryRprof(tmp)
$by.self
            self.time self.pct total.time total.pct
"%*%"          151.70    62.13     151.70     62.13
".C"            88.94    36.43      88.94     36.43
"t.default"      1.62     0.66       1.62      0.66
"c"              0.92     0.38       0.92      0.38
"*"              0.26     0.11       0.26      0.11
"-"              0.26     0.11       0.26      0.11
"sgm2"           0.20     0.08     244.06     99.96
"gc"             0.10     0.04       0.10      0.04
"diag"           0.06     0.02      63.52     26.02
"+"              0.06     0.02       0.06      0.02
"assign"         0.02     0.01       0.02      0.01
"max"            0.02     0.01       0.02      0.01

$by.total
                  total.time total.pct self.time self.pct
"system.time"         244.16    100.00      0.00     0.00
"sgm2"                244.06     99.96      0.20     0.08
"%*%"                 151.70     62.13    151.70    62.13
"matrix"               89.22     36.54      0.00     0.00
"<Anonymous>"          89.18     36.53      0.00     0.00
".C"                   88.94     36.43     88.94    36.43
"standardGeneric"      63.96     26.20      0.00     0.00
"diag"                 63.52     26.02      0.06     0.02
"t"                     1.64      0.67      0.00     0.00
"t.default"             1.62      0.66      1.62     0.66
"c"                     0.92      0.38      0.92     0.38
"*"                     0.26      0.11      0.26     0.11
"-"                     0.26      0.11      0.26     0.11
"gc"                    0.10      0.04      0.10     0.04
"+"                     0.06      0.02      0.06     0.02
"solve_LSAP"            0.04      0.02      0.00     0.00
"assign"                0.02      0.01      0.02     0.01
"max"                   0.02      0.01      0.02     0.01
"loadMethod"            0.02      0.01      0.00     0.00

$sample.interval
[1] 0.02

$sampling.time
[1] 244.16

Now, it's a bit different?

jovo commented 10 years ago

seems like still doing this twice:

sum(diag(tA22PB22 %*% t(T)))

?


youngser commented 10 years ago

I can't find that more than once with "copy and search"...

jovo commented 10 years ago

sorry, you are correct. it occurs to me, however, that all this stuff is just to get the step size. because the function is quadratic, it is enticing to find the exact solution. on the other hand, backtracking line search, for example, might work perfectly fine, and would be much faster i'd expect.

using (i) re-indexing (rather than matrix multiply), and (ii) sparse matrix multiply where appropriate, might also make a big difference.
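
For the record, the exact step for a quadratic is cheap once the coefficients are known; a generic sketch (how a and b are assembled from the c, d, e, u, v terms above depends on the sgm formulation, so treat this as illustrative):

    # Maximize f(alpha) = a * alpha^2 + b * alpha over [0, 1]; the interior
    # stationary point is only a maximizer when the quadratic is concave.
    exact_step <- function(a, b) {
      cand <- c(0, 1)
      if (a < 0) cand <- c(cand, -b / (2 * a))
      cand <- cand[cand >= 0 & cand <= 1]
      cand[which.max(a * cand^2 + b * cand)]
    }
    exact_step(-2, 1)   # 0.25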


youngser commented 10 years ago

We (mainly Minh Tang) achieved a bit more speed up on this. We're making one more try before submitting the updated version...

gaborcsardi commented 10 years ago

OK, now that I managed to speed up everything by a lot, solve_LSAP() is indeed the bottleneck. solve_LSAP() is actually not R code, but C, and FWIW this matching was already implemented in igraph; it is named maximum.bipartite.matching(). Of course this works on igraph graphs and not on matrices.
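
A sketch of the idea, assuming graph.incidence() to turn the weight matrix into a bipartite igraph graph (argument details may differ across igraph versions):

    library(igraph)
    set.seed(1)
    C <- matrix(runif(25), 5, 5)
    g <- graph.incidence(C, weighted = TRUE)    # bipartite: 5 row + 5 column vertices
    m <- maximum.bipartite.matching(g, weights = E(g)$weight)
    m$matching[1:5]                             # matched column vertex for each row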

gaborcsardi commented 9 years ago

Good as it is now, speedwise.