igraph / xdata-igraph

xdata igraph, has been merged into igraph/igraph
GNU General Public License v2.0
18 stars 3 forks source link

SGM #1

Closed gaborcsardi closed 10 years ago

gaborcsardi commented 10 years ago

Hi Gabor et al. Here is the R-code to run SGM (or at least a uber-inefficient version (though I think correct) I threw together myself). How do you propose that we discuss how to move forward with this?

  1. Code to match graphs A and B using m seeds--assumes seeds are vertices 1:m in both A and B. Also assumes that A and B are correctly aligned to begin with--outputs number of correct matches. The matching could easily be outputted as well.

    sgm<-function(A,B,m){
     totv<-ncol(A)
     n<-totv-m
     A12<-A[1:m,(m+1):(m+n)]
     A21<-as.matrix(A[(m+1):(m+n),1:m])
     A22<-A[(m+1):(m+n),(m+1):(m+n)]
     B12<-B[1:m,(m+1):(m+n)]
     B21<-as.matrix(B[(m+1):(m+n),1:m])
     B22<-B[(m+1):(m+n),(m+1):(m+n)]
    
     patience<-20
     tol<-.99
     P<-matrix(1,n,n)/n
     toggle<-1
     iter<-0
     while (toggle==1 & iter<patience)
       {
         iter<-iter+1
         Grad<-2*A22%*%P%*%t(B22)+2*A21%*%t(B21);
         ind<-matrix(solve_LSAP(Grad, maximum =TRUE))
         T<-diag(n)
         T<-T[ind,]
         c<-sum(diag(t(A22)%*%P%*%B22%*%t(P)))
         d<-sum(diag(t(A22)%*%T%*%B22%*%t(P)))+sum(diag(t(A22)%*%P%*%B22%*%t(T)))
         e<-sum(diag(t(A22)%*%T%*%B22%*%t(T)))
         u<-2*sum(diag(t(P)%*%A21%*%t(B21)))
         v<-2*sum(diag(t(T)%*%A21%*%t(B21)))
         if( c-d+e==0 && d-2*e+u-v==0){
           alpha<-0
         }else{
           alpha<- -(d-2*e+u-v)/(2*(c-d+e))}
         f0<-0
         f1<- c-e+u-v
         falpha<-(c-d+e)*alpha^2+(d-2*e+u-v)*alpha
         if(alpha < tol && alpha > 0 && falpha > f0 && falpha > f1){
           P<- alpha*P+(1-alpha)*T
         }else if(f0 > f1){
           P<-T
         }else{
           toggle<-0}
       }
     corr<-matrix(solve_LSAP(P, maximum = TRUE))
     corr<-cbind(matrix((m+1):totv, n),matrix(m+corr,n))
    }
  2. Creates rho-correlated ER(n,p) random graphs with true alignment given by a permutation (labelled permutation in the input).

    CER<-function( n, p, rho ,permutation){
     q<-p+rho*(1-p)
     A<-get.adjacency(erdos.renyi.game(n,p))
     B<-A;
     for( i in 1:(n-1)){
       for(j in (i+1):n){
         if(A[i,j]==1 && runif(1)>q){
           B[i,j]<-0
           B[j,i]<-0
         }else if(A[i,j]==0 && runif(1)< ((1-q)*( p/(1-p) ))){
           B[i,j]<-1
           B[j,i]<-1}    
       }
     }
     P<-diag(n)
     P<-P[permutation,]
     B<-P%*%B%*%t(P)
    }
  3. Runs iter number of iterations of SGM on two rho correlated ER(n,p) graphs with m randomly chosen seeds

    SGMcorr<-function(n,p,rho,m, iter)
     {
       corr<-matrix(0,iter)
       for (i in 1:iter){
         perm0<-sample(n-m)
         perm1<-matrix(perm0+m,n-m)
         permutation<-rbind(matrix(1:m,m),perm1)
         W<-CER(n,p,rho,permutation)
         mat<-sgm(W[,1:n],W[,(n+1):(2*n)],m)
         mat<-mat[,2]
         P<-diag(n-m)
         P<-P[perm0,]
         mat<-mat%*%t(P)
         matched<-sum(((m+1):n)==mat)
         corr[i]<-matched/(n-m)
    
       }
       corr
     }

Cheers, Vince

gaborcsardi commented 10 years ago

The relevant papers seem to be:

Seeded graph matching for correlated Erdos-Renyi graphs, by Vince Lyzinski, Donniell E. Fishkind, Carey E. Priebe http://arxiv.org/abs/1304.7844

Seeded graph matching for large stochastic block model graphs, by Vince Lyzinski, Daniel L. Sussman, Donniell E. Fishkind, Henry Pao, Carey E. Priebe http://arxiv.org/abs/1310.1297

gaborcsardi commented 10 years ago

An easy way to speed up the generation function is to do geometric sampling for the edges that are realized in A and that are not, separately. This is super easy.

Btw. one thing I don't understand is that the generation function only returns graph B, without returning A, and A is not an input argument either. So all in all it is just a fancy way of creating an ER graph. How do you actually generate a pair of graphs?

gaborcsardi commented 10 years ago

From Vince:

Hi Gabor, Sorry for the tardy reply. Yes, those papers are the relevant ones for these algorithms. For the generation function, I agree, it should return both A and B (I think I had code that did this--and it seems to be lost in the ether :-) ). Ideally, A would be an input and the generation function would generate a rho-correlated B. Something like:

CER<-function( A, p, rho ,permutation){
    q<-p+rho*(1-p)    
    B<-A;
    for( i in 1:(n-1)){
        for(j in (i+1):n){
            if(A[i,j]==1 && runif(1)>q){
                B[i,j]<-0
                B[j,i]<-0
            }else if(A[i,j]==0 && runif(1)< ((1-q)*( p/(1-p) ))){
                B[i,j]<-1
                B[j,i]<-1}    
        }
    }
    P<-diag(n)
    P<-P[permutation,]
    B<-P%*%B%*%t(P)
}

would do it. I think (later on), I'd like to generalize this to correlated random Bernoulli graphs (but that is a song for another day).

Would you like the R code for an example run? Let me know, and I'll get it to you as soon as possible.

jovo commented 10 years ago

this one: http://arxiv.org/abs/1112.5507 describes the non-seeded algorithm, which is just a special case of the seeded one, and might be helpful.

gaborcsardi commented 10 years ago

@vince: I might be missing something, but isn't an ER G(n,p) exactly the same as a random Bernoulli graph?

gaborcsardi commented 10 years ago

SGM code for igraph

This is code for implementing our Seeded Graph Matching algorithm. Our algorithm takes two adjacency matrices, ( A ) and ( B ), and a seeding function ( m ) (we assume that the first ( m ) vertices of each graph are the seeded vertices) and outputs a matching of the unseeded vertices.

For example, suppose that ( A ) is an Erdos-Renyi graph with 100 vertices and ( p=0.4 ).

require(igraph)
## Loading required package: igraph
require(clue)
## Loading required package: clue
A <- as.matrix(get.adjacency(erdos.renyi.game(100, 0.4)))
## Loading required package: Matrix
## Loading required package: lattice

In this example, we want to match ( A ) with a graph that has edge correlation ( \rho=0.7 ). We call the adjacency matrix of this second graph ( B ). We now create ( B ) and in doing so permute the labels of ( B ) with a permutation perm

I wrote the auxiliary function adjcorr to do this

adjcorr <- function(A, P, corr, permutation) {
    # input is A modelled from a random binomial graph with A_{i,j} distributed
    # Bin(P_{i,j}) for example, if P=.5*matrix(1,n,n) then A is ER(n,0.5) output
    # is B which is adjacency matrix with correlation corr element-wise to A the
    # labels of B are then permuted via permutation
    Q <- P + corr * (1 - P)
    n <- nrow(A)
    B <- A
    for (i in 1:(n - 1)) {
        for (j in (i + 1):n) {
            if (A[i, j] == 1 && runif(1) > Q[i, j]) {
                B[i, j] <- 0
                B[j, i] <- 0
            } else if (A[i, j] == 0 && runif(1) < ((1 - Q[i, j]) * (P[i, j]/(1 - 
                P[i, j])))) {
                B[i, j] <- 1
                B[j, i] <- 1
            }
        }
    }
    P <- diag(n)
    P <- P[permutation, ]
    B <- P %*% B %*% t(P)
    return(B)
}

Now to make ( B ), keeping the first ( m=20 ) labels unpermuted

perm <- matrix(sample(80))
x <- matrix(1:20)
perm <- rbind(x, perm)
P <- 0.4 * matrix(1, 100, 100)
# P is the matrix with edge probabilities for A
B <- adjcorr(A, P, 0.7, perm)
dim(B)
## [1] 100 100

Now to match A and B. First the code:

sgm <- function(A, B, m, start, iteration) {
    # seeds are assumed to be vertices 1:m in both graphs
    totv <- ncol(A)
    n <- totv - m
    A12 <- A[1:m, (m + 1):(m + n)]
    A21 <- as.matrix(A[(m + 1):(m + n), 1:m])
    A22 <- A[(m + 1):(m + n), (m + 1):(m + n)]
    B12 <- B[1:m, (m + 1):(m + n)]
    B21 <- as.matrix(B[(m + 1):(m + n), 1:m])
    B22 <- B[(m + 1):(m + n), (m + 1):(m + n)]

    patience <- iteration
    tol <- 0.99
    P <- start
    toggle <- 1
    iter <- 0
    while (toggle == 1 & iter < patience) {
        iter <- iter + 1
        Grad <- 2 * A22 %*% P %*% t(B22) + 2 * A21 %*% t(B21)
        ind <- matrix(solve_LSAP(Grad, maximum = TRUE))
        T <- diag(n)
        T <- T[ind, ]
        c <- sum(diag(t(A22) %*% P %*% B22 %*% t(P)))
        d <- sum(diag(t(A22) %*% T %*% B22 %*% t(P))) + sum(diag(t(A22) %*% 
            P %*% B22 %*% t(T)))
        e <- sum(diag(t(A22) %*% T %*% B22 %*% t(T)))
        u <- 2 * sum(diag(t(P) %*% A21 %*% t(B21)))
        v <- 2 * sum(diag(t(T) %*% A21 %*% t(B21)))
        if (c - d + e == 0 && d - 2 * e + u - v == 0) {
            alpha <- 0
        } else {
            alpha <- -(d - 2 * e + u - v)/(2 * (c - d + e))
        }
        f0 <- 0
        f1 <- c - e + u - v
        falpha <- (c - d + e) * alpha^2 + (d - 2 * e + u - v) * alpha
        if (alpha < tol && alpha > 0 && falpha > f0 && falpha > f1) {
            P <- alpha * P + (1 - alpha) * T
        } else if (f0 > f1) {
            P <- T
        } else {
            toggle <- 0
        }
    }
    corr <- matrix(solve_LSAP(P, maximum = TRUE))
    corr <- cbind(matrix((m + 1):totv, n), matrix(m + corr, n))
    return(corr)
}

Now the matching (we begin at the barycenter in ( \mathbb{R}^{80\times 80} ))

start <- (1/80) * matrix(1, 80, 80)
match <- sgm(A, B, 20, start, 25)
# matching A and B with 20 seeds started in 'start' with 25 iteration
# allowed in our F-W routine
match
##       [,1] [,2]
##  [1,]   21   29
##  [2,]   22   53
##  [3,]   23   23
##  [4,]   24   82
##  [5,]   25   52
##  [6,]   26   45
##  [7,]   27   95
##  [8,]   28   30
##  [9,]   29   39
## [10,]   30   89
## [11,]   31   75
## [12,]   32   24
## [13,]   33   73
## [14,]   34   43
## [15,]   35   86
## [16,]   36  100
## [17,]   37   40
## [18,]   38   88
## [19,]   39   93
## [20,]   40   37
## [21,]   41   22
## [22,]   42   54
## [23,]   43   79
## [24,]   44   92
## [25,]   45   25
## [26,]   46   57
## [27,]   47   44
## [28,]   48   97
## [29,]   49   70
## [30,]   50   31
## [31,]   51   42
## [32,]   52   48
## [33,]   53   69
## [34,]   54   72
## [35,]   55   74
## [36,]   56   90
## [37,]   57   62
## [38,]   58   71
## [39,]   59   80
## [40,]   60   94
## [41,]   61   99
## [42,]   62   27
## [43,]   63   76
## [44,]   64   67
## [45,]   65   83
## [46,]   66   65
## [47,]   67   96
## [48,]   68   58
## [49,]   69   87
## [50,]   70   32
## [51,]   71   50
## [52,]   72   41
## [53,]   73   85
## [54,]   74   59
## [55,]   75   77
## [56,]   76   84
## [57,]   77   63
## [58,]   78   78
## [59,]   79   26
## [60,]   80   91
## [61,]   81   81
## [62,]   82   66
## [63,]   83   21
## [64,]   84   64
## [65,]   85   68
## [66,]   86   36
## [67,]   87   55
## [68,]   88   33
## [69,]   89   51
## [70,]   90   46
## [71,]   91   34
## [72,]   92   61
## [73,]   93   35
## [74,]   94   49
## [75,]   95   56
## [76,]   96   98
## [77,]   97   47
## [78,]   98   28
## [79,]   99   60
## [80,]  100   38

match is interpreted as follows: look at row 1 of match

match[1, ]
## [1] 21 29
# this says that vertex
match[1, 1]
## [1] 21
# in B is matched to vertex
match[1, 2]
## [1] 29
# in A