Closed gaborcsardi closed 10 years ago
The relevant papers seem to be:
Seeded graph matching for correlated Erdos-Renyi graphs, by Vince Lyzinski, Donniell E. Fishkind, Carey E. Priebe http://arxiv.org/abs/1304.7844
Seeded graph matching for large stochastic block model graphs, by Vince Lyzinski, Daniel L. Sussman, Donniell E. Fishkind, Henry Pao, Carey E. Priebe http://arxiv.org/abs/1310.1297
An easy way to speed up the generation function is to do geometric sampling for the edges that are realized in A and that are not, separately. This is super easy.
Btw. one thing I don't understand is that the generation function only returns graph B, without returning A, and A is not an input argument either. So all in all it is just a fancy way of creating an ER graph. How do you actually generate a pair of graphs?
From Vince:
Hi Gabor, Sorry for the tardy reply. Yes, those papers are the relevant ones for these algorithms. For the generation function, I agree, it should return both A and B (I think I had code that did this--and it seems to be lost in the ether :-) ). Ideally, A would be an input and the generation function would generate a rho-correlated B. Something like:
CER<-function( A, p, rho ,permutation){
q<-p+rho*(1-p)
B<-A;
for( i in 1:(n-1)){
for(j in (i+1):n){
if(A[i,j]==1 && runif(1)>q){
B[i,j]<-0
B[j,i]<-0
}else if(A[i,j]==0 && runif(1)< ((1-q)*( p/(1-p) ))){
B[i,j]<-1
B[j,i]<-1}
}
}
P<-diag(n)
P<-P[permutation,]
B<-P%*%B%*%t(P)
}
would do it. I think (later on), I'd like to generalize this to correlated random Bernoulli graphs (but that is a song for another day).
Would you like the R code for an example run? Let me know, and I'll get it to you as soon as possible.
this one: http://arxiv.org/abs/1112.5507 describes the non-seeded algorithm, which is just a special case of the seeded one, and might be helpful.
@vince: I might be missing something, but isn't an ER G(n,p) exactly the same as a random Bernoulli graph?
This is code for implementing our Seeded Graph Matching algorithm. Our algorithm takes two adjacency matrices, ( A ) and ( B ), and a seeding function ( m ) (we assume that the first ( m ) vertices of each graph are the seeded vertices) and outputs a matching of the unseeded vertices.
For example, suppose that ( A ) is an Erdos-Renyi graph with 100 vertices and ( p=0.4 ).
require(igraph)
## Loading required package: igraph
require(clue)
## Loading required package: clue
A <- as.matrix(get.adjacency(erdos.renyi.game(100, 0.4)))
## Loading required package: Matrix
## Loading required package: lattice
In this example, we want to match ( A ) with a graph that has edge correlation ( \rho=0.7 ). We call the adjacency matrix of this second graph ( B ). We now create ( B ) and in doing so permute the labels of ( B ) with a permutation perm
I wrote the auxiliary function adjcorr to do this
adjcorr <- function(A, P, corr, permutation) {
# input is A modelled from a random binomial graph with A_{i,j} distributed
# Bin(P_{i,j}) for example, if P=.5*matrix(1,n,n) then A is ER(n,0.5) output
# is B which is adjacency matrix with correlation corr element-wise to A the
# labels of B are then permuted via permutation
Q <- P + corr * (1 - P)
n <- nrow(A)
B <- A
for (i in 1:(n - 1)) {
for (j in (i + 1):n) {
if (A[i, j] == 1 && runif(1) > Q[i, j]) {
B[i, j] <- 0
B[j, i] <- 0
} else if (A[i, j] == 0 && runif(1) < ((1 - Q[i, j]) * (P[i, j]/(1 -
P[i, j])))) {
B[i, j] <- 1
B[j, i] <- 1
}
}
}
P <- diag(n)
P <- P[permutation, ]
B <- P %*% B %*% t(P)
return(B)
}
Now to make ( B ), keeping the first ( m=20 ) labels unpermuted
perm <- matrix(sample(80))
x <- matrix(1:20)
perm <- rbind(x, perm)
P <- 0.4 * matrix(1, 100, 100)
# P is the matrix with edge probabilities for A
B <- adjcorr(A, P, 0.7, perm)
dim(B)
## [1] 100 100
Now to match A and B. First the code:
sgm <- function(A, B, m, start, iteration) {
# seeds are assumed to be vertices 1:m in both graphs
totv <- ncol(A)
n <- totv - m
A12 <- A[1:m, (m + 1):(m + n)]
A21 <- as.matrix(A[(m + 1):(m + n), 1:m])
A22 <- A[(m + 1):(m + n), (m + 1):(m + n)]
B12 <- B[1:m, (m + 1):(m + n)]
B21 <- as.matrix(B[(m + 1):(m + n), 1:m])
B22 <- B[(m + 1):(m + n), (m + 1):(m + n)]
patience <- iteration
tol <- 0.99
P <- start
toggle <- 1
iter <- 0
while (toggle == 1 & iter < patience) {
iter <- iter + 1
Grad <- 2 * A22 %*% P %*% t(B22) + 2 * A21 %*% t(B21)
ind <- matrix(solve_LSAP(Grad, maximum = TRUE))
T <- diag(n)
T <- T[ind, ]
c <- sum(diag(t(A22) %*% P %*% B22 %*% t(P)))
d <- sum(diag(t(A22) %*% T %*% B22 %*% t(P))) + sum(diag(t(A22) %*%
P %*% B22 %*% t(T)))
e <- sum(diag(t(A22) %*% T %*% B22 %*% t(T)))
u <- 2 * sum(diag(t(P) %*% A21 %*% t(B21)))
v <- 2 * sum(diag(t(T) %*% A21 %*% t(B21)))
if (c - d + e == 0 && d - 2 * e + u - v == 0) {
alpha <- 0
} else {
alpha <- -(d - 2 * e + u - v)/(2 * (c - d + e))
}
f0 <- 0
f1 <- c - e + u - v
falpha <- (c - d + e) * alpha^2 + (d - 2 * e + u - v) * alpha
if (alpha < tol && alpha > 0 && falpha > f0 && falpha > f1) {
P <- alpha * P + (1 - alpha) * T
} else if (f0 > f1) {
P <- T
} else {
toggle <- 0
}
}
corr <- matrix(solve_LSAP(P, maximum = TRUE))
corr <- cbind(matrix((m + 1):totv, n), matrix(m + corr, n))
return(corr)
}
Now the matching (we begin at the barycenter in ( \mathbb{R}^{80\times 80} ))
start <- (1/80) * matrix(1, 80, 80)
match <- sgm(A, B, 20, start, 25)
# matching A and B with 20 seeds started in 'start' with 25 iteration
# allowed in our F-W routine
match
## [,1] [,2]
## [1,] 21 29
## [2,] 22 53
## [3,] 23 23
## [4,] 24 82
## [5,] 25 52
## [6,] 26 45
## [7,] 27 95
## [8,] 28 30
## [9,] 29 39
## [10,] 30 89
## [11,] 31 75
## [12,] 32 24
## [13,] 33 73
## [14,] 34 43
## [15,] 35 86
## [16,] 36 100
## [17,] 37 40
## [18,] 38 88
## [19,] 39 93
## [20,] 40 37
## [21,] 41 22
## [22,] 42 54
## [23,] 43 79
## [24,] 44 92
## [25,] 45 25
## [26,] 46 57
## [27,] 47 44
## [28,] 48 97
## [29,] 49 70
## [30,] 50 31
## [31,] 51 42
## [32,] 52 48
## [33,] 53 69
## [34,] 54 72
## [35,] 55 74
## [36,] 56 90
## [37,] 57 62
## [38,] 58 71
## [39,] 59 80
## [40,] 60 94
## [41,] 61 99
## [42,] 62 27
## [43,] 63 76
## [44,] 64 67
## [45,] 65 83
## [46,] 66 65
## [47,] 67 96
## [48,] 68 58
## [49,] 69 87
## [50,] 70 32
## [51,] 71 50
## [52,] 72 41
## [53,] 73 85
## [54,] 74 59
## [55,] 75 77
## [56,] 76 84
## [57,] 77 63
## [58,] 78 78
## [59,] 79 26
## [60,] 80 91
## [61,] 81 81
## [62,] 82 66
## [63,] 83 21
## [64,] 84 64
## [65,] 85 68
## [66,] 86 36
## [67,] 87 55
## [68,] 88 33
## [69,] 89 51
## [70,] 90 46
## [71,] 91 34
## [72,] 92 61
## [73,] 93 35
## [74,] 94 49
## [75,] 95 56
## [76,] 96 98
## [77,] 97 47
## [78,] 98 28
## [79,] 99 60
## [80,] 100 38
match is interpreted as follows: look at row 1 of match
match[1, ]
## [1] 21 29
# this says that vertex
match[1, 1]
## [1] 21
# in B is matched to vertex
match[1, 2]
## [1] 29
# in A
Hi Gabor et al. Here is the R-code to run SGM (or at least a uber-inefficient version (though I think correct) I threw together myself). How do you propose that we discuss how to move forward with this?
Code to match graphs A and B using m seeds--assumes seeds are vertices 1:m in both A and B. Also assumes that A and B are correctly aligned to begin with--outputs number of correct matches. The matching could easily be outputted as well.
Creates rho-correlated ER(n,p) random graphs with true alignment given by a permutation (labelled permutation in the input).
Runs iter number of iterations of SGM on two rho correlated ER(n,p) graphs with m randomly chosen seeds
Cheers, Vince