Closed gaborcsardi closed 10 years ago
The relevant papers seem to be:
Seeded graph matching for correlated Erdos-Renyi graphs, by Vince Lyzinski, Donniell E. Fishkind, Carey E. Priebe
Seeded graph matching for large stochastic block model graphs, by Vince Lyzinski, Daniel L. Sussman, Donniell E. Fishkind, Henry Pao, Carey E. Priebe
An easy way to speed up the generation function is to do geometric sampling for the edges that are realized in A and that are not, separately. This is super easy.
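A minimal sketch of the geometric-sampling idea, assuming an ER(n, p) graph A given as a symmetric 0/1 matrix. The helper names sample_bernoulli_positions and correlated_er_geometric are hypothetical and not part of any package. Instead of one uniform draw per vertex pair, each group of pairs (edges of A, non-edges of A) is scanned by jumping ahead with geometric gaps, so the work is proportional to the number of flipped pairs.

```r
# Hypothetical helper: choose each of N slots independently with probability
# prob by drawing the gaps between chosen slots from a geometric distribution.
sample_bernoulli_positions <- function(N, prob) {
  if (N == 0 || prob <= 0) return(integer(0))
  if (prob >= 1) return(seq_len(N))
  # Draw somewhat more gaps than expected; extend if they do not cover N slots.
  k <- ceiling(N * prob + 4 * sqrt(N * prob * (1 - prob)) + 10)
  pos <- cumsum(rgeom(k, prob) + 1)   # rgeom counts failures before a success
  while (pos[length(pos)] <= N) {
    pos <- c(pos, pos[length(pos)] + cumsum(rgeom(k, prob) + 1))
  }
  pos[pos <= N]
}

# Hypothetical helper: B is rho-correlated with A, where A is assumed ER(n, p).
correlated_er_geometric <- function(A, p, rho) {
  n <- nrow(A)
  ut <- which(upper.tri(A))            # linear indices of the pairs i < j
  ones <- ut[A[ut] == 1]               # pairs that are edges in A
  zeros <- ut[A[ut] == 0]              # pairs that are non-edges in A
  q <- p + rho * (1 - p)               # P(B_ij = 1 | A_ij = 1)
  drop <- ones[sample_bernoulli_positions(length(ones), 1 - q)]
  add <- zeros[sample_bernoulli_positions(length(zeros), p * (1 - rho))]
  B <- matrix(0, n, n)
  B[ones] <- 1
  B[drop] <- 0
  B[add] <- 1
  B + t(B)                             # symmetrize the upper triangle
}
```

The two groups are handled separately because they have different flip probabilities: 1 - q for existing edges and p(1 - rho) for non-edges.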
Btw. one thing I don't understand is that the generation function only returns graph B, without returning A, and A is not an input argument either. So all in all it is just a fancy way of creating an ER graph. How do you actually generate a pair of graphs?
From Vince:
Hi Gabor, Sorry for the tardy reply. Yes, those papers are the relevant ones for these algorithms. For the generation function, I agree, it should return both A and B (I think I had code that did this--and it seems to be lost in the ether :-) ). Ideally, A would be an input and the generation function would generate a rho-correlated B. Something like:
CER <- function(A, p, rho, permutation) {
  n <- nrow(A); B <- A; q <- p + rho * (1 - p)
  for (i in 1:(n - 1)) for (j in (i + 1):n) {
    if (A[i, j] == 1 && runif(1) > q) {
      B[i, j] <- B[j, i] <- 0
    } else if (A[i, j] == 0 && runif(1) < ((1 - q) * (p / (1 - p)))) {
      B[i, j] <- B[j, i] <- 1
    }
  }
  B[permutation, permutation]
}
would do it. Later on, I'd like to generalize this to correlated random Bernoulli graphs (but that is a song for another day).
Would you like the R code for an example run? Let me know, and I'll get it to you as soon as possible.
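As a rough illustration of a generator that returns both graphs, here is a vectorized sketch. The function name sample_correlated_pair and its return format are assumptions made for this example, not the code referred to above. It draws A as ER(n, p) and then draws each entry of B conditionally on the matching entry of A.

```r
# Hypothetical helper: returns a list with two rho-correlated ER(n, p)
# adjacency matrices A and B (no permutation applied).
sample_correlated_pair <- function(n, p, rho) {
  stopifnot(p > 0, p < 1, rho >= 0, rho <= 1)
  ut <- upper.tri(matrix(0, n, n))
  n_pairs <- sum(ut)
  a <- rbinom(n_pairs, 1, p)                 # edges of A on pairs i < j
  q <- p + rho * (1 - p)                     # P(B_ij = 1 | A_ij = 1)
  prob_b <- ifelse(a == 1, q, p * (1 - rho)) # P(B_ij = 1 | A_ij)
  b <- rbinom(n_pairs, 1, prob_b)
  to_adj <- function(x) {
    M <- matrix(0, n, n)
    M[ut] <- x
    M + t(M)                                 # symmetric, zero diagonal
  }
  list(A = to_adj(a), B = to_adj(b))
}

g <- sample_correlated_pair(100, 0.4, 0.7)
dim(g$A)
dim(g$B)
```

Returning the pair keeps the true alignment (the identity) available to the caller, who can then permute B as needed.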
this one: describes the non-seeded algorithm, which is just a special case of the seeded one, and might be helpful.
@vince: I might be missing something, but isn't an ER G(n,p) exactly the same as a random Bernoulli graph?
This is code implementing our Seeded Graph Matching (SGM) algorithm. The algorithm takes two adjacency matrices, \( A \) and \( B \), and the number of seeds \( m \) (we assume that the first \( m \) vertices of each graph are the seeded vertices), and outputs a matching of the unseeded vertices.
For example, suppose that \( A \) is an Erdos-Renyi graph with 100 vertices and \( p = 0.4 \).
require(igraph)
## Loading required package: igraph
require(clue)
## Loading required package: clue
A <- as.matrix(get.adjacency(erdos.renyi.game(100, 0.4)))
## Loading required package: Matrix
## Loading required package: lattice
In this example, we want to match \( A \) with a graph that has edge correlation \( \rho = 0.7 \). We call the adjacency matrix of this second graph \( B \). We now create \( B \) and, in doing so, permute the labels of \( B \) with a permutation perm.
I wrote the auxiliary function adjcorr to do this:
adjcorr <- function(A, P, corr, permutation) {
  # Input: A is modelled as a random Bernoulli graph with
  # A[i, j] ~ Bernoulli(P[i, j]); for example, if P = 0.5 * matrix(1, n, n)
  # then A is ER(n, 0.5).
  # Output: B, an adjacency matrix with element-wise correlation corr to A;
  # the labels of B are then permuted via permutation.
  Q <- P + corr * (1 - P)
  n <- nrow(A)
  B <- A
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      if (A[i, j] == 1 && runif(1) > Q[i, j]) {
        B[i, j] <- 0
        B[j, i] <- 0
      } else if (A[i, j] == 0 && runif(1) < ((1 - Q[i, j]) * (P[i, j]/(1 -
        P[i, j])))) {
        B[i, j] <- 1
        B[j, i] <- 1
      }
    }
  }
  # relabel B: build the permutation matrix and conjugate
  P <- diag(n)
  P <- P[permutation, ]
  B <- P %*% B %*% t(P)
  return(B)
}
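As a sanity check on the two acceptance probabilities used in adjcorr, the short derivation below shows that B keeps the marginal edge probability of A and has the requested element-wise correlation. The symbols p, q and \rho are shorthand introduced here for P[i, j], Q[i, j] and corr.

```latex
\begin{align*}
q &= p + \rho\,(1-p), \qquad
\Pr(B_{ij}=1 \mid A_{ij}=1) = q, \qquad
\Pr(B_{ij}=1 \mid A_{ij}=0) = \frac{(1-q)\,p}{1-p} = p\,(1-\rho), \\
\Pr(B_{ij}=1) &= p\,q + (1-p)\,\frac{(1-q)\,p}{1-p} = p, \\
\operatorname{Cov}(A_{ij}, B_{ij}) &= p\,q - p^{2} = \rho\,p\,(1-p), \\
\operatorname{Corr}(A_{ij}, B_{ij}) &= \frac{\rho\,p\,(1-p)}{p\,(1-p)} = \rho .
\end{align*}
```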
Now to make \( B \), keeping the first \( m = 20 \) labels unpermuted:
perm <- matrix(20 + sample(80))
# only the unseeded labels 21:100 are shuffled; sample(80) alone would reuse 1:20
x <- matrix(1:20)
perm <- rbind(x, perm)
P <- 0.4 * matrix(1, 100, 100)
# P is the matrix with edge probabilities for A
B <- adjcorr(A, P, 0.7, perm)
## [1] 100 100
Now to match A and B. First the code:
sgm <- function(A, B, m, start, iteration) {
  # seeds are assumed to be vertices 1:m in both graphs
  totv <- ncol(A)
  n <- totv - m
  A12 <- A[1:m, (m + 1):(m + n)]
  A21 <- as.matrix(A[(m + 1):(m + n), 1:m])
  A22 <- A[(m + 1):(m + n), (m + 1):(m + n)]
  B12 <- B[1:m, (m + 1):(m + n)]
  B21 <- as.matrix(B[(m + 1):(m + n), 1:m])
  B22 <- B[(m + 1):(m + n), (m + 1):(m + n)]
  patience <- iteration
  tol <- 0.99
  P <- start
  toggle <- 1
  iter <- 0
  while (toggle == 1 && iter < patience) {
    iter <- iter + 1
    # gradient of the relaxed objective at the current P
    Grad <- 2 * A22 %*% P %*% t(B22) + 2 * A21 %*% t(B21)
    # F-W direction: the permutation that maximizes the linearized objective
    ind <- matrix(solve_LSAP(Grad, maximum = TRUE))
    T <- diag(n)
    T <- T[ind, ]
    # coefficients of the objective along the segment between T and P
    c <- sum(diag(t(A22) %*% P %*% B22 %*% t(P)))
    d <- sum(diag(t(A22) %*% T %*% B22 %*% t(P))) + sum(diag(t(A22) %*%
      P %*% B22 %*% t(T)))
    e <- sum(diag(t(A22) %*% T %*% B22 %*% t(T)))
    u <- 2 * sum(diag(t(P) %*% A21 %*% t(B21)))
    v <- 2 * sum(diag(t(T) %*% A21 %*% t(B21)))
    if (c - d + e == 0 && d - 2 * e + u - v == 0) {
      alpha <- 0
    } else {
      alpha <- -(d - 2 * e + u - v)/(2 * (c - d + e))
    }
    f0 <- 0
    f1 <- c - e + u - v
    falpha <- (c - d + e) * alpha^2 + (d - 2 * e + u - v) * alpha
    if (alpha < tol && alpha > 0 && falpha > f0 && falpha > f1) {
      P <- alpha * P + (1 - alpha) * T
    } else if (f0 > f1) {
      P <- T
    } else {
      toggle <- 0
    }
  }
  # project the final doubly stochastic P onto a permutation
  corr <- matrix(solve_LSAP(P, maximum = TRUE))
  corr <- cbind(matrix((m + 1):totv, n), matrix(m + corr, n))
  return(corr)
}
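The line search inside the loop can be read off from the following expansion, a sketch that uses the quantities c, d, e, u, v exactly as they are computed in sgm. The names X(\alpha) and g are introduced here for illustration only.

```latex
\begin{align*}
X(\alpha) &= \alpha P + (1-\alpha)\,T, \qquad
g(\alpha) = \operatorname{tr}\!\left(A_{22}^{\top} X(\alpha)\, B_{22}\, X(\alpha)^{\top}\right)
          + 2\operatorname{tr}\!\left(X(\alpha)^{\top} A_{21} B_{21}^{\top}\right), \\
g(\alpha) &= \alpha^{2} c + \alpha(1-\alpha)\, d + (1-\alpha)^{2} e + \alpha\, u + (1-\alpha)\, v, \\
g(\alpha) - g(0) &= (c - d + e)\,\alpha^{2} + (d - 2e + u - v)\,\alpha, \\
g(1) - g(0) &= c - e + u - v, \\
\alpha^{*} &= -\,\frac{d - 2e + u - v}{2\,(c - d + e)} .
\end{align*}
```

Here g(\alpha) - g(0) is falpha, g(1) - g(0) is f1, and \alpha^{*} is the stationary point of the quadratic. It is a maximum only when c - d + e < 0, which is why the code also compares falpha against f0 and f1 before accepting the step.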
Now the matching (we begin at the barycenter in \( \mathbb{R}^{80\times 80} \)):
start <- (1/80) * matrix(1, 80, 80)
match <- sgm(A, B, 20, start, 25)
# matching A and B with 20 seeds, started at 'start', with at most 25
# iterations allowed in our F-W routine
## [,1] [,2]
## [1,] 21 29
## [2,] 22 53
## [3,] 23 23
## [4,] 24 82
## [5,] 25 52
## [6,] 26 45
## [7,] 27 95
## [8,] 28 30
## [9,] 29 39
## [10,] 30 89
## [11,] 31 75
## [12,] 32 24
## [13,] 33 73
## [14,] 34 43
## [15,] 35 86
## [16,] 36 100
## [17,] 37 40
## [18,] 38 88
## [19,] 39 93
## [20,] 40 37
## [21,] 41 22
## [22,] 42 54
## [23,] 43 79
## [24,] 44 92
## [25,] 45 25
## [26,] 46 57
## [27,] 47 44
## [28,] 48 97
## [29,] 49 70
## [30,] 50 31
## [31,] 51 42
## [32,] 52 48
## [33,] 53 69
## [34,] 54 72
## [35,] 55 74
## [36,] 56 90
## [37,] 57 62
## [38,] 58 71
## [39,] 59 80
## [40,] 60 94
## [41,] 61 99
## [42,] 62 27
## [43,] 63 76
## [44,] 64 67
## [45,] 65 83
## [46,] 66 65
## [47,] 67 96
## [48,] 68 58
## [49,] 69 87
## [50,] 70 32
## [51,] 71 50
## [52,] 72 41
## [53,] 73 85
## [54,] 74 59
## [55,] 75 77
## [56,] 76 84
## [57,] 77 63
## [58,] 78 78
## [59,] 79 26
## [60,] 80 91
## [61,] 81 81
## [62,] 82 66
## [63,] 83 21
## [64,] 84 64
## [65,] 85 68
## [66,] 86 36
## [67,] 87 55
## [68,] 88 33
## [69,] 89 51
## [70,] 90 46
## [71,] 91 34
## [72,] 92 61
## [73,] 93 35
## [74,] 94 49
## [75,] 95 56
## [76,] 96 98
## [77,] 97 47
## [78,] 98 28
## [79,] 99 60
## [80,] 100 38
match is interpreted as follows: look at row 1 of match
match[1, ]
## [1] 21 29
# this says that vertex
match[1, 1]
## [1] 21
# in A is matched to vertex
match[1, 2]
## [1] 29
# in B (rows of P in sgm index vertices of A, columns index vertices of B)
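To score a matching like this one against the known permutation, something along the following lines could be used. The function name count_correct_matches is hypothetical. The sketch assumes perm is a valid permutation of all vertex labels and that each row of match pairs an original label (column 1) with a label after permutation (column 2); if the columns are oriented the other way, swap them before calling.

```r
# Hypothetical helper: number of rows (a, b) of match that agree with the
# true alignment, where label b after permutation came from label perm[b].
count_correct_matches <- function(match, perm) {
  perm <- as.vector(perm)
  stopifnot(ncol(match) == 2, all(sort(perm) == seq_along(perm)))
  sum(perm[match[, 2]] == match[, 1])
}

# Toy example with 3 seeds and 3 unseeded vertices
perm_toy <- c(1:3, 5, 6, 4)
match_toy <- cbind(4:6, c(6, 4, 5))
count_correct_matches(match_toy, perm_toy)
```

Dividing the count by nrow(match) gives the fraction of unseeded vertices recovered.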
Hi Gabor et al. Here is the R code to run SGM (or at least an uber-inefficient, though I think correct, version that I threw together myself). How do you propose that we discuss how to move forward with this?
Code to match graphs A and B using m seeds. It assumes the seeds are vertices 1:m in both A and B, and that A and B are correctly aligned to begin with. It outputs the number of correct matches; the matching itself could easily be output as well.
Creates rho-correlated ER(n,p) random graphs with the true alignment given by a permutation (labelled permutation in the input).
Runs iter iterations of SGM on two rho-correlated ER(n,p) graphs with m randomly chosen seeds.
Cheers, Vince