igraph / xdata-igraph

xdata igraph, has been merged into igraph/igraph
GNU General Public License v2.0
18 stars 3 forks source link

SBM revisited (sbm.game) #4

Closed youngser closed 10 years ago

youngser commented 11 years ago

Gabor, First, your "sbm.game" works beautifully!! Thanks a lot!

Attached is a function to generate our special hierarchical (or repeated) sbm. It works fine up to n=10^5, while it fails for n=10^6 with a following error:

Error in [<-(*tmp*, !(B), value = 0.01) : error in evaluating the argument 'i' in selecting a method for function '[<-': Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

Of course, I know it's nothing to do with igraph, rather it's R's limitation. Unfortunately, we have to have n=10^6!

(NB: do you know a solution/trick to bypass it ??)

Meanwhile, as you can see from below, our model is repeated, that is, we don't have to (hope not to) build a huge B matrix ((length(rho)*n/m)^2 dense matrix); they are just a repeat of a smaller C block!

So, I wonder if you can modify "sbm.game" as a special case so that if nrow(pref.matrix) !=n and/or sum(block.sizes) != n, then it reuses "pref.matrix" and so on. For example, if I say

g <- sbm.game(n, pref.matrix=C, block.size=m*rho, off.diag=p)

then, it can generate a graph by repeating 5x5 C diagonal block n/m times! Also note that, I need additional parameter "off.diag" so that I can specify a probability for off-diagonals.

Does this make sense? If not, please just let me know.

Thanks in advance,

## Building hierarchical stochastic block model of graphs
## G = SBM(n,pi,B)
## H = SBM(m,rho,C)
## H is a sub SBM of G, that is,
## B is block diagonal with n/m C's on the diagonal with p elsewhere,
newSBM <- function(n=10^3)
{
    require(Matrix)
    require(igraph)

    set.seed(12345)
    m <- 10^2
    (nB <- n/m)
    rho <- c(0.02,0.5,0.15,0.08,0.25)
    nC <- length(rho)
    p <- 0.01

    ## BxB rate matrix
    C = matrix(c(0.10, 0.07, 0,    0.01, 0,
                 0.02, 0.05, 0.03, 0.02, 0.04,
                 0.03, 0.04, 0.08, 0.08, 0.06,
                 0.37, 0.26, 0.13, 0.29, 0.20,
                 0,    0.02, 0.02, 0.06, 0.09), nC ,nC ,byrow=T)
    C <-  (C + t(C))/2 # symmetrization
    C

    ## build hierarchical blocks
    plist <- rep(list(C), nB)
    B <- bdiag(plist)
    B[!(B)] <- p

    ## block size: sizes of each block
    bs <- rep(m*rho, nB)
    if(length(bs)!=nrow(B)) stop("length(bs) must be the same as nrow(B)")

    ## sbm g
    system.time(g <- sbm.game(n, pref.matrix=B, block.sizes=bs))
    g
}
gaborcsardi commented 11 years ago

The error is not R's limitation. For a 50000 x 50000 dense matrix you need about 20GB memory.

I would rather add a new function, your network is a special case, and it would be confusing to add this behavior to sbm.game(). How do you want to call it? Are the number of hierarchical layers fixed, or that can be arbitrary? The off-diagonal is always a constant?

jovo commented 11 years ago

this is a good question.

perhaps the simplest but somewhat flexible approach would be that the user must specify 2 SBM's, one for the diagonal blocks, and one for the off-diagonal block, and then the number of repeats.

does that make sense?

the reason for suggesting that is that for our brain-graph model, we have that the off-diagonal blocks actually have structure (though we haven't included it yet).

i don't fully understand the naming convention that you are using, but this is a hierarchical sbm, so i would call it something like hsbm.game is guess?

On Tue, Oct 29, 2013 at 10:41 AM, Gabor Csardi notifications@github.comwrote:

The error is not R's limitation. For a 50000 x 50000 dense matrix you need about 20GB memory.

I would rather add a new function, your network is a special case, and it would be confusing to add this behavior to sbm.game(). How do you want to call it? Are the number of hierarchical layers fixed, or that can be arbitrary? The off-diagonal is always a constant?

— Reply to this email directly or view it on GitHubhttps://github.com/gaborcsardi/igraph/issues/4#issuecomment-27308433 .

perhaps consider allowing the quest for eudaimonia to guide you openconnecto.me, jovo.me

youngser commented 11 years ago

On Oct 29, 2013, at 10:41 AM, Gabor Csardi notifications@github.com wrote:

The error is not R's limitation. For a 50000 x 50000 dense matrix you need about 20GB memory.

Hmm, we tried it in 0.5TB machine... i thought it's because R doesn't support a long integer??

I would rather add a new function, your network is a special case,

That's fine!

and it would be confusing to add this behavior to sbm.game(). How do you want to call it?

How about "sbm.game.repeated" or "sbm.grames.special" ??

Are the number of hierarchical layers fixed, or that can be arbitrary?

It'll be nice to be flexible. But, the default can be n/sum(block.size). If the value (num.layer) is specified and if num.layer > n/sum(block.size), then the last block should be truncated, else if num.layer < n/sum(block.size), then stops at num.layer else num.layer <- n/sum(block.size)

Does this make sense?

The off-diagonal is always a constant?

I think it can be a constant, right Group?

Any comment/suggestion from the group?? Speak now or never!

Thanks again for your help, Gabor! Cheers,

gaborcsardi commented 11 years ago

R supports long vectors, from version 3.0 I think. Whether Matrix supports it, I don't know.

Hmmm, some details are blurry here. Can one of you give me a prototype header, e.g. something like

\usage{
sbm.game (n, pref.matrix, block.sizes, directed = FALSE, loops = FALSE) 
}
\arguments{
  \item{n}{Number of vertices in the graph.}
  \item{pref.matrix}{The matrix giving the Bernoulli rates.
    This is a \eqn{K\times K}{KxK} matrix, where \eqn{K} is the number
    of groups. The probability of creating an edge between vertices from
    groups \eqn{i} and \eqn{j} is given by element \eqn{(i,j)}. For
    undirected graphs, this matrix must be symmetric.}
  \item{block.sizes}{Numeric vector giving the number of vertices in
    each group. The sum of the vector must match the number of vertices.}
  \item{directed}{Logical scalar, whether to generate a directed
    graph.}
  \item{loops}{Logical scalar, whether self-loops are allowed in the
    graph.}
}

Thanks.

youngser commented 11 years ago

Ok, let's focus with a simpler case first.

How about this?

\usage{
sbm.game.special (n, pref.matrix, block.sizes, p.offdiag, directed = FALSE, loops = FALSE) 
}
\arguments{
  \item{n}{Number of vertices in the graph.}
  \item{pref.matrix}{The matrix giving the Bernoulli rates.
    This is a \eqn{K\times K}{KxK} matrix, where \eqn{K} is the number
    of groups. The probability of creating an edge between vertices from
    groups \eqn{i} and \eqn{j} is given by element \eqn{(i,j)}. For
    undirected graphs, this matrix must be symmetric.}
  \item{block.sizes}{Numeric vector giving the number of vertices in
   each group. If the sum of the vector matches with the number of
   vertices, then this is the same as \code{sbm.game}. If the sum is 
   smaller than \eqn{n}, then the pref.matrix block repeats
   \eqn{n/sum(block.size)} times. If this ratio is not an integer, then 
   the last block will be truncated. }
  \item{p.offdiag}{A probability for the off-diagonal blocks.}
  \item{directed}{Logical scalar, whether to generate a directed
    graph.}
  \item{loops}{Logical scalar, whether self-loops are allowed in the
    graph.}
}

How does this sound? Please feel free to fix my English, and let me know if anything isn't clear. Thanks,

gaborcsardi commented 11 years ago

let's please focus on c^3 model 1 for now -- let's get a graph generated for my n,m,C,rho,p.

What is n, m, C, rho and p here?

jovo commented 11 years ago

n=# vertices m=# vertices within a block rho= # of vertices per cluster within a block C=block probability matrix p=ER probability for all off-diagonal blocks.

make sense?

On Tue, Oct 29, 2013 at 6:32 PM, Gabor Csardi notifications@github.comwrote:

let's please focus on c^3 model 1 for now -- let's get a graph generated for my n,m,C,rho,p.

What is n, m, C, rho and p here?

— Reply to this email directly or view it on GitHubhttps://github.com/gaborcsardi/igraph/issues/4#issuecomment-27350063 .

perhaps consider allowing the quest for eudaimonia to guide you openconnecto.me, jovo.me

gaborcsardi commented 11 years ago

So C is an m/rho x m/rho square matrix?

jovo commented 11 years ago

sorry,

rho is a k dimensional vector in the simplex, stating what fraction of vertices are there per cluster within a block.

C is a kxk dimensional matrix (symmetric for now as we are only considering undirected), whose values are between 0 and 1.

n/m is the number of vertices in a block.

the above 3 parameters define the SBM that becomes the diagonal blocks in the bigger model.

does that clarify? i realize that i'm not being so clear, sorry about that....

On Tue, Oct 29, 2013 at 10:55 PM, Gabor Csardi notifications@github.comwrote:

So C is an m/rho x m/rho square matrix?

? Reply to this email directly or view it on GitHubhttps://github.com/gaborcsardi/igraph/issues/4#issuecomment-27361712 .

perhaps consider allowing the quest for eudaimonia to guide you openconnecto.me, jovo.me

gaborcsardi commented 11 years ago

Are the number of vertices within blocks deterministic, or rho has probabilities?

jovo commented 11 years ago

for now, to keep things as simple as possible, let's make the number of vertices within blocks deterministic.

On Wed, Oct 30, 2013 at 5:08 PM, Gabor Csardi notifications@github.comwrote:

Are the number of vertices within blocks deterministic, or rho has probabilities?

— Reply to this email directly or view it on GitHubhttps://github.com/gaborcsardi/igraph/issues/4#issuecomment-27439326 .

perhaps consider allowing the quest for eudaimonia to guide you openconnecto.me, jovo.me

jovo commented 11 years ago

right: deterministic for now; random later. thanks, carey

On Oct 30, 2013, at 5:34 PM, joshua vogelstein jo.vo@duke.edu wrote:

for now, to keep things as simple as possible, let's make the number of vertices within blocks deterministic.

On Wed, Oct 30, 2013 at 5:08 PM, Gabor Csardi notifications@github.com wrote: Are the number of vertices within blocks deterministic, or rho has probabilities?

— Reply to this email directly or view it on GitHub.

perhaps consider allowing the quest for eudaimonia to guide you openconnecto.me, jovo.me

gaborcsardi commented 11 years ago

So then what if rho times m is not an integer?

jovo commented 11 years ago

i guess let's just round. maybe also tell them that we have? maybe even tell them the resultant integer values? i'm thinking of the case when somebody specifies rho(i)=0.001 and m=100, they might want to know that cluster i gets zero vertices, deterministically. thanks, j

On Tue, Oct 29, 2013 at 10:22 AM, youngser notifications@github.com wrote:

Gabor, First, your "sbm.game" works beautifully!! Thanks a lot!

Attached is a function to generate our special hierarchical (or repeated) sbm. It works fine up to n=10^5, while it fails for n=10^6 with a following error:

Error in [<-(tmp, !(B), value = 0.01) : error in evaluating the argument 'i' in selecting a method for function '[<-': Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

Of course, I know it's nothing to do with igraph, rather it's R's limitation. Unfortunately, we have to have n=10^6!

(NB: do you know a solution/trick to bypass it ??)

Meanwhile, as you can see from below, our model is repeated, that is, we don't have to (hope not to) build a huge B matrix ((length(rho)*n/m)^2 dense matrix); they are just a repeat of a smaller C block!

So, I wonder if you can modify "sbm.game" as a special case so that if nrow(pref.matrix) !=n and/or sum(block.sizes) != n, then it reuses "pref.matrix" and so on. For example, if I say

g <- sbm.game(n, pref.matrix=C, block.size=m*rho, off.diag=p)

then, it can generate a graph by repeating 5x5 C diagonal block n/m times! Also note that, I need additional parameter "off.diag" so that I can specify a probability for off-diagonals.

Does this make sense? If not, please just let me know.

Thanks in advance,

  • Youngser

Building hierarchical stochastic block model of graphs G = SBM(n,pi,B) H = SBM(m,rho,C) H is a sub SBM of G, that is, B is block diagonal with n/m C's on the diagonal with p elsewhere,

newSBM <- function(n=10^3) { require(Matrix) require(igraph)

set.seed(12345) m <- 10^2 (nB <- n/m) rho <- c(0.02,0.5,0.15,0.08,0.25) nC <- length(rho) p <- 0.01

BxB rate matrix

C = matrix(c(0.10, 0.07, 0, 0.01, 0, 0.02, 0.05, 0.03, 0.02, 0.04, 0.03, 0.04, 0.08, 0.08, 0.06, 0.37, 0.26, 0.13, 0.29, 0.20, 0, 0.02, 0.02, 0.06, 0.09), nC ,nC ,byrow=T) C <- (C + t(C))/2 # symmetrization C

build hierarchical blocks

plist <- rep(list(C), nB) B <- bdiag(plist) B[!(B)] <- p

block size: sizes of each block

bs <- rep(m*rho, nB) if(length(bs)!=nrow(B)) stop("length(bs) must be the same as nrow(B)")

sbm g

system.time(g <- sbm.game(n, pref.matrix=B, block.sizes=bs)) g

}

— Reply to this email directly or view it on GitHubhttps://github.com/gaborcsardi/igraph/issues/4 .

perhaps consider allowing the quest for eudaimonia to guide you openconnecto.me, jovo.me

jovo commented 10 years ago

@gabor - upon further consideration, we'd like the code to require that m*rho be a vector of integers (and n/m be an integer too), so, just return an informative error if either of those constraints are not satisfied. thank you.

On Wed, Oct 30, 2013 at 8:25 PM, Gabor Csardi notifications@github.comwrote:

So then what if rho times m is not an integer?

— Reply to this email directly or view it on GitHubhttps://github.com/gaborcsardi/igraph/issues/4#issuecomment-27452043 .

perhaps consider allowing the quest for eudaimonia to guide you openconnecto.me, jovo.me

gaborcsardi commented 10 years ago

C docs is missing, and the \details{} from the R manual page, but apart form these I consider this closed.