open-connectome-classes / StatConn-Spring-2015-Info

introductory material
18 stars 4 forks source link

affinity stochastic blockmodels #211

Open edunnwe1 opened 9 years ago

edunnwe1 commented 9 years ago

Can anyone recommend a good resource on affinity stochastic block models? So far I've found Joshua's paper http://arxiv.org/pdf/1211.3601.pdf which gives an introduction, but it'd be great to learn a bit more about them. Thanks!

whock commented 9 years ago

I don't know of any other good intros either but I think one way of thinking about affinity SBMs (SBMa) is by considering what type of graph it would lead to. From Joshua's paper:

"An affinity stochastic blockmodel is a stochastic blockmodel wherein the probabilities of connection within a block is in general larger than those between blocks. In an affinity stochastic blockmodel,a community is then a collection of nodes whose connections are more dense within the community, as compared to connections between that community and other nodes." (Priebe et al., 2014)

So this leads me to think that an SBMa would have 'neighborhoods' or 'hub's or whatever you want to call them based around each block. This comes from the fact that nodes within a block have an affinity for one another compared to nodes from other blocks. Then maybe we can gain insight into an SBMa by thinking about dense subgraphs (i.e. densely connected blocks). Here are a couple papers on dense subgraphs that may help understand how to identify them and how they relate to the overall nature of the graph;

(Khuller and Saha) https://people.cs.umass.edu/~barna/paper/icalp-final.pdf

Charikar: http://www.cs.princeton.edu/~zdvir/apx11slides/charikar-slides.pdf

edunnwe1 commented 9 years ago

Hey thanks! It'd definitely be interesting to think about when it makes sense to use an affinity stochastic block model and this may offer a lens for that. But while it's true that if we cluster the adjacency matrix of a graph into an affinity stochastic block model (or SBMa to use your terminology) that the probability of an edge will be higher within a block than between blocks, and so connections will be more dense within a block than across blocks, I don't think that necessarily means that there are dense subgraphs. First, because I don't think that a cluster or block can always be defined as a subgraph, I think a subgraph has to be closed/disconnected? which we don't always have. And second and more importantly, it's just that the probability of an edge has to be higher within a cluster than between clusters. So I don't think that we necessarily need a hub around particular nodes. For an extreme example say you cluster on graphs with four disconnected components and you create four clusters for each of those components. Then the estimate of the probability of an edge within a cluster is higher than across clusters regardless of how objectively dense the connections are within a cluster. I think that a 'relative density' metric may as well be just be whether the probability of a connection within a block is greater than the probability of an edge across blocks, and actually measuring how dense these blocks are may not be too useful?

Also, I think a lot of this is to say that I think that the property of the SBMa that the probability of an edge is higher within a block than outside of a block is all to say that this is a property that arises more from the way that you create the blocks than from just the graph itself, right?

*note: cluster and block used interchangeably here