open-connectome-classes / StatConn-Spring-2015-Info

variability across graphs for hubs #218

Open ghost opened 9 years ago

ghost commented 9 years ago

From what I recall from class, in an Erdos-Renyi graph, where every edge is present independently with equal probability p, each node's degree is binomially distributed, which for large graphs is approximately Gaussian. Defining hubs as the nodes whose degree is two or so standard deviations above the mean would then make sense.

If this is our working definition, the degree cutoff for a hub will differ from network to network. However, if we instead define a hub as any node with at least x edges, the definition doesn't take into account the degree distribution of each graph. Would we then just pick a definition based on the situation? If so, which definition do you think would fit better for the paper that was presented on diseases?
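To make the contrast between the two definitions concrete, here is a minimal sketch in plain Python. The function names, graph sizes, edge probabilities, and the cutoff x = 20 are all illustrative assumptions, not anything from the class or the paper. It samples two Erdos-Renyi graphs of different density and counts hubs under the relative rule (degree more than two standard deviations above the mean) and under the absolute rule (degree >= x).

```python
import random
import statistics


def erdos_renyi_degrees(n, p, seed=None):
    """Sample G(n, p) and return a list with the degree of each node."""
    rng = random.Random(seed)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Each possible edge is included independently with probability p.
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return degree


def hubs_relative(degree, k=2.0):
    """Nodes whose degree exceeds mean + k standard deviations."""
    cutoff = statistics.mean(degree) + k * statistics.pstdev(degree)
    return [v for v, d in enumerate(degree) if d > cutoff]


def hubs_absolute(degree, x):
    """Nodes with at least x edges, regardless of the degree distribution."""
    return [v for v, d in enumerate(degree) if d >= x]


# Two graphs with the same number of nodes but different edge probability.
sparse = erdos_renyi_degrees(500, 0.02, seed=0)  # expected degree about 10
dense = erdos_renyi_degrees(500, 0.10, seed=0)   # expected degree about 50

for name, deg in [("sparse", sparse), ("dense", dense)]:
    print(
        name,
        "mean degree:", round(statistics.mean(deg), 1),
        "relative hubs:", len(hubs_relative(deg)),
        "absolute hubs (x=20):", len(hubs_absolute(deg, 20)),
    )
```

With these assumed parameters, the relative rule flags a small fraction of nodes in both graphs. The fixed cutoff flags almost no nodes in the sparse graph and nearly every node in the dense one, which is the inconsistency raised above.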

whock commented 9 years ago

My sense is that you can use whatever definition you'd like as long as you are consistent within an analysis, but I'm curious when you'd want to use the latter definition of "degree(node) >= x". I think that taking into account the distribution of edges on the graph, as you point out, is important.

But both definitions of hub you give use number / distribution of degrees as a key part of their definition. This of course makes sense but I think we can expand our conception of hubs and admit other definitions. In general, it seems like a hub is a node with high centrality where centrality means roughly its importance or influence [1]. The centrality discussed so far has been degree centrality.

Some other measures of centrality that could be argued to measure the "hubness" of a node are:

1) Katz centrality [2]: A measure based on the number of walks of all lengths that reach the node in question, so it accounts for indirectly connected nodes, with an attenuation factor that penalizes longer walks. It is a generalization of degree centrality. (Google's PageRank is a related measure, usually described as a variant of eigenvector centrality.) Eigenvector centrality similarly assigns 'importance' scores to each node based on the scores of the nodes it is connected to.

2) Betweenness centrality [3]: For every pair of other nodes, the fraction of geodesic (shortest) paths between them that pass through the node in question, summed over all pairs. It measures the extent to which the node in question bridges different nodes and links graph regions together.

3) For dynamic graphs, especially with respect to epidemiology, percolation centrality can be useful [4]: This is a measure, defined for each node at each timestep, of the proportion of shortest paths passing through said node where the path begins at an "active" node. A node is active if it, e.g., represents an infected person or is positive for whatever dynamic variable is being considered (in general, if it is "percolated"). Someone who facilitates transmission of a disease to some significant degree could be considered a hub in that context and would have a high percolation centrality score.
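As a rough illustration of how these measures can disagree about which node is a hub, here is a small plain-Python sketch. The toy graph (two 4-node cliques joined through a single bridge node), the function names, and the Katz parameters alpha = 0.1 and beta = 1 are assumptions chosen for the example. It covers only degree, Katz, and betweenness centrality. Betweenness is computed with Brandes' algorithm for unweighted graphs.

```python
from collections import deque


def build_graph():
    """Two 4-cliques (nodes 0-3 and 5-8) joined through bridge node 4."""
    adj = {v: set() for v in range(9)}

    def add_edge(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for nodes in (list(range(0, 4)), list(range(5, 9))):
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                add_edge(u, v)
    add_edge(3, 4)
    add_edge(4, 5)
    return adj


def katz(adj, alpha=0.1, beta=1.0, iters=200):
    """Katz centrality by fixed-point iteration of x = alpha * A x + beta.

    Converges when alpha < 1 / (largest eigenvalue of A); here the maximum
    degree is 4, so alpha = 0.1 is safely below that bound.
    """
    x = {v: 0.0 for v in adj}
    for _ in range(iters):
        x = {v: alpha * sum(x[w] for w in adj[v]) + beta for v in adj}
    return x


def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                        # nodes in order of distance from s
        preds = {v: [] for v in adj}      # predecessors on shortest paths
        sigma = {v: 0 for v in adj}       # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):         # accumulate from farthest to nearest
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


adj = build_graph()
degree = {v: len(nbrs) for v, nbrs in adj.items()}
katz_scores = katz(adj)
between = betweenness(adj)
for v in sorted(adj):
    print(v, degree[v], round(katz_scores[v], 3), between[v])
```

In this toy graph the bridge node 4 has the lowest degree (2) but the highest betweenness, since every shortest path between the two cliques passes through it. Under degree centrality it would never be called a hub, while under betweenness it is the most central node.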

[1] http://cs.brynmawr.edu/Courses/cs380/spring2013/section02/slides/05_Centrality.pdf

[2] Katz, L. (1953). "A New Status Index Derived from Sociometric Analysis". Psychometrika 18 (1): 39–43.

[3] Freeman, Linton (1977). "A set of measures of centrality based on betweenness". Sociometry 40: 35–41. doi:10.2307/3033543.

[4] Piraveenan, Mahendra (2013). "Percolation Centrality: Quantifying Graph-Theoretic Impact of Nodes during Percolation in Networks". PLoS ONE 8 (1). doi:10.1371/journal.pone.0053095.