skojaku / core-periphery-detection

Python package for detecting core-periphery structure in networks.
Apache License 2.0
47 stars 5 forks source link

KM_Config tuning ? #15

Closed JulienOx closed 3 years ago

JulienOx commented 3 years ago

Hello, me again !

This isn't really an issue, but more of a question.

I've been playing around a lot with the KM_Condig Algo, and so far, every results I get has the property that there is 0 intra periphery connections. I see this by drawing the adjacency matrix re-arranged by the Algo. I get similar results than you for most of the emperical networks in your 2018 paper, but then I get a problem for the Blog network about the political elections.

In your paper, you show in figure 14 the new adjacency matrix of the blog network, but this time there are intra-periphery connections ! When I run the KM_Config, I do not get any such connections. I do however get very similar results, with the two core-periph pairs, and the same colouring of the nodes ( BTW, how do you get the nice color bar next to the graph ? I just managed to verify that my colors do match, but not reproduce your figure).

So I was wondering if there is some tuning parameter in the source code to allow more or less intra-periph connections ? I first had problems with this when I tried the KM_config on an SBM generated graph in networkx, and the results were quite uggly, as it would squish the graph to not have intra-periph connections, despite the very obvious core-periph structure of the model. ( I generated a graph with 2 core-periphery pairs).

I did look at the sourcecode for KM_config, but couldn't really decypher it !

Thank you for the help :)

skojaku commented 3 years ago

Thanks for the follow-up! I am writing my response to your question one-by-one in the following:

In your paper, you show in figure 14 the new adjacency matrix of the blog network, but this time there are intra-periphery connections ! When I run the KM_Config, I do not get any such connections. I do however get very similar results, with the two core-periph pairs, and the same colouring of the nodes.

Thanks for reading our paper! The algorithm does not ensure that the intra-periphery connections to be zero by design. Instead, the algorithm tries to minimize the intra-periphery connections. The reason why you find a different result is that the algorithm has stochasticity and thus may find a different core-periphery structure every time you run the algorithm. I did not modify the algorithm itself (although I rewrote it using numba after the paper published) and thus I believe the difference is due to the inherent stochasticity of the algorithm.

( BTW, how do you get the nice color bar next to the graph ? I just managed to verify that my colors do match, but not reproduce your figure).

I created this graph using gnuplot! Looking back, it was not a good way to plot the figure because I needed to write a lot of code. I believe matplotlib would be enough to draw the figure by placing rectangles on the left and top. Alternatively, observable or d3js would be good to go.

So I was wondering if there is some tuning parameter in the source code to allow more or less intra-periph connections ? I first had problems with this when I tried the KM_config on an SBM generated graph in networkx, and the results were quite uggly, as it would squish the graph to not have intra-periph connections, despite the very obvious core-periph structure of the model. ( I generated a graph with 2 core-periphery pairs). There is no turning parameter for KM_config algorithm except the resolution parameter though I did not much explore this direction. So I intentionally hide it in the code.

You can set the resolution parameter by

model = cpnet.KM_config()
model.alpha = 0.1 # resolution parameter 

where the alpha ranges between 0 and 1 and controls the resolution (defaults to 0.5; the 2018 paper also set alpha=0.5).

By setting a small alpha, the algorithm will find a larger core-periphery structure and vice versa. Again, I haven't explored this direction much and thus can say much about its impact on the final results.

JulienOx commented 3 years ago

Hello,

Coming back to the first point, I understand there is some stochasticity, but every time I run the aglo, I get the same results ! I don't know if what I said was clear enough, so here is some results I got : image image

The first image is the matrix of an SBM generated graph with te following code : sizes = [40, 40, 60,70] probs = [[0.5, 0.4, 0.02,0.02], [0.4, 0.06, 0.02,0.02], [0.02, 0.02, 0.4,0.3],[0.02, 0.02, 0.3,0.06]] G = nx.stochastic_block_model(sizes, probs, seed=0)

I then ran the KM_config on this, and rearranged the matrix, and re-arranged it accordingly, showing with the red dashed line where the separation of core-perih are. You see that the result is clearly wrong ? And that it seems to only make it so the intra-periph connections are empty ? There are a very few intra-periph connections in the seconde pairs( 26, so 13 since undirected), but again it seems like the algorithm is heavily skewed that way ?

Here I added the results for the blog network just to illustrate the difference with the result from your paper : image

I'm not sure why it does so, maybe this is just limitation of the algorithm, or perhaps some error in my code ?

skojaku commented 3 years ago

This is a great test! This is not an error, I believe. But let me take a look at the code to double-check.

I want to mention that it is not wrong even if the algorithm did not find blocks in the stochastic block model because block is one possible form of core-periphery structure among many kinds of forms, and each algorithm is developed to find a different structure. In fact, many algorithms do not find the blocks of SBM just like the KM_config because they are developed based on different philosophies and purposes. A good demonstration of this is by Karrer and Newman's paper in PRE (https://arxiv.org/pdf/1008.3926.pdf), where the SBM and dcSBM found very different structure but both make sense (fig. 2). So there is no such thing as ground-truth for core-periphery structure with which we can say "correct" or "wrong".

In fact, there are many ways to define the core-periphery structure (see Puck Rombach, Mason A. Porter, James H. Fowler, and Peter J. Mucha, SIAM paper https://doi.org/10.1137/17M1130046). Again, you may see it wrong but it can be right from a different perspective because there is no ground truth based on which we can say right or wrong. For example, in this detected structure, the core is dense while the periphery is sparse, which agrees with the notion of core-periphery structure. So, at least conceptionally, it makes sense. But it is difficult to say "right" or "wrong".

One thing I am wondering is that the output of the algorithm is exactly the same? I believe this could not happen but if it does, I'd like to investigate further and appreciate any information or log!

JulienOx commented 3 years ago

Hello,

Ok, so I understand why it doesn't work for the SBM, simply because the definitions. Thank you.

I still have issues reproducing your results for the blog network.

First, running the significance test with the graph ( and I used your code from example 3 to make sure they are the same), I get 11 core-periph pairs, 5 of them are deemed " significant by the test" :

kmconfig = cpnet.KM_config() # Call the BE algorithm kmconfig.detect(H) # Detect core-periphery structures c = kmconfig.get_pair_id() # Get the group membership of nodes x = kmconfig.get_coreness() sig_c, sig_x, significant, p_values = cpnet.qstest( c, x, H, kmconfig, significance_level=0.05, num_of_thread=16, ) print(significant) [False, True, False, False, True, False, False, True, True, True, False]

Some of those pairs are very small ( like 4 elements in total), and I only find 12 residual nodes.

One thing I am wondering is that the output of the algorithm is exactly the same? I believe this could not happen but if it does, I'd like to investigate further and appreciate any information or log!

I was wrong to say " I get the same result", as in the results changes slightly with the stochastic nature of the algorithm, but I get very similar results ( as expected), all of them having the property of having almost no intra-periphery connections.

I don't think this will be a big issue when I apply the algorithm to some new data, it's just I was wondering if maybe there was a mistake since I couldn't reproduce the results from your paper ( the Blog network result mostly), and the fact that it didn't work on SBM got me worried !

Just to reiterate, my main issue is that I can't get a result like the one in your paper, figure 14, Where you seem to have quite a few intra-periph connections.

skojaku commented 3 years ago

Thanks! Let me run your code and investigate. I will probably be able to get back to you on Friday or the next Monday.

JulienOx commented 3 years ago

Thank you !

JulienOx commented 3 years ago

Hello, any news please ?

skojaku commented 3 years ago

Haven't started yet :sweat_smile:

skojaku commented 3 years ago

See the pull request! In sum, no issue was found and I could reproduce the results as reported in Sadamori Kojaku and Naoki Masuda 2018 New J. Phys. 20 043012

JulienOx commented 3 years ago

Great thank you ! I'll have a look at it

JulienOx commented 3 years ago

Hello,

Everything seems good, I think I just had trouble understanding fully how the algorithm works !

Thank you for all the work

skojaku commented 3 years ago

I very much appreciate your feedback and raising the issue! Thanks!