igraph / igraph

Library for the analysis of networks
https://igraph.org
GNU General Public License v2.0
1.75k stars 405 forks source link

degree_sequence_game: DEGSEQ_VL (Viger-Latapy) does not sample uniformly #1073

Open szhorvat opened 6 years ago

szhorvat commented 6 years ago

The DEGSEQ_VL method of degree_sequence_game does not sample uniformly.

Here's an example. Let's take the degree sequence {3, 2, 2, 2, 3}. The possible realizations are the following:

image

The 1st one is unique, and the other 6 are isomorphic with each other. The problem: the first one is sampled more frequently than the rest.

Table[
 Table[IGDegreeSequenceGame[{3, 2, 2, 2, 3}, 
      Method -> "VigerLatapy"], {10000}] // 
    CountsBy[AdjacencyMatrix] // KeySort // Values,
 {5}
 ]
{{1771, 1362, 1372, 1396, 1329, 1383, 1387}, 
 {1777, 1385, 1377, 1425, 1345, 1380, 1311}, 
 {1764, 1496, 1393, 1319, 1352, 1343, 1333}, 
 {1731, 1339, 1370, 1426, 1387, 1389, 1358}, 
 {1726, 1400, 1378, 1346, 1370, 1342, 1438}}

These are the number of times each graph from the above list occurred in 10000 samples. As you can see, the first one is consistently more frequent.

This might be because of how this method works: it rewires the graph several times to randomize it. What is unclear is how many times the rewiring needs to be done, and how uniform the result will be (never perfect).

I think it's essential that there should be a parameter controlling the number of rewiring trials.

Perhaps this method is better to be made part of igraph_rewire, or an entirely separate function. Maybe degree_sequence_game could be split into two functions: one would deal with methods based on the configuration model (SIMPLE and SIMPLE_NO_MULTIPLE) and the other would deal with rewiring-based methods. Alternatively, the rewiring-methods could have two steps: 1. construction (e.g Havel-Hakimi, or that followed by a procedure to make the graph connected) 2. rewiring (i.e. the existing igraph_rewire extended with multiple methods)

szhorvat commented 6 years ago

Related: bugs #1062, #1071, #1072

ntamas commented 6 years ago

The DEGSEQ_VL method is based on the original implementation of Viger and Latapy, and I presume that it follows their paper. From the paper, it seems like they claim that doing O(m) rewiring steps is enough (see refs 12 and 19 in their paper). This claim is probably not a rigorous mathematical proof but it is "good enough" for real use-cases with large graphs.

I think that the current igraph_degree_sequence_game() function is also "good enough" for everyday use-cases if the graph is large enough. For advanced use-cases (e.g., smaller graphs, or strict criteria on the uniformity of the distribution), we could implement a separate igraph_havel_hakimi_graph() function, which could then be combined with igraph_rewire(), and an igraph_viger_latapy_game() function that uses a pre-defined number of rewiring steps for the Viger-Latapy method.

Unfortunately I don't have time to implement any of these in the near future, but PRs are always welcome.

szhorvat commented 6 years ago

The PR is not a problem. The issue is that a reasonable fix to these problems would involve an API change (e.g. like what you suggested) and I wasn't sure what would be acceptable.

I am getting more and more skeptical about the VL method, especially the claim that there's no need to worry about the number of rewiring steps. To my knowledge, there are next to no hard results about convergence rates for such methods. The number of steps should really be user-controllable.

ntamas commented 6 years ago

The issue is that a reasonable fix to these problems would involve an API change

I am not particularly concerned about API changes at this stage; I don't know what plans @gaborcsardi has with igraph in the near future, but considering how much time has passed since the last release, the next one would probably be a major one anyway, so some API breakage here and there does not matter.

To my knowledge, there are next to no hard results about convergence rates for such methods.

I am not aware of any similar results either (at least not ones backed by the level of academic rigor that you commonly see in the mathematics community), but I haven't been following the developments in this field for the last few years so I might be wrong. However, I still think that "average" users (especially those who come to igraph from other fields) generally expect the library to have some kind of a "here is this degree sequence, give me a graph, no questions asked" type of function, and the VL method lived up to their expectations so far. We can remove the claim about the uniform sampling in the documentation to clear things up, and then users for whom the uniformity of the sampling is important can then use the method you suggested in some other issue here (if it gets implemented). For "everyday" use-cases, the VL method is probably fine and probably more efficient than the naive methods that we have, especially when connectedness is important. (As far as I know, the VL method uses some kind of a scheme where connectedness is checked only every N steps, and it can "rewind" the last N steps if it turns out that the connectedness property was violated).

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.