slackner / anti-community

Anti-community Detection
2 stars 1 forks source link

ELI5 Definition of Anti-Community #1

Open BradKML opened 2 years ago

BradKML commented 2 years ago
  1. Is it possible to conjecture that anti-communities are structures that emulates multi-partite structures, and minimizes in-group connections?
  2. Is it possible to define it in terms of in-group density minimization relative to out-group, or something similar to normalized cut or conductance?
  3. can this be tied to CDLib?
slackner commented 2 years ago

Hello,

glad to see that others are also looking at the subject of anti-communities. :) To answer your questions:

  1. Yes, anti-communities are basically multi-partite structures, but less strict. Multi-partite graphs have the requirement that there are no in-group connections. For anti-communities this condition is relaxed - there might still be a few in-group connections, but most connections are between groups.

  2. Your intuition that this is related to the density difference is correct. However, there is not just one way to define this mathematically. Similar to community detection, there are lots of different ways and measures that could be used, each with slightly different results.

To see why this is the case, note that many community detection algorithms can be modified to make them work for anti-communities, just by pre-processing the input: Instead of passing the original graph to the algorithm, the so-called graph complement is computed and passed instead. The graph complement can be computed by removing all original edges, and adding edges between vertices that were previously not connected. This basically swaps the densities and makes regular community detection algorithms work again (assuming they are able to detect sufficiently small density differences).

This shows that you can basically use any classical community detection algorithm / method for anti-communities, and there are really a lot of them, as you can see in this survey, for example: https://arxiv.org/pdf/0906.0612.pdf

One way to define "density" is the so-called modularity measure that is also used in my paper, but I'm pretty sure that there are also other measures / algorithms that could be used, like min-cut or max-cut.

  1. Directly integrating the source code of this repository to CDLib probably would require a rewrite, it looks like all algorithms contained in CDLib only contain pure Python code. Doing such a rewrite is certainly possible, but might require a bit of effort.

If your goal is to just use them for a few evaluations, there are two ways:

If you are planning to re-implement certain algorithms, feel free to ask if you have any question. Most algorithms should be relatively straight-forward, the main difficulty is getting heuristic modularity-based algorithms run fast. If performance is not a concern, most of the complexity introduced in my paper can be skipped.

Best regards, Sebastian