puzzlef / leiden-communities-openmp-dynamic

Design of OpenMP-based Dynamic Leiden algorithm for community detection.
https://arxiv.org/abs/2405.11658
MIT License
0 stars 0 forks source link

Still some disconnected communties #4

Closed wolfram77 closed 2 weeks ago

wolfram77 commented 2 weeks ago

Some disconnected communities can appear in the input due to edge deletions after applying the batch update (upon the graph). When the local-moving phase of the Leiden algorithm is run, some new disconnected communities may be introduced. However, the refinement phase should be able to split the disconnected communities. This is, however, not happening in some cases.

See below screenshot, for example. image

Note that the disconnected are essentially random, and possibly different from the disconnected communities that were in the input (we do a subset renaming, so community IDs change).

Lets see what happens if we refine all communities instead.

wolfram77 commented 2 weeks ago

There does not seem to be any issue with the logic for refinement phase.

Now I notice that RE-1, which indicates the step after placing all vertices into isolated communities. However, see that RE-1 actually has 6 disconnected communities above. This means that the 6 disconnected communities are outside of the communities marked for refinement, and thus cannot be resolved by the refinement phase.

The question now is, why are those disconnected communities not marked for refinement?

wolfram77 commented 2 weeks ago

It seems we must mark the communities touched by the batch update as changed - not surprising. Jai Shri Ganesh!

wolfram77 commented 2 weeks ago

Still seeing some disconnected communities if I don't mark all touched communities.

Now, still seeing some disconnected communities, even with Static Leiden. Need to check. Is it a regression?

image

wolfram77 commented 2 weeks ago

I replaced parallel refine with a sequential one, still seeing disconnected communities!

image

wolfram77 commented 5 days ago

The code used to debug is as follows.

        if (DYNAMIC && isFirst) {
          size_t DC = countValue(communitiesDisconnectedOmp(x, ucom), char(1));
          size_t C  = communities(x, ucom).size();
          printf("LM-1: %zu / %zu disconnected communities\n", DC, C);
        }

        if (DYNAMIC && isFirst) {
          size_t DC = countValue(communitiesDisconnectedOmp(x, ucom), char(1));
          size_t C  = communities(x, ucom).size();
          printf("LM-2: %zu / %zu disconnected communities\n", DC, C);
        }

          if (DYNAMIC && isFirst) {
            size_t DC = countValue(communitiesDisconnectedOmp(x, ucom), char(1));
            size_t C  = communities(x, ucom).size();
            size_t CC = countValue(cchg, B(1));
            printf("RE-0: %zu / %zu disconnected communities [changed = %zu]\n", DC, C, CC);
          }

          if (DYNAMIC && isFirst) {
            size_t DC = countValue(communitiesDisconnectedOmp(x, ucom), char(1));
            size_t C  = communities(x, ucom).size();
            printf("RE-1: %zu / %zu disconnected communities\n", DC, C);
          }

          if (DYNAMIC && isFirst) {
            size_t DC = countValue(communitiesDisconnectedOmp(x, ucom), char(1));
            size_t C  = communities(x, ucom).size();
            printf("RE-2: %zu / %zu disconnected communities\n", DC, C);
          }