rapidsai / cugraph

cuGraph - RAPIDS Graph Analytics Library
https://docs.rapids.ai/api/cugraph/stable/
Apache License 2.0

[FEA]: Add multicut for graph clustering #4125

Open aabbas90 opened 8 months ago

aabbas90 commented 8 months ago

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

High

Please provide a clear description of problem this feature solves

cuGraph should contain a clustering method for which (i) the number of clusters is not known a priori and (ii) usage is simple. A good candidate in this regard is multicut (a.k.a. correlation clustering), which has applications in domains ranging from data analysis and community detection to 2D/3D image segmentation.

The multicut optimization problem takes a weighted graph as input, where a positive edge weight means the edge's endpoints prefer to be in the same cluster and a negative edge weight means they prefer to be separated.
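As a concrete illustration (plain Python, not cuGraph code; the function name and edge-list representation are made up for this example), the multicut objective can be written as the total weight of the edges cut by a clustering, which the solver then tries to minimize:

```python
# Hypothetical illustration of the multicut objective: minimize the total
# weight of edges whose endpoints land in different clusters. Positive
# weights penalize cutting an edge; negative weights reward it.

def multicut_cost(edges, assignment):
    """edges: list of (u, v, weight); assignment: dict vertex -> cluster id."""
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])

# Triangle with two attractive edges and one repulsive edge.
edges = [(0, 1, 2.0), (1, 2, 2.0), (0, 2, -3.0)]

# Keeping all vertices together cuts nothing: cost 0.
print(multicut_cost(edges, {0: 0, 1: 0, 2: 0}))  # -> 0
# Splitting vertex 2 off cuts edges (1,2) and (0,2): 2.0 + (-3.0) = -1.0.
print(multicut_cost(edges, {0: 0, 1: 0, 2: 1}))  # -> -1.0
```

Note there is no parameter for the number of clusters: the optimal assignment determines it.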

Describe your ideal solution

A function which takes a weighted undirected graph $G = (V, E, c)$ as input and returns a cluster assignment $f: V \rightarrow [k]$, where $k \leq |V|$ is the number of clusters, determined automatically by the clustering algorithm.

Describe any alternatives you have considered

CPU implementations exist, for example at https://github.com/LPMP/LPMP/blob/master/src/multicut/multicut_greedy_additive_edge_contraction.cpp, but they do not scale to very large graphs.

Additional context

I have worked on developing a GPU-friendly solver for this which I can try to integrate in cuGraph. Code: https://github.com/pawelswoboda/RAMA Publication: https://arxiv.org/abs/2109.01838


ChuckHastings commented 8 months ago

This sounds like a great idea. We have Louvain and Leiden clustering algorithms, which meet your criterion of not knowing the number of clusters a priori and attempt to maximize modularity. However, they require a weighted graph where all weights are positive (or an unweighted graph, for which we create a weighted graph with all initial weights set to 1). Adding something like multicut would be a good addition to our software suite.

If you are interested in collaborating on this, we could set up an online meeting and discuss this. We have a library of graph primitive functions that provide the basis for many of our current algorithms. At first glance through your implementation it appears that many of the things you are doing may map well to our existing set of primitives, which would make the implementation pretty straightforward.

aabbas90 commented 8 months ago

I am interested in making this algorithm more accessible to potential users, so yes, I am happy to contribute. However, I only have knowledge of CUDA and Thrust, which I have used in my implementation. I was hoping that the code in its current shape, with minor modifications, could make it into cuGraph.

Also, some parts of the code are heuristics, e.g. computing a matching efficiently without caring too much about optimality, finding 3-, 4-, and 5-cycles in a graph following the same principle, etc. On the other hand, using primitives from cuGraph might actually make these routines even better, but I am not sure how much development effort that would require.

Edit: For completeness, some important subroutines we will need are:

  1. Edge contraction (sum the costs of duplicate edges).
  2. Connected components.
  3. (Approximate) maximum matching.
  4. For each negatively weighted edge, find a path of length in [2, 4] passing through only positively weighted edges (`find_small_conflicted_cycles`).
  5. A general version of 4, but on comparatively smaller graphs (`find_large_conflicted_cycles`).
  6. Communication between edge weights and weights on triangles in the graph (message passing).
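Subroutine 1 above might be sketched in plain Python as follows (a hypothetical, single-threaded illustration; `contract_edges` and its edge-list representation are invented for this example, not cuGraph's API):

```python
from collections import defaultdict

def contract_edges(edges, labels):
    """Contract a graph given vertex -> super-vertex labels, summing the
    costs of edges that become duplicates (parallel edges) and dropping
    edges that fall inside a super-vertex."""
    merged = defaultdict(float)
    for u, v, w in edges:
        cu, cv = labels[u], labels[v]
        if cu == cv:
            continue  # edge is internal to a contracted super-vertex
        key = (min(cu, cv), max(cu, cv))  # undirected: canonical order
        merged[key] += w
    return [(u, v, w) for (u, v), w in merged.items()]

edges = [(0, 1, 1.0), (0, 2, 2.0), (1, 2, 3.0), (2, 3, -4.0)]
labels = {0: 0, 1: 0, 2: 1, 3: 2}  # contract edge (0, 1)
print(sorted(contract_edges(edges, labels)))
# -> [(0, 1, 5.0), (1, 2, -4.0)]
```

The two former edges (0, 2) and (1, 2) become duplicates after the contraction, so their costs 2.0 and 3.0 are summed.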
ChuckHastings commented 8 months ago

We should continue to explore this.

The cugraph implementation provides a set of graph primitives that operate on a single GPU and can also operate in a multi-GPU configuration. This allows us to scale to graphs with trillions of edges (with enough GPUs). The primitives take care of the synchronization and communication across the GPU cluster so that the algorithm developer doesn't have to worry about those details.

These primitives are "thrust-like", but instead of iterating over the range of a thrust iterator, we iterate over graph structures (all vertices, all edges, the edges incident on a vertex, etc.). As in thrust, the developer can provide a functor to transform the input and to perform reductions, where the primitive supports it. Within the implementation of the primitives we take care of message passing between GPUs, mirroring data to allow for efficient computation, etc.
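The "thrust-like" pattern described above could be mimicked in plain Python roughly like this (the function name and signature are invented for illustration only; the real cuGraph primitives are C++ and additionally handle multi-GPU partitioning and communication):

```python
# Purely illustrative single-process mimic of a "thrust-like" graph
# primitive: iterate over graph structure (here, the edges incident on
# each vertex), apply a user-supplied transform, and reduce per vertex.

def transform_reduce_incident_edges(adjacency, transform, reduce_op, init):
    """For each vertex, transform its incident (neighbor, weight) pairs
    and reduce them to a single value."""
    result = {}
    for v, neighbors in adjacency.items():
        acc = init
        for u, w in neighbors:
            acc = reduce_op(acc, transform(v, u, w))
        result[v] = acc
    return result

adjacency = {0: [(1, 2.0), (2, -1.0)], 1: [(0, 2.0)], 2: [(0, -1.0)]}
# Example: sum of positive incident edge weights per vertex.
pos_degree = transform_reduce_incident_edges(
    adjacency, lambda v, u, w: max(w, 0.0), lambda a, b: a + b, 0.0)
print(pos_degree)  # -> {0: 2.0, 1: 2.0, 2: 0.0}
```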

Because the primitive interface is "thrust-like", anything you write using thrust is likely to be easily translatable into our primitive structure. I have provided some comments on your specific list of important subroutines below. In areas where you have written raw CUDA kernels, we would need to explore a bit more carefully. If we limit ourselves to a single-GPU implementation, we can probably use the CUDA kernels that you have with some minor adjustments. But in our multi-GPU implementation, there is no guarantee that the data (when you look for triangles, quadrilaterals, etc.) will all be on the same GPU, so that would require some additional thought.

[...] Edit: For completeness some important subroutines we will need are

  1. Edge contraction (sum costs of duplicate edges).

We have a function that performs edge contraction.

  2. Connected components.

We have a good implementation of connected components for symmetric (undirected) graphs.
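For illustration, the labeling that connected components produces (and that a coarsening step can consume, as comes up later in this thread) can be sketched with a simple union-find; this is plain Python, not the cuGraph implementation:

```python
def connected_components(num_vertices, edges):
    """Union-find sketch of subroutine 2: label each vertex with the root
    of its component, e.g. to turn a set of contraction edges into new
    vertex labels. Illustrative only."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[max(ru, rv)] = min(ru, rv)  # merge the two components
    return [find(v) for v in range(num_vertices)]

# Two components: {0, 1, 2} and {3, 4}.
print(connected_components(5, [(0, 1), (1, 2), (3, 4)]))  # -> [0, 0, 0, 3, 3]
```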

  3. (Approx.) maximum matching.

At first glance this looks very similar to a step in the Louvain clustering algorithm. At this point in the Louvain computation we look at each vertex and, for each of its neighbors, evaluate whether the vertex should move into that neighbor's cluster. We evaluate each neighbor and pick the one with the maximum delta modularity, then in a second pass we update the cluster assignments. It seems like we have graph primitives that would satisfy this computation.
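The two-pass pattern described here (each vertex picks its best neighbor, then the picks are committed) can be sketched in plain Python as a greedy approximate matching; this is an illustrative analogy, not the Louvain or RAMA code:

```python
def greedy_matching(adjacency):
    """Approximate maximum-weight matching via a two-pass pattern: each
    vertex picks its heaviest positive neighbor, then mutual picks are
    accepted as matched pairs. Illustrative sketch only."""
    best = {}
    for v, neighbors in adjacency.items():
        positive = [(w, u) for u, w in neighbors if w > 0]
        if positive:
            best[v] = max(positive)[1]  # neighbor with the maximum weight
    # Second pass: keep only mutual picks, so each vertex is matched at most once.
    return {(min(v, u), max(v, u)) for v, u in best.items() if best.get(u) == v}

adjacency = {
    0: [(1, 5.0), (2, 1.0)],
    1: [(0, 5.0), (3, 2.0)],
    2: [(0, 1.0), (3, 4.0)],
    3: [(1, 2.0), (2, 4.0)],
}
print(sorted(greedy_matching(adjacency)))  # -> [(0, 1), (2, 3)]
```

Vertices 0 and 1 pick each other (weight 5.0), as do 2 and 3 (weight 4.0); non-mutual picks are simply dropped and may be retried in a later round.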

  4. For each negatively weighted edge find a path in the graph of length in [2, 4] passing through only positively weighted edges (`find_small_conflicted_cycles`).

This is potentially an area we would need to explore. We do have some primitives that do triangle counting. It's not immediately clear to me whether these can be extended to do what you are attempting here, or whether we would need something new. Same for the general version below.
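As a minimal sketch of the smallest case (a negative edge closed by a positive 2-path, i.e. a conflicted triangle), assuming an edge-list representation; the function name is hypothetical and this is not the RAMA or cuGraph code:

```python
def find_conflicted_triangles(edges):
    """For each negatively weighted edge (u, v), find vertices x such that
    u-x and x-v are both positively weighted: a length-2 positive path
    closing a conflicted 3-cycle (the smallest case of subroutine 4)."""
    positive_neighbors = {}
    for u, v, w in edges:
        if w > 0:
            positive_neighbors.setdefault(u, set()).add(v)
            positive_neighbors.setdefault(v, set()).add(u)
    triangles = []
    for u, v, w in edges:
        if w < 0:
            # Common positive neighbors of u and v close a conflicted cycle.
            common = positive_neighbors.get(u, set()) & positive_neighbors.get(v, set())
            for x in sorted(common):
                triangles.append((u, x, v))
    return triangles

edges = [(0, 1, 2.0), (1, 2, 2.0), (0, 2, -3.0)]
print(find_conflicted_triangles(edges))  # -> [(0, 1, 2)]
```

Longer conflicted cycles (length 4 and 5) would need the same idea extended to positive paths of length 3 and 4, which is where triangle-counting-style primitives may or may not generalize.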

  5. A general version of 4 but on comparatively smaller graphs (`find_large_conflicted_cycles`)

  6. Communication between edge weights and weights on triangles in a graph (message passing)

We have a collection of primitives to do reductions that can probably handle this. Would need to dig into this a bit more carefully.

aabbas90 commented 8 months ago

Thanks for providing more details. Steps 4 to 6 are not a necessary ingredient of the algorithm; however, they are important for a good clustering. Therefore we can start with a basic version of the algorithm which just performs edge contraction based on maximum matching (steps 1-3). This will also help me get to know the cugraph primitives better, so we can do the rest of the implementation later on.
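The basic version described here (alternating an approximate matching on attractive edges with edge contraction, i.e. steps 1-3) might look roughly like the following in plain Python. All names and the stopping rule are illustrative assumptions, not the eventual cuGraph implementation:

```python
from collections import defaultdict

def basic_multicut(num_vertices, edges, max_rounds=100):
    """Sketch of the basic algorithm: repeatedly match clusters along their
    most attractive (positive-total-weight) neighbor and contract matched
    pairs, until no attractive inter-cluster edge remains."""
    labels = list(range(num_vertices))  # current cluster of each original vertex

    for _ in range(max_rounds):
        # Step 1 view: sum costs of duplicate edges between current clusters.
        between = defaultdict(float)
        for u, v, w in edges:
            cu, cv = labels[u], labels[v]
            if cu != cv:
                between[(min(cu, cv), max(cu, cv))] += w
        # Step 3: each cluster picks its most attractive neighbor cluster.
        best = {}
        for (c, d), w in between.items():
            if w > 0:
                if w > best.get(c, (0.0, None))[0]:
                    best[c] = (w, d)
                if w > best.get(d, (0.0, None))[0]:
                    best[d] = (w, c)
        # Accept mutual picks as an approximate matching and contract them.
        merges = [(c, d) for c, (_, d) in best.items()
                  if best.get(d, (0.0, None))[1] == c and c < d]
        if not merges:
            break  # no attractive edges left between clusters
        for c, d in merges:
            labels = [c if l == d else l for l in labels]
    return labels

# Chain graph: 0-1 and 1-2 attract, 2-3 repels, 3-4 attracts.
edges = [(0, 1, 5.0), (1, 2, 5.0), (2, 3, -4.0), (3, 4, 3.0)]
print(basic_multicut(5, edges))  # -> [0, 0, 0, 3, 3]
```

The repulsive edge (2, 3) is never contracted, so the chain splits into two clusters; connected components (step 2) is implicit here because labels are propagated directly during each merge.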

aabbas90 commented 7 months ago

Hi, I am trying to build cuGraph from source and am facing a conflict between the CUB and Thrust versions. I created an issue here:

https://github.com/rapidsai/cugraph/issues/4170

Thanks.

(Edit: resolved)

aabbas90 commented 7 months ago

Hi @ChuckHastings,

  1. I found the graph coarsening routine in cpp/include/cugraph/graph_functions.hpp for edge contraction. Is that what you are referring to? If so, then we will need to run connected components on the set of contraction edges first, to find the new vertex labels, and then use this routine.

  2. Could you direct me to implementation(s) of maximum matching?

Thanks.

ChuckHastings commented 7 months ago

You might look here for graph coarsening. This calls the underlying function you found and also does the relabeling. All that is required as input is a set of labels (the output from connected components would be sufficient).

This function does the Louvain computation and includes updating the clustering based on the maximum match here.

Much of what is in the `update_clustering_by_delta_modularity` function is specific to Louvain, but you might need some of the same structure.