Closed ChuckHastings closed 6 months ago
Quick first observation of this. I have run this several times. For 64-bit I am always seeing everything align (as you indicate in your post). For 32-bit I am seeing some variability. Sometimes I get matching answers, sometimes I get differences. So this behavior (in 32-bit) is still non-deterministic.
I wonder if we are seeing floating point rounding errors causing different choices in the clustering decisions. I will attempt to isolate why we are seeing differences.
I have validated that there are issues related to numerical instability that are causing repeatability issues for 32-bit weights.
I would note that if your graph is large enough you'll probably see these in 64-bit weights as well, although your graph would need to be pretty large, I would think.
Our implementation of Leiden/Louvain executes computations in parallel, there is no guarantee that the order of additions of floating point numbers will be consistent from call to call. If we have a cluster weight on the order of 1e-5, and we have an edge weight on the order of 1e-11, the order in which the additions occur will have subtle changes to the result. As an example that I saw in this data:
v
to move into cluster c1
as 5.08771e-5, and the delta modularity to move into cluster c2
was 5.08771e-5 (with c1
< c2
). A tie (the difference between them was exactly 0). Our tie breaking procedure is to pick the smaller cluster id, so we put the vertex v
into cluster c1
.c2
we still get 5.08771e-5, but for c1
we get 5.08770e-5. The additions must have included one of the larger values before some of the much smaller values, and because we were dealing with a 32-bit float we lost the addition of the smaller values. In this case, delta modularity for c2
is larger than c1
, so we decide to move the vertex v
into cluster c2
.This results in a change in the entire sequence of decisions for the greedy Louvain and Leiden, so we end up with a different (sometimes radically different) clustering choice... but pretty good modularity score.
We can look into some changes to address this. But this would definitely be a longer term research problem.
I would suggest that if repeatability is a priority for you, using the 64-bit double should dramatically reduce the change of instability in the computation.
Thank you for looking into this!
Follow up from issue #4072.
Since this is probably a different problem than the dendrogram flattening that you fixed, should I open a new issue? Or can you reopen this one?
Originally posted by @jpintar in https://github.com/rapidsai/cugraph/issues/4072#issuecomment-2074822898