neo4j-contrib / neo4j-graph-algorithms

Efficient Graph Algorithms for Neo4j
https://github.com/neo4j/graph-data-science/
GNU General Public License v3.0
769 stars 195 forks source link

Improve randomness for LPA #858

Closed knutwalker closed 5 years ago

knutwalker commented 5 years ago

This one includes #857, so that one should be merged first.

jexp commented 5 years ago

Thanks a lot Paul.

jexp commented 5 years ago

@knutwalker we should see in which other algorithms we could/should use those new local random generators.

How does the performance compare between LPA on Huge vs. Heavy?

knutwalker commented 5 years ago

@jexp Louvain and Infomap could make use of the random generators.

I've benchmarked the performance of the plain iterators, but not yet of the complete LPA impls

knutwalker commented 5 years ago

@jexp seems like the HugeLPA doesn't do too well and the changes introduced a regression to the regular LPA as well

Screenshot_2019-04-03 JMH Visualizer

jexp commented 5 years ago

Did you flamegraph it?

Inlining with inheritance?

knutwalker commented 5 years ago

I've got some here lpa-flamegraphs.zip

szarnyasg commented 5 years ago

@knutwalker I was recently looking into LPA again, for using it in the LDBC Graphalytics benchmark. LPA docs says that:

At every iteration of propagation, each node updates its label to the one that the maximum numbers of its neighbours belongs to. Ties are broken uniformly and randomly. The Graphalytics benchmark requires deterministic execution, so in case of a tie, the smallest label wins. According to the benchmark specification:

image

Do you think it's possible to support such an approach to achieve determinism?

jexp commented 5 years ago

We could add an option for that. Default to random option for smallest