Significantly improve performance

ragibson commented 4 years ago

leiden_runtimes

I noticed that igraph's community_leiden(objective_function="modularity") was faster than this package and wanted to make the difference less significant.

I think most of the changes here should be fairly self-explanatory if you look at one commit at a time. Feel free to comment on or ask for clarification on the more complicated changes.

Up until the very last commit here, the changes are "RNG consistent" with your master branch. That is, the results should be exactly the same as the current code. For example,

import igraph as ig
import leidenalg as la
from random import seed

seed(0)
G = ig.Graph.Erdos_Renyi(n=100, m=500, directed=True)

optimiser = la.Optimiser()
optimiser.set_rng_seed(0)
part = la.RBConfigurationVertexPartition(G)
optimiser.optimise_partition(part)
print(part.membership)

prints

[2, 0, 3, 2, 0, 1, 1, 0, 2, 4, 0, 0, 1, 3, 1, 0, 2, 1, 4, 3, 2, 3, 0, 3, 1, 2, 2, 3, 1, 4, 0, 0, 0, 1, 1, 4, 4, 1, 1, 3, 0, 2, 3, 2, 3, 0, 0, 0, 0, 0, 3, 4, 3, 1, 1, 0, 2, 2, 0, 2, 2, 1, 4, 2, 1, 0, 2, 4, 0, 0, 1, 0, 0, 4, 1, 1, 0, 1, 4, 2, 2, 4, 0, 2, 0, 0, 0, 2, 0, 1, 2, 3, 1, 0, 2, 4, 3, 0, 1, 1]

in both versions.

The last commit replaces some set iteration (which iterates in value-sorted order for size_t) with vector iteration (which iterates in the order in which the node's neighbors appear in igraph). This doesn't change the underlying algorithm, but the new iteration order slightly changes the results for any given RNG seed.

There's definitely more work that can be done, especially since this only makes tweaks to code paths taken with the default Optimiser parameters, but I think this is good enough for consideration at this point.

ragibson commented 4 years ago

It seems I have broken something with the fixed nodes -- I will look into this soon.

ragibson commented 4 years ago

I hadn't considered the fact that fixed nodes could force community labels to be nonconsecutive after calling renumber_communities(). Hopefully the issue is now fixed.

ragibson commented 4 years ago

I have to look into this more.

ragibson commented 4 years ago

I've added support for rearrange_community_labels() calls that create empty communities (this only occurs with fixed nodes as far as I can tell).

The testing scripts are now valgrind-clean on my end, so this will probably do the trick.

ragibson commented 4 years ago

The fixed nodes test fails on i686 architectures? I'll have to check this in a VM tonight.

vtraag commented 4 years ago

Great, thanks for this @ragibson ! I will have to take a closer look, but it seems promising indeed. The performance gain from not using a set in https://github.com/vtraag/leidenalg/pull/40/commits/8c08b3c5a231bfe577f8ab150589ae44ac09096f makes a lot of sense, and destroying the RNG compatibility with previous versions is no problem I think. People can still get identical results by using identical versions (which is not a strange requirement if you insist on exact replication).

(Btw, you can also mark a PR as a draft, and then mark it as finished if it is ready for review).

I will try to take a closer look somewhat later this week.

The fixed nodes test fails on i686 architectures? I'll have to check this in a VM tonight.

No idea why this happens, I will also have to check in more detail, at first sight it seems rather strange indeed.

ragibson commented 4 years ago

It turns out my code for the rearranging of community labels didn't update the bookkeeping correctly if the number of communities increased during the call. Without fixed nodes, this was only used to reorder labels and remove empty communities, so I hadn't fully considered how to handle this case.

I still want to test this a bit more (and maybe add a few test cases) before calling this "ready to merge".

If you're willing to go down a fascinating rabbit hole, the reason the test was only failing on the i686 architecture is partially due to that architecture having faulty (i.e. not IEEE 754 compliant) floating-point units. In fact, if you look at the CI run in 2843e4b91c649fdda904594bfd05f771fb209314, you'll see

possible_improv=1.98 > max_improv=1.98 ?
possible_improv=0x1.fae147ae147aep+0 > max_improv=0x1.fae147ae147aep+0 ?
(gt=1, eq=0, lt=0)

where the CPU evaluated possible_improv as being greater than max_improv, despite having the exact same double-precision value.

The gist of what's happening is that double comparisons here actually use 80-bit extended precision when values are stored in a register and 64-bit precision otherwise.

Bizarrely, the architecture allows both possible_improv > max_improv and possible_improv == max_improv to be true in a single line of code if the variables' register allocation changes. I actually had a few tests on a local VM run

printf("(gt=%d, eq=%d, lt=%d)\n", possible_improv > max_improv, possible_improv == max_improv, possible_improv < max_improv);

and print

(gt=1, eq=1, lt=0)

where the CPU presumably did the first comparison with one of the variables in a register and then moved it to main memory before doing the next comparison. This truncates 16 bits of precision, making the values equal even though the variables themselves were never changed.

This ended up making the i686 architecture follow slightly different paths through the code than all the other ones, revealing the bug.

ragibson commented 4 years ago

I don't see any remaining issues, even with DEBUG enabled, so I think this is ready for an initial, cursory review now.

I added one last fast path for undirected graphs, which makes the optimization essentially as fast as igraph's.

undirected_leiden_runtimes

Directed graphs (which igraph doesn't support right now) obviously see less of a benefit, but the earlier optimizations are still very significant.

directed_leiden_runtimes

vtraag commented 4 years ago

Again, great work @ragibson !

Good to see the i686 architecture revealed an actual bug. And indeed a fascinating rabbit hold to go down! Never would I have imagine to come across an implementation that would allow for strict inequality and equality at the same time.

I never expected this large performance gain still. The algorithm initially was designed to be flexible, not for speed, this was also the reason I included the Leiden algorithm in igraph itself. Seeing it now becoming almost as fast as the igraph implementation is a great improvement!

I will try to review the PR early next week. When it is merged I will probably do a quick release as soon as possible.

ragibson commented 4 years ago

Yes, there are also some tweaks that can be made with multilayer networks (e.g. relabeling instead of renumbering communities for each layer) and some non-default optimization modes (optimise_routine == Optimiser::MERGE_NODES and refine_routine == Optimiser::MOVE_NODES, for example), but those were outside the scope of my optimizations. In any case, those are not the "expected use case" for most users.

I'll look into the changes soon. Have a good holiday!

ragibson commented 4 years ago

Alright, I think that's all the changes you requested -- let me know what you think when you get back from vacation.

Note that the CI failures do not seem real. They fail with

downloading mingw64.db...
downloading mingw64.db.sig...
error: mingw64: signature from "David Macek <david.macek.0@gmail.com>" is unknown trust
error: failed to update mingw64 (invalid or corrupted database (PGP signature))
downloading msys.db...
downloading msys.db.sig...
error: msys: signature from "David Macek <david.macek.0@gmail.com>" is unknown trust
error: failed to update msys (invalid or corrupted database (PGP signature))
error: failed to synchronize all databases

so they should be rerun once this is fixed.

I think you'll be happy to know that your suggested tweak to calculate degrees directly did make a measurable difference, so this is slightly faster than before. final_leiden_runtimes

vtraag / leidenalg

Significantly improve performance #40