Multithreaded support for loop action

jeanluct commented 9 years ago

From Marko Budisic on 2014-10-01 15:26:27+00:00

The current out-of-Matlab parallelization exists in colorbraiding_helper.hpp.

It bypasses Matlab's parallelization toolbox, so it can use as many threads as the user wants. By default, we set the thread count from within Matlab to the number of Matlab-available cores (unless the user requests more or fewer).

Parallelizing colorbraiding is relatively simple because it applies only to the search for pairwise intersections, which can be done independently for each pair. I used a minimalistic ThreadPool library to access threads from C++ (although I modified it to reduce the number of C++11 constructs used, to ensure backward compatibility with some GCC compilers that our Matlab versions preferred). If anyone needs more info, I can document further how the same functionality can be replicated for another purpose.
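To illustrate why the pairwise search parallelizes so easily, here is a minimal sketch using plain C++11 std::thread with a shared atomic work counter. It is not the code in colorbraiding_helper.hpp and it does not use the ThreadPool library; the names count_crossing_pairs and pair_crosses, and the 1D trajectory layout, are hypothetical stand-ins chosen only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-pair work: do two 1D trajectories,
// sampled at the same times, change order at least once?
bool pair_crosses(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    for (std::size_t t = 1; t < a.size(); ++t) {
        // A sign change of (a - b) between consecutive samples is a crossing.
        if ((a[t - 1] - b[t - 1]) * (a[t] - b[t]) < 0.0) return true;
    }
    return false;
}

// Counts the pairs (i < j) that cross. Every pair is an independent task,
// so the workers share nothing except an atomic index into the pair list.
std::size_t count_crossing_pairs(const std::vector<std::vector<double>>& traj,
                                 unsigned n_threads) {
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < traj.size(); ++i)
        for (std::size_t j = i + 1; j < traj.size(); ++j)
            pairs.emplace_back(i, j);

    if (n_threads == 0) n_threads = 1;
    std::atomic<std::size_t> next(0);                // next unclaimed pair
    std::vector<std::size_t> partial(n_threads, 0);  // one counter per worker

    auto worker = [&](unsigned id) {
        for (;;) {
            const std::size_t k = next.fetch_add(1);  // claim one pair
            if (k >= pairs.size()) break;
            if (pair_crosses(traj[pairs[k].first], traj[pairs[k].second]))
                ++partial[id];
        }
    };

    std::vector<std::thread> threads;
    for (unsigned id = 0; id < n_threads; ++id)
        threads.emplace_back(worker, id);
    for (auto& th : threads) th.join();

    std::size_t total = 0;
    for (std::size_t c : partial) total += c;
    return total;
}
```

Calling count_crossing_pairs(traj, n) with n taken from the user (or from the core count reported by Matlab) mirrors the thread-count choice described above. The result does not depend on n, since each pair is processed exactly once.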

I am not sure if/how C++ threading would get onto a GPU, although I thought it would likely be a valuable tool to have, looking ahead to parallelizing loop action or improving the parallelization of colorbraiding.

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-10-08 15:57:32+00:00

After revisiting some old tests, I think multithreaded support is premature. The reason, which is clear after running loop_speedtest.m in devel/tests/loop_speedtest (simple MEX compilation required), is that the huge slowdown right now comes from Matlab, not from the C/C++ function. Here's the output:


>> loop_speedtest
Number of loops = 531441

action of braid on loops (vectors)...  (0.113561 seconds)

converting to array of loops...  (5.603251 seconds)

action of braid on array of loops (classes)...  (6.167722 seconds)

verify equality...  (1.500239 seconds)

Slowdown = 103.653305

The difference is that the first case is just a set of loops represented by a plain array (no loop class). The C function acts on this directly.

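Then this is converted to loop objects, which takes a lot of time. The action on loops takes even more time. The reported slowdown is the ratio of those two steps to the plain-array case: (5.603251 + 6.167722) / 0.113561 ≈ 103.65.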

So we need to solve this problem first. There are a few options:

- Since LCS is a rare application that requires an enormous number of loops, do something customized and fast in LCS itself, like avoiding the loop class until the loops have been operated on and analyzed.
- Allow LCS low-level access to braidlab binaries, as an exception. (Hidden method?)
- Modify loop so it can vectorize better: allow the internal coords data property to be two-dimensional. This will hopefully give a huge speedup.

I'd like to try the last option first. Let's see if it's feasible...

jeanluct commented 9 years ago

From Jean-Luc Thiffeault on 2014-10-08 16:36:50+00:00

Opened a new issue #71 to deal with this, since it's unrelated to multithreading. Will report back, though.

jeanluct commented 9 years ago

Should this still be open? I think it's obsolete now that we implemented #71.

jeanluct commented 9 years ago

I guess we could still do this: multithread the action on loops. Not a priority, though.

jeanluct commented 9 years ago

Maybe this is a good feature to target for a 4.0 release. If it turns out to be fairly easy, we can always downgrade it to 3.2.

mbudisic commented 9 years ago

This is now implemented using the ThreadPool class. Tests show a speed-up over a single core, although its size depends on the length of the braid, the number of punctures, and the number of loops.

Practical speed-up is good, although not stellar. On my two-core laptop, it gives a nearly 2x speed-up of braid-loop multiplication for a braid with about 350k generators, 64 punctures, and about 4000 loops (the braid was generated by a Hackborn rotor, so something realistic).

I am closing this for now, although we could look towards implementing this via C++11 asynchronous threading in the future.
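For reference, a rough sketch of what the C++11 asynchronous variant might look like, assuming the loops are independent of each other and can be split into contiguous chunks. This is not braidlab's implementation: act_on_loop is a placeholder that only swaps neighbouring coordinates instead of applying the real coordinate update rules, and Loop, act_on_loops_async, and n_tasks are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <future>
#include <utility>
#include <vector>

typedef std::vector<long> Loop;  // one loop = one coordinate vector (assumption)

// Placeholder for the real update rule: generator g swaps entries |g|-1 and |g|.
void act_on_loop(const std::vector<int>& braid, Loop& loop) {
    for (int g : braid) {
        assert(g != 0);  // generators are assumed nonzero, sign marks an inverse
        const std::size_t i = static_cast<std::size_t>(std::abs(g));
        if (i < loop.size()) std::swap(loop[i - 1], loop[i]);
    }
}

// Applies the whole braid to every loop, in place. The loops are split into
// at most n_tasks contiguous chunks, and each chunk runs in its own task.
void act_on_loops_async(const std::vector<int>& braid,
                        std::vector<Loop>& loops,
                        std::size_t n_tasks) {
    if (n_tasks == 0) n_tasks = 1;
    const std::size_t chunk = (loops.size() + n_tasks - 1) / n_tasks;  // ceiling
    std::vector<std::future<void>> pending;
    for (std::size_t begin = 0; begin < loops.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, loops.size());
        // The braid is only read and chunks do not overlap, so no locking is needed.
        pending.push_back(std::async(std::launch::async,
            [&braid, &loops, begin, end] {
                for (std::size_t k = begin; k < end; ++k)
                    act_on_loop(braid, loops[k]);
            }));
    }
    for (auto& f : pending) f.get();  // wait for all chunks; rethrows task exceptions
}
```

Chunking by loop rather than by generator keeps each task sequential in the braid word, which matters because the generators must be applied in order to any single loop.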

jeanluct / braidlab

Multithreaded support for loop action #68