kwikteam / klustakwik2

Fast software for high-dimensional cluster analysis using the masked EM algorithm for Gaussian mixtures
BSD 3-Clause "New" or "Revised" License

Use multiple processors #7

Open thesamovar opened 9 years ago

thesamovar commented 9 years ago

In the old KlustaKwik, using multiple cores gave some benefit, but not a large one, because the problem was limited by memory bandwidth. In KK2, however, memory usage is reduced by orders of magnitude (especially for larger problems), so we might well see much better speedups from multiple processors.

There is a technical issue. As far as I know, Numba only supports multiple processors through the vectorize decorator (and then only in the 'pro' version), which is not something we can use in KK2. I don't see any way around this, so we might have to stick with Cython.

@rossant any thoughts?

rossant commented 9 years ago

How do you want to use multiple cores exactly?

thesamovar commented 9 years ago

The main one is in the E-step. We have a key loop which, for each cluster, iterates over all spikes. In the C++ version I use an OpenMP parallel for over this inner loop over spikes, and I'd like to do the equivalent in the Python version.
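The loop structure described above can be sketched roughly as follows. This is only an illustration, not KK2's actual code: the 1-D Gaussian score, the `e_step` function and its arguments are all hypothetical stand-ins, and a standard-library thread pool takes the place of the OpenMP parallel for. Under CPython's GIL, pure-Python threads will not give a real speedup, so the sketch shows the shape of the computation only.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def log_gaussian_1d(x, mean, var):
    """Log density of a 1-D Gaussian (hypothetical stand-in for the real per-spike score)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def e_step(spikes, clusters, num_threads=4):
    """Assign each spike to its best-scoring cluster.

    spikes   -- list of floats (hypothetical 1-D features)
    clusters -- list of (mean, variance) pairs (hypothetical parameters)
    """
    n = len(spikes)
    best_ll = [-math.inf] * n
    best_cluster = [-1] * n

    def process_chunk(cluster_index, start, stop):
        # Each worker owns a disjoint range of spike indices, so no locking is needed.
        mean, var = clusters[cluster_index]
        for i in range(start, stop):
            ll = log_gaussian_1d(spikes[i], mean, var)
            if ll > best_ll[i]:
                best_ll[i] = ll
                best_cluster[i] = cluster_index

    chunk = max(1, -(-n // num_threads))  # ceiling division
    bounds = [(start, min(start + chunk, n)) for start in range(0, n, chunk)]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for c in range(len(clusters)):  # outer loop over clusters stays serial
            # Inner loop over spikes is split across the workers; list() waits for all of them.
            list(pool.map(lambda b: process_chunk(c, *b), bounds))

    return best_cluster, best_ll


assignments, _ = e_step([0.0, 0.1, 5.0, 5.2], [(0.0, 1.0), (5.0, 1.0)], num_threads=2)
```

Parallelising the inner loop rather than the outer one means every worker writes to its own slice of the per-spike arrays, which is why no synchronisation is needed inside a cluster's pass.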

rossant commented 9 years ago

Maybe we can use this feature to implement a parallel for loop with Numba?

rossant commented 9 years ago

See also http://numba.pydata.org/numba-doc/dev/user/examples.html#multi-threading

thesamovar commented 9 years ago

Note to myself: to do this in Cython using OpenMP, we don't have access to the keyword that makes a copy of a variable for each thread, but we can allocate the per-thread copies in a list/array and then access them using the thread index.
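The workaround in this note can be illustrated with a small plain-Python sketch. It is not the Cython code from KK2: the function name and the sum-of-squares workload are hypothetical, and a thread pool stands in for OpenMP threads. The point is the pattern of one pre-allocated scratch slot per thread, indexed by the thread number and combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum_of_squares(values, num_threads=4):
    """Sum of squares using one private accumulator slot per worker (hypothetical example)."""
    # One scratch slot per thread, allocated up front. Slot t is only ever
    # touched by worker t, which plays the role of a thread-private variable.
    partial = [0.0] * num_threads

    def worker(thread_index):
        # Static schedule: worker t handles indices t, t + num_threads, ...
        for i in range(thread_index, len(values), num_threads):
            partial[thread_index] += values[i] * values[i]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(worker, range(num_threads)))  # wait for every worker

    # Reduction step: combine the per-thread results serially.
    return sum(partial)


total = parallel_sum_of_squares([1.0, 2.0, 3.0, 4.0], num_threads=3)
```

In a real Cython/OpenMP version the index would come from the OpenMP thread number rather than being passed in by hand, and the scratch storage would be a typed array rather than a Python list.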

rossant commented 9 years ago

Do you think Numba will let us use multiple CPUs here?

thesamovar commented 9 years ago

I think it can be done, but it might be simpler using Cython. I'm happy to switch to Numba, but since everything is in Cython at the moment I'll stick with that for now. For me, the big advantage of Numba would be that I wouldn't have to type all the variables explicitly, and we could mix and match arrays with different dtypes (e.g. float32, float64, int16, int32, int64). This is possible in Cython, but it gets complicated when you have multiple arrays, each of which could have a different dtype.

thesamovar commented 9 years ago

OK, this is done for the E-step now and it works pretty well. I'll leave the issue open in case we want to do the M-step too, but the E-step is most of the work.

c-wilson commented 9 years ago

Is it possible to set the number of threads that klustakwik will use? Right now it's using all of my physical and virtual CPUs, and I'd like to be able to specify how many if possible. I'm using it through phy and have OMP_NUM_THREADS=1 set. Thanks!

thesamovar commented 9 years ago

I'll look into this. I created a new issue, #67, that you can follow if you want.

thesamovar commented 9 years ago

OK, I fixed this. It was indeed ignoring OMP_NUM_THREADS, but that was by design (long story). I've added a new parameter, num_cpus, which you can set to the number of CPUs you want to use. This is now in the current git master branch.

c-wilson commented 9 years ago

Great. Just to make sure I understand: to use this, I add "num_cpus=12" to the klustakwik2 dictionary of my prm?

thesamovar commented 9 years ago

Yes, if you have the latest version of KK2.
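For reference, the setting confirmed above might look like this in a prm file. This is a minimal sketch that assumes the prm file is plain Python with a `klustakwik2` dictionary, as described in the question; any other entries a real prm file needs are omitted, and the value 12 is just the one from the question.

```python
# Hypothetical excerpt from a .prm parameter file (other sections omitted).
klustakwik2 = dict(
    num_cpus=12,  # number of CPUs KK2 is allowed to use, per the parameter named in this thread
)
```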

rossant commented 9 years ago

Note that others have reported a bug in phy where KK2 params were not properly taken into account. It should be fixed this week.