Open joernroeder opened 8 years ago
There are basically two important loops that should be straightforward to parallelize:
https://github.com/lvdmaaten/bhtsne/blob/master/sptree.cpp#L385 https://github.com/lvdmaaten/bhtsne/blob/master/tsne.cpp#L215
The first loop is actually embarrassingly parallel, so it should be completely trivial. The second loop may be somewhat trickier because each iteration may access the same nodes of the tree, so it is somewhat less predictable what speedups you can get there.
@lvdmaaten thanks for the info. I'll look a bit deeper into the loops and play around with it as soon as I find some spare time for it :)
I have an OpenMP based version here: https://github.com/maximsch2/bhtsne. I don't like the binary file interface, so I'm also modifying it to build as a shared library and expose a simple C API. An extra pair of eyes is always useful when writing parallel code, so if anyone else is interested, go ahead and try it out!
I'm no OpenMP expert, but it looks good to me. What kind of speed-ups are you seeing compared to the non-OpenMP code?
I haven't benchmarked it on big datasets yet, but I think I get around a 1.3-1.5x speedup on two cores on a smallish dataset (one that takes a couple of seconds to run). My main motivation was integrating it into a visualization system where I wanted to generate t-SNE embeddings on the fly. I think that making it available as a library (as opposed to writing and reading files), and encoding the output dimension as a template parameter (allowing the compiler to do fewer allocations and hopefully unroll all those loops along dimensions, especially when out_dims=2,3), gave me at least a 3-4x improvement in performance.
Nice! In an earlier version of the code, I had hardcoded the output dimension. Indeed, that was a lot faster than the version that is currently in the repo.
@maximsch2 Would it be complicated to provide Windows build support for your OpenMP implementation? Thanks :)
I don't have easy access to a Windows machine, but I don't think there is anything Unix-specific in what I've added. I think MinGW supports OpenMP, so you should be able to build it with gcc just like you would on Linux. Are you running into any specific issues on Windows?
EDIT: I've just checked and apparently there is even a description of how to build it on Windows. I haven't updated the Makefile.win though, so it might be a bit broken... Can you try just building tsne_bin.cpp using Visual Studio to get a binary? Just that file, no reason to add anything else to it.

EDIT 2: I've pushed a version with an updated Makefile.win, which has a higher chance of working.
@lvdmaaten Answering your earlier question about speedup: I went ahead and benchmarked it on a 25000x50 dataset, on a quad-core CPU (with HT, so it presents itself as 8 cores). Results:
```
OMP_NUM_THREADS=1
real    2m23.372s
user    2m23.269s
sys     0m0.091s

OMP_NUM_THREADS=2
real    1m49.877s
user    2m31.483s
sys     0m0.103s

OMP_NUM_THREADS=4
real    1m27.637s
user    2m34.543s
sys     0m0.139s

OMP_NUM_THREADS=8
real    1m24.935s
user    3m39.806s
sys     0m0.159s
```
This is the total time, including reading the file, building the tree (currently not parallelized; it takes around 25 seconds in this case), and doing 1000 iterations of embedding learning. The speedup is not perfect, but some scaling is there.
Thanks, seems to be working :)
Nice!
I was able to get this version working on Windows, but the multicore version by maximsch2 just returns with a "non-zero return code" pretty quickly. It doesn't give any more information even with verbose=True.
@baobabKoodaa If you want, I can try running your script/data here on Linux to see if this is a Windows-specific issue.
@maximsch2 Thank you for the idea and for the kind offer. I will try running it on a Linux machine at some point; for now I can make do with the single-core version.
Since it hasn't been mentioned yet, see https://github.com/DmitryUlyanov/Multicore-TSNE
Hey, this is more a question than an actual issue: I'm mapping a dataset of 900,000 items with 32 dimensions using t-SNE on a multi-core machine, but since t-SNE is single-threaded I'm only using one core. Do you have any tips or tricks for splitting the dataset to parallelize the computation? Thanks in advance!