Fixed #71. I had to make a number of modifications in order to parallelise this:
Now we need to clone the RNG before passing it to each separate "thread" (it's not really a thread: we are using `rayon` under the hood, so a thread pool is (re)used). This means the choice of which RNG we use is now essential for the algorithm to behave correctly. More specifically, we are dropping `XorShiftRng` due to a cloning pitfall and are now using `Xoshiro256StarStar`, which is of higher quality and actually faster (7 Gb/s vs 5 Gb/s according to the "Rust Rand Book").
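Why the choice of RNG matters once we clone it: cloning copies the generator's state, so every clone replays exactly the same numbers. The sketch below is dependency-free and is not the code in this PR. It hand-writes xoshiro256** (seeded via SplitMix64) instead of pulling in `rand_xoshiro`, and uses `std::thread::scope` in place of `rayon`. It shows the pitfall and one standard remedy: calling the generator's jump function (which advances the state by 2^128 steps) between clones, so each worker draws from a disjoint sub-sequence. The `rand_xoshiro` crate's `Xoshiro256StarStar` also exposes a `jump()` method; `split_rngs` is a hypothetical helper.

```rust
// Hand-written xoshiro256** (Blackman & Vigna) so the example needs no crates.
#[derive(Clone, Debug)]
struct Xoshiro256StarStar {
    s: [u64; 4],
}

impl Xoshiro256StarStar {
    // Expand a u64 seed into the 256-bit state with SplitMix64.
    fn seed_from_u64(mut seed: u64) -> Self {
        let mut s = [0u64; 4];
        for word in s.iter_mut() {
            seed = seed.wrapping_add(0x9e3779b97f4a7c15);
            let mut z = seed;
            z = (z ^ (z >> 30)).wrapping_mul(0xbf58476d1ce4e5b9);
            z = (z ^ (z >> 27)).wrapping_mul(0x94d049bb133111eb);
            *word = z ^ (z >> 31);
        }
        Self { s }
    }

    fn next_u64(&mut self) -> u64 {
        let result = self.s[1].wrapping_mul(5).rotate_left(7).wrapping_mul(9);
        let t = self.s[1] << 17;
        self.s[2] ^= self.s[0];
        self.s[3] ^= self.s[1];
        self.s[1] ^= self.s[2];
        self.s[0] ^= self.s[3];
        self.s[2] ^= t;
        self.s[3] = self.s[3].rotate_left(45);
        result
    }

    // Advance the state by 2^128 steps (reference JUMP polynomial).
    fn jump(&mut self) {
        const JUMP: [u64; 4] = [
            0x180ec6d33cfd0aba,
            0xd5a61266f0c9392c,
            0xa9582618e03fc9aa,
            0x39abdc4529b1661c,
        ];
        let mut acc = [0u64; 4];
        for &j in JUMP.iter() {
            for b in 0..64 {
                if j & (1u64 << b) != 0 {
                    for (a, s) in acc.iter_mut().zip(self.s.iter()) {
                        *a ^= *s;
                    }
                }
                self.next_u64();
            }
        }
        self.s = acc;
    }
}

/// Hypothetical helper: give each of `n` workers its own RNG by cloning the
/// master and then jumping the master past the clone's 2^128-long stream.
fn split_rngs(master: &mut Xoshiro256StarStar, n: usize) -> Vec<Xoshiro256StarStar> {
    (0..n)
        .map(|_| {
            let rng = master.clone();
            master.jump();
            rng
        })
        .collect()
}

fn main() {
    let mut master = Xoshiro256StarStar::seed_from_u64(42);

    // The pitfall: two plain clones produce identical output.
    let mut a = master.clone();
    let mut b = master.clone();
    assert_eq!(a.next_u64(), b.next_u64());

    // Jump-based splitting, each RNG moved into its own scoped thread.
    let rngs = split_rngs(&mut master, 4);
    let firsts: Vec<u64> = std::thread::scope(|scope| {
        let handles: Vec<_> = rngs
            .into_iter()
            .map(|mut rng| scope.spawn(move || rng.next_u64()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    println!("{:?}", firsts);
}
```

Jumping keeps the whole run reproducible from a single seed, which reseeding each worker from OS entropy would not.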
After profiling the code, I realised we had another bottleneck (one @MeBrei had been warning me about for a while): our `RandomWalks::count_visits` was taking up most of the running time of `rank_network`, but luckily it was easily parallelisable.
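A rough sketch of why counting visits parallelises so easily: it is a map-reduce. Each worker counts a chunk of walks into its own map, and the partial maps are summed at the end. The `Walk`/`NodeId` types and both functions are hypothetical stand-ins, not osrank's actual `RandomWalks` API, and `std::thread::scope` stands in for `rayon`, where the same shape would be `par_iter().fold(...).reduce(...)`.

```rust
use std::collections::HashMap;
use std::thread;

// Hypothetical stand-ins: a walk is the sequence of node ids it visited.
type NodeId = u32;
type Walk = Vec<NodeId>;

/// Sequential baseline: one pass over every step of every walk.
fn count_visits_seq(walks: &[Walk]) -> HashMap<NodeId, usize> {
    let mut counts = HashMap::new();
    for walk in walks {
        for &node in walk {
            *counts.entry(node).or_insert(0) += 1;
        }
    }
    counts
}

/// Parallel version: each thread counts a chunk of walks into a private map
/// (no shared mutable state, no locks); the partial maps are merged afterwards.
fn count_visits_par(walks: &[Walk], n_threads: usize) -> HashMap<NodeId, usize> {
    let n_threads = n_threads.max(1);
    // Ceiling division, clamped to 1 because `chunks(0)` panics.
    let chunk_len = ((walks.len() + n_threads - 1) / n_threads).max(1);
    let partials: Vec<HashMap<NodeId, usize>> = thread::scope(|scope| {
        let handles: Vec<_> = walks
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || count_visits_seq(chunk)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Reduce step: add every partial count into a single map.
    let mut total = HashMap::new();
    for partial in partials {
        for (node, c) in partial {
            *total.entry(node).or_insert(0) += c;
        }
    }
    total
}

fn main() {
    let walks: Vec<Walk> = vec![vec![0, 1, 2], vec![1, 2, 2], vec![2, 0]];
    let par = count_visits_par(&walks, 4);
    assert_eq!(par, count_visits_seq(&walks));
    println!("node 2 visited {} times", par[&2]); // prints "node 2 visited 4 times"
}
```

Keeping one map per worker avoids contending on a shared, locked map for every single visit, which would otherwise serialise the hot loop.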
Results
Before parallelising the code, this was the result of running the `osrank-export-to-gephi` program:
As you can see, the full run takes 258 seconds single-threaded (note how CPU usage reads 98%, i.e. this is not using all the cores). Afterwards:
Note that although this looks slower, that's just `time` being confusing here: `time` adds up CPU time across all cores, and this run is now using 332% CPU (basically almost all of my 4 cores). If you compare the log timestamps, you will see that the sequential run took ~5 mins, whereas the parallel one takes roughly 2.
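A back-of-the-envelope check of why the parallel run can look slower in `time`'s output. The CPU percentage `time` reports is total CPU time divided by elapsed wall-clock time, so (assuming "roughly 2" minutes means about 120 s):

$$
\text{CPU\%} = \frac{t_{\text{user}} + t_{\text{sys}}}{t_{\text{wall}}} \times 100
\quad\Rightarrow\quad
t_{\text{user}} + t_{\text{sys}} \approx 3.32 \times 120\,\text{s} \approx 400\,\text{s}
$$

That summed CPU time is larger than the 258 s of the single-threaded run, even though the wall-clock time is shorter.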
Important note: the algorithm is now optimised for large inputs. For small graphs we pay the price of initialising the `rayon` thread pool, which could even make things slower than the sequential version.
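One way to soften that small-input penalty, sketched here as an assumption rather than something this PR does, is a size cutoff that keeps small inputs on the sequential path, so no worker threads are involved at all. `PARALLEL_THRESHOLD` and `sum_squares` are hypothetical, the cutoff value is arbitrary, and scoped std threads stand in for the `rayon` pool.

```rust
use std::thread;

// Hypothetical cutoff; a real value would have to come from benchmarks.
const PARALLEL_THRESHOLD: usize = 10_000;

/// Sum of squares over `xs`: sequential for small inputs, split across scoped
/// threads only when the input is large enough to amortise the setup cost.
fn sum_squares(xs: &[u64], n_threads: usize) -> u64 {
    // Non-capturing closure, so it is `Copy` and can be moved into each thread.
    let seq = |chunk: &[u64]| chunk.iter().map(|x| x * x).sum::<u64>();
    if xs.len() < PARALLEL_THRESHOLD || n_threads <= 1 {
        return seq(xs);
    }
    let chunk_len = (xs.len() + n_threads - 1) / n_threads;
    thread::scope(|scope| {
        let handles: Vec<_> = xs
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || seq(chunk)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let small = [1u64, 2, 3];
    println!("{}", sum_squares(&small, 4)); // below the cutoff: runs sequentially

    let big: Vec<u64> = (0..20_000).collect();
    println!("{}", sum_squares(&big, 4)); // above the cutoff: runs on 4 threads
}
```

The right cutoff would have to be measured, e.g. with criterion benchmarks on graphs of different sizes.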
@MeBrei it might be interesting to merge this piece of work and regenerate your nice criterion graphs to see how the system is doing now.