Check out loop unrolling, from this blog post: https://privefl.github.io/blog/Tip-Optimize-your-Rcpp-loops/
There's a good guide here for parallel computing: http://gallery.rcpp.org/articles/parallel-distance-matrix/
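As a toy illustration of the unrolling tip (a sketch written for this thread, not code from the post itself): a squared-sum loop unrolled by four, so the compiler can keep four independent accumulators in flight.

#include <Rcpp.h>

// Hypothetical example: sum of squares with the loop unrolled by four.
// The four accumulators are independent, so the CPU can pipeline them.
// [[Rcpp::export]]
double sum_sq_unrolled(Rcpp::NumericVector x) {
  double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
  R_xlen_t n = x.size(), i = 0;
  for (; i + 4 <= n; i += 4) {
    s0 += x[i] * x[i];
    s1 += x[i + 1] * x[i + 1];
    s2 += x[i + 2] * x[i + 2];
    s3 += x[i + 3] * x[i + 3];
  }
  for (; i < n; i++) s0 += x[i] * x[i];  // handle the remainder
  return s0 + s1 + s2 + s3;
}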
Heya @mpadge, does geodist take advantage of parallelization?
Nup, coz it's all in C and dependency-free, but what you could do is pilfer code from dodgr, which has loadsa pure RcppParallel, especially here. The OneDist worker is a pretty simple example; the OneFlow worker is much more complicated (the threads all aggregate to the same object, and so write binary chunks to disk which are later reassembled).
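For reference, this is the general shape of such a worker -- a minimal sketch modelled on the Rcpp Gallery pattern linked above, not the actual dodgr code; the haversine formula, earth radius, and function names here are all illustrative.

// [[Rcpp::depends(RcppParallel)]]
#include <Rcpp.h>
#include <RcppParallel.h>
#include <cmath>

// Fills one n_x * n_y distance matrix; parallelFor() hands each thread a
// contiguous block of rows, so no two threads ever write the same cell.
struct HaversineWorker : public RcppParallel::Worker {
  const RcppParallel::RMatrix<double> x;  // points, (lon, lat) rows
  const RcppParallel::RMatrix<double> y;  // sites, (lon, lat) rows
  RcppParallel::RMatrix<double> out;      // output distances in metres

  HaversineWorker(const Rcpp::NumericMatrix x, const Rcpp::NumericMatrix y,
                  Rcpp::NumericMatrix out) : x(x), y(y), out(out) {}

  void operator()(std::size_t begin, std::size_t end) {
    const double R = 6371000.0, d2r = M_PI / 180.0;  // earth radius in m
    for (std::size_t i = begin; i < end; i++) {
      for (std::size_t j = 0; j < y.nrow(); j++) {
        double dlat = (y(j, 1) - x(i, 1)) * d2r;
        double dlon = (y(j, 0) - x(i, 0)) * d2r;
        double a = std::sin(dlat / 2) * std::sin(dlat / 2) +
          std::cos(x(i, 1) * d2r) * std::cos(y(j, 1) * d2r) *
          std::sin(dlon / 2) * std::sin(dlon / 2);
        out(i, j) = 2.0 * R * std::asin(std::sqrt(a));
      }
    }
  }
};

// [[Rcpp::export]]
Rcpp::NumericMatrix par_dist_matrix(Rcpp::NumericMatrix x,
                                    Rcpp::NumericMatrix y) {
  Rcpp::NumericMatrix out(x.nrow(), y.nrow());
  HaversineWorker w(x, y, out);
  RcppParallel::parallelFor(0, x.nrow(), w);
  return out;
}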
Ah, fair enough! Thanks for the links; I'll dig into this more when I get to the 0.4.0 release.
You could close this, no? Non-parallel geodist is plenty fast enough for any operations you might like; you wouldn't notice any real difference out to so-big-it-don't-fit-in-memory.
Yeah, let's close this, but perhaps it might be useful to have a "wishlist" of things - speed could be one of them? Although to be honest, I'm pretty sure the biggest bottleneck is actually building the design matrix.
I see you closed this issue, but I have written RcppParallel implementations that are very similar to your distance_matrix_cpp and nearest_facility_dist. My nearest-"facility" function only returns the identifier and not the distance, but I could most likely modify it to return both. I also designed it to take another parameter that restricts the search to within X miles, so it ignores any sites beyond that radius. The parallel version is ~5.8 times faster on my machine, but I'm not well versed in high-performance computing, so who knows if it could be improved beyond that.
library(maxcovr)

set.seed(10)
n_users <- 5000
n_sites <- 25

# random coordinates for users (x) and twice as many candidate sites (y)
x <- cbind(-10 + 20 * runif(n_users), -10 + 20 * runif(n_users))
y <- cbind(-10 + 20 * runif(2 * n_sites), -10 + 20 * runif(2 * n_sites))
colnames(x) <- colnames(y) <- c("x", "y")

# rcpp_parallel_distm_C() and rcpp_parallel_distm_C_min() come from the
# proof-of-concept repo linked below; * 1609.34 converts miles to metres
microbenchmark::microbenchmark(
  maxcovr  = maxcovr::distance_matrix_cpp(y, x),
  parallel = rcpp_parallel_distm_C(x, y) * 1609.34
)
microbenchmark::microbenchmark(
  maxcovr  = maxcovr::nearest_facility_dist(y, x),
  parallel = rcpp_parallel_distm_C_min(x, y, 5000)
)
# distance matrix benchmark
Unit: milliseconds
     expr     min       lq      mean   median       uq     max neval cld
  maxcovr 49.4812 50.76415 52.076795 51.56360 52.69165 59.2069   100   b
 parallel  8.7636  8.81665  9.133918  8.90215  9.03285 14.4423   100  a

# nearest facility benchmark
Unit: milliseconds
     expr     min       lq      mean   median       uq     max neval cld
  maxcovr 50.8327 51.61255 53.182549 52.24655 54.16985 60.6313   100   b
 parallel  8.2759  8.32910  8.504298  8.42095  8.46105 11.6456   100  a
# a smaller example to compare the two return formats
set.seed(10)
n_users <- 5
n_sites <- 10
x <- cbind(-10 + 20 * runif(n_users), -10 + 20 * runif(n_users))
y <- cbind(-10 + 20 * runif(2 * n_sites), -10 + 20 * runif(2 * n_sites))
colnames(x) <- colnames(y) <- c("x", "y")

# maxcovr returns user id, nearest site id, and distance in metres
maxcovr::nearest_facility_dist(y, x)
     [,1] [,2]      [,3]
[1,]    1   15 232454.32
[2,]    2   15 227046.58
[3,]    3   15  68395.58
[4,]    4    1 200541.27
[5,]    5   18 365391.14
# the parallel version returns only the index of the nearest site
rcpp_parallel_distm_C_min(x, y, 5000)
[1] 15 15 15  1 18
Here is a proof of concept with the C++ code included: https://github.com/mkuehn10/pargeodist
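For anyone not following the link, here is a rough sketch of how a parallel nearest-facility search with a distance cutoff could be structured. This is hypothetical code illustrating the idea, not the implementation in the pargeodist repo, and all names are made up; the cutoff here is in metres rather than miles.

// [[Rcpp::depends(RcppParallel)]]
#include <Rcpp.h>
#include <RcppParallel.h>
#include <cmath>

// plain haversine distance in metres between two (lon, lat) points
static double haversine_m(double lon1, double lat1, double lon2, double lat2) {
  const double R = 6371000.0, d2r = M_PI / 180.0;
  double dlat = (lat2 - lat1) * d2r, dlon = (lon2 - lon1) * d2r;
  double a = std::sin(dlat / 2) * std::sin(dlat / 2) +
    std::cos(lat1 * d2r) * std::cos(lat2 * d2r) *
    std::sin(dlon / 2) * std::sin(dlon / 2);
  return 2.0 * R * std::asin(std::sqrt(a));
}

// Each thread scans a block of users and records the 1-based index of the
// closest site, ignoring any site beyond max_dist (NA if none qualifies).
struct NearestWorker : public RcppParallel::Worker {
  const RcppParallel::RMatrix<double> x;  // users, (lon, lat) rows
  const RcppParallel::RMatrix<double> y;  // sites, (lon, lat) rows
  const double max_dist;                  // cutoff in metres
  RcppParallel::RVector<int> nearest;     // result: nearest site per user

  NearestWorker(const Rcpp::NumericMatrix x, const Rcpp::NumericMatrix y,
                double max_dist, Rcpp::IntegerVector nearest)
    : x(x), y(y), max_dist(max_dist), nearest(nearest) {}

  void operator()(std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; i++) {
      double best = max_dist;
      int best_j = NA_INTEGER;
      for (std::size_t j = 0; j < y.nrow(); j++) {
        double d = haversine_m(x(i, 0), x(i, 1), y(j, 0), y(j, 1));
        if (d < best) { best = d; best_j = static_cast<int>(j) + 1; }
      }
      nearest[i] = best_j;
    }
  }
};

// [[Rcpp::export]]
Rcpp::IntegerVector par_nearest(Rcpp::NumericMatrix x, Rcpp::NumericMatrix y,
                                double max_dist) {
  Rcpp::IntegerVector nearest(x.nrow());
  NearestWorker w(x, y, max_dist, nearest);
  RcppParallel::parallelFor(0, x.nrow(), w);
  return nearest;
}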
Wow @mkuehn10 - that looks amazing, thank you so much for writing and sharing that!
It would be great to include this in maxcovr, but in the future I'm looking to incorporate geodist for the distance calculations, to make that part of the package easier to maintain, plus faster, since it offers options such as cheap ruler, haversine, and so on.
That said, I feel like a parallel version of geodist would be a very valuable contribution, but I'm not sure if it would be its own package, or if it should be rolled into geodist - that would be up to @mpadge to decide.
But, looking at the benchmarks you have provided, there is roughly a 5-6x speedup using parallel, which seems pretty amazing.
Thank you. I'm surprised I hadn't found these packages until yesterday (in retrospect, the whole reason I went down the parallel path is that I couldn't find anything fast enough for my use case). I have access to a fairly beefy computer (something like 72 cores and a ton of memory), so it can melt through some fairly large distance calculations. I like having options, and this wasn't really a premature optimization in my case: I was sitting there waiting 20-30 minutes for some calculations to complete (using a solution implemented entirely in R), and after I implemented the parallel version it took something like 45 seconds.
I'm happy to (try to) make some pull requests, or to just inspire someone else to write better parallel code.
Also, regarding the speedup -- that would/should scale up based on whatever system you are using. I see an ~8-9x difference running it on a much beefier machine and through much larger matrices.
This was with n = 15000:
Unit: seconds
     expr      min        lq      mean    median        uq       max neval
  geodist 46.40521 46.981085 46.940734 46.990139 47.143812 47.183421     5
 parallel  5.28246  5.381774  5.575951  5.390658  5.825183  5.999681     5
The following functions have embarrassingly parallel operations for row-wise and column-wise calculations. Perhaps RcppParallel can do something here.