Check out loop unrolling, from this blog post: https://privefl.github.io/blog/Tip-Optimize-your-Rcpp-loops/
There's a good guide here for parallel computing: http://gallery.rcpp.org/articles/parallel-distance-matrix/
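As a toy illustration of the unrolling tip (a sketch written for this thread, not code from the post itself): a squared-sum loop unrolled by four, so the compiler can keep four independent accumulators in flight.

#include <Rcpp.h>

// Hypothetical example: sum of squares with the loop unrolled by four.
// The four accumulators are independent, so the CPU can pipeline them.
// [[Rcpp::export]]
double sum_sq_unrolled(Rcpp::NumericVector x) {
  double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
  R_xlen_t n = x.size(), i = 0;
  for (; i + 4 <= n; i += 4) {
    s0 += x[i] * x[i];
    s1 += x[i + 1] * x[i + 1];
    s2 += x[i + 2] * x[i + 2];
    s3 += x[i + 3] * x[i + 3];
  }
  for (; i < n; i++) s0 += x[i] * x[i];  // handle the remainder
  return s0 + s1 + s2 + s3;
}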
Heya @mpadge, does geodist take advantage of parallelization?
Nup, coz it's all in C and dependency-free, but what you could do is pilfer code from dodgr, which has loadsa pure RcppParallel, especially here. The OneDist worker is a pretty simple example; the OneFlow worker is much more complicated (the threads all aggregate to the same object, and so write binary chunks to disk which are later reassembled).
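For reference, this is the general shape of such a worker -- a minimal sketch modelled on the Rcpp Gallery pattern linked above, not the actual dodgr code; the haversine formula, earth radius, and function names here are all illustrative.

// [[Rcpp::depends(RcppParallel)]]
#include <Rcpp.h>
#include <RcppParallel.h>
#include <cmath>

// Fills one n_x * n_y distance matrix; parallelFor() hands each thread a
// contiguous block of rows, so no two threads ever write the same cell.
struct HaversineWorker : public RcppParallel::Worker {
  const RcppParallel::RMatrix<double> x;  // points, (lon, lat) rows
  const RcppParallel::RMatrix<double> y;  // sites, (lon, lat) rows
  RcppParallel::RMatrix<double> out;      // output distances in metres

  HaversineWorker(const Rcpp::NumericMatrix x, const Rcpp::NumericMatrix y,
                  Rcpp::NumericMatrix out) : x(x), y(y), out(out) {}

  void operator()(std::size_t begin, std::size_t end) {
    const double R = 6371000.0, d2r = M_PI / 180.0;  // earth radius in m
    for (std::size_t i = begin; i < end; i++) {
      for (std::size_t j = 0; j < y.nrow(); j++) {
        double dlat = (y(j, 1) - x(i, 1)) * d2r;
        double dlon = (y(j, 0) - x(i, 0)) * d2r;
        double a = std::sin(dlat / 2) * std::sin(dlat / 2) +
          std::cos(x(i, 1) * d2r) * std::cos(y(j, 1) * d2r) *
          std::sin(dlon / 2) * std::sin(dlon / 2);
        out(i, j) = 2.0 * R * std::asin(std::sqrt(a));
      }
    }
  }
};

// [[Rcpp::export]]
Rcpp::NumericMatrix par_dist_matrix(Rcpp::NumericMatrix x,
                                    Rcpp::NumericMatrix y) {
  Rcpp::NumericMatrix out(x.nrow(), y.nrow());
  HaversineWorker w(x, y, out);
  RcppParallel::parallelFor(0, x.nrow(), w);
  return out;
}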
Ah, fair enough! Thanks for the links; I'll dig into this more when I get to the 0.4.0 release.
You could close this, no? Non-parallel geodist is plenty fast enough for any operations you might like; you wouldn't notice any real difference out to so-big-it-don't-fit-in-memory.
Yeah, let's close this, but perhaps it might be useful to have a "wishlist" of things - speed could be one of them? Although to be honest, I'm pretty sure the biggest bottleneck is actually building the design matrix.
I see you closed this issue, but I have written RcppParallel implementations that are very similar to your distance_matrix_cpp and nearest_facility_dist. My nearest-"facility" function only returns the identifier and not the distance, but I could most likely modify it to return both. I also designed it to take another parameter that restricts the search to within X miles, so it ignores any sites beyond that radius. The parallel version is ~5.8 times faster on my machine, but I'm not well versed in high-performance computing, so who knows if it could be improved beyond that.
library(maxcovr)

set.seed(10)
n_users <- 5000
n_sites <- 25

# random coordinates for users (x) and twice as many candidate sites (y)
x <- cbind(-10 + 20 * runif(n_users), -10 + 20 * runif(n_users))
y <- cbind(-10 + 20 * runif(2 * n_sites), -10 + 20 * runif(2 * n_sites))
colnames(x) <- colnames(y) <- c("x", "y")

# rcpp_parallel_distm_C() and rcpp_parallel_distm_C_min() come from the
# proof-of-concept repo linked below; * 1609.34 converts miles to metres
microbenchmark::microbenchmark(
  maxcovr  = maxcovr::distance_matrix_cpp(y, x),
  parallel = rcpp_parallel_distm_C(x, y) * 1609.34
)
microbenchmark::microbenchmark(
  maxcovr  = maxcovr::nearest_facility_dist(y, x),
  parallel = rcpp_parallel_distm_C_min(x, y, 5000)
)
# distance matrix benchmark
Unit: milliseconds
     expr     min       lq      mean   median       uq     max neval cld
  maxcovr 49.4812 50.76415 52.076795 51.56360 52.69165 59.2069   100   b
 parallel  8.7636  8.81665  9.133918  8.90215  9.03285 14.4423   100  a

# nearest facility benchmark
Unit: milliseconds
     expr     min       lq      mean   median       uq     max neval cld
  maxcovr 50.8327 51.61255 53.182549 52.24655 54.16985 60.6313   100   b
 parallel  8.2759  8.32910  8.504298  8.42095  8.46105 11.6456   100  a
# a smaller example to compare the two return formats
set.seed(10)
n_users <- 5
n_sites <- 10
x <- cbind(-10 + 20 * runif(n_users), -10 + 20 * runif(n_users))
y <- cbind(-10 + 20 * runif(2 * n_sites), -10 + 20 * runif(2 * n_sites))
colnames(x) <- colnames(y) <- c("x", "y")

# maxcovr returns user id, nearest site id, and distance in metres
maxcovr::nearest_facility_dist(y, x)
     [,1] [,2]      [,3]
[1,]    1   15 232454.32
[2,]    2   15 227046.58
[3,]    3   15  68395.58
[4,]    4    1 200541.27
[5,]    5   18 365391.14
# the parallel version returns only the index of the nearest site
rcpp_parallel_distm_C_min(x, y, 5000)
[1] 15 15 15  1 18
Here is a proof of concept with the C++ code included: https://github.com/mkuehn10/pargeodist
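For anyone not following the link, here is a rough sketch of how a parallel nearest-facility search with a distance cutoff could be structured. This is hypothetical code illustrating the idea, not the implementation in the pargeodist repo, and all names are made up; the cutoff here is in metres rather than miles.

// [[Rcpp::depends(RcppParallel)]]
#include <Rcpp.h>
#include <RcppParallel.h>
#include <cmath>

// plain haversine distance in metres between two (lon, lat) points
static double haversine_m(double lon1, double lat1, double lon2, double lat2) {
  const double R = 6371000.0, d2r = M_PI / 180.0;
  double dlat = (lat2 - lat1) * d2r, dlon = (lon2 - lon1) * d2r;
  double a = std::sin(dlat / 2) * std::sin(dlat / 2) +
    std::cos(lat1 * d2r) * std::cos(lat2 * d2r) *
    std::sin(dlon / 2) * std::sin(dlon / 2);
  return 2.0 * R * std::asin(std::sqrt(a));
}

// Each thread scans a block of users and records the 1-based index of the
// closest site, ignoring any site beyond max_dist (NA if none qualifies).
struct NearestWorker : public RcppParallel::Worker {
  const RcppParallel::RMatrix<double> x;  // users, (lon, lat) rows
  const RcppParallel::RMatrix<double> y;  // sites, (lon, lat) rows
  const double max_dist;                  // cutoff in metres
  RcppParallel::RVector<int> nearest;     // result: nearest site per user

  NearestWorker(const Rcpp::NumericMatrix x, const Rcpp::NumericMatrix y,
                double max_dist, Rcpp::IntegerVector nearest)
    : x(x), y(y), max_dist(max_dist), nearest(nearest) {}

  void operator()(std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; i++) {
      double best = max_dist;
      int best_j = NA_INTEGER;
      for (std::size_t j = 0; j < y.nrow(); j++) {
        double d = haversine_m(x(i, 0), x(i, 1), y(j, 0), y(j, 1));
        if (d < best) { best = d; best_j = static_cast<int>(j) + 1; }
      }
      nearest[i] = best_j;
    }
  }
};

// [[Rcpp::export]]
Rcpp::IntegerVector par_nearest(Rcpp::NumericMatrix x, Rcpp::NumericMatrix y,
                                double max_dist) {
  Rcpp::IntegerVector nearest(x.nrow());
  NearestWorker w(x, y, max_dist, nearest);
  RcppParallel::parallelFor(0, x.nrow(), w);
  return nearest;
}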
Wow @mkuehn10 - that looks amazing, thank you so much for writing and sharing that!
It would be great to include this in maxcovr, but in the future I'm looking to incorporate geodist for the distance calculations, to make that part of the package easier to maintain, plus faster, since it offers options such as cheap ruler, haversine, and so on.
That said, I feel like a parallel version of geodist would be a very valuable contribution, but I'm not sure if it would be its own package, or if it should be rolled into geodist - that would be up to @mpadge to decide.
But, looking at the benchmarks you have provided, there is roughly a 5-6x speedup using parallel, which seems pretty amazing.
Thank you. I'm surprised I hadn't found these packages until yesterday (in retrospect, the whole reason I went down the parallel path is that I couldn't find anything fast enough for my use case). I have access to a fairly beefy computer (something like 72 cores and a ton of memory), so it can melt through some fairly large distance calculations. I like having options, and this wasn't really a premature optimization in my case: I was sitting there waiting 20-30 minutes for some calculations to complete (using a solution implemented entirely in R), and after I implemented the parallel version it took something like 45 seconds.
I'm happy to (try to) make some pull requests, or to just inspire someone else to write better parallel code.
Also, regarding the speedup -- that would/should scale up based on whatever system you are using. I see an ~8-9x difference running it on a much beefier machine and through much larger matrices.
This was with n = 15000:
Unit: seconds
     expr      min        lq      mean    median        uq       max neval
  geodist 46.40521 46.981085 46.940734 46.990139 47.143812 47.183421     5
 parallel  5.28246  5.381774  5.575951  5.390658  5.825183  5.999681     5
The following functions have embarrassingly parallel operations for row-wise and column-wise calculations. Perhaps RcppParallel can do something here.