Parallel version - Githubissues

reuning commented 2 years ago

Not sure if this is of interest to you all, but I put together a parallel version of the osdsm MC loop here.

It also uses fastglm. Both are implemented with fallbacks. I had to rewrite a bit of the Monte Carlo loop to make the results easier to pass out of a function.
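A minimal sketch of the fallback pattern described above, not the fork's actual code. The helper names `fit_logit` and `run_trials` are hypothetical, and the sketch assumes the model fits are logistic regressions. It uses fastglm only when that package is installed, and it uses forked workers via `parallel::mclapply()` only when more than one core is requested on a non-Windows system.

```r
# Hypothetical helper: logistic-regression coefficients, via fastglm when it
# is installed and via base R's glm.fit() otherwise.
fit_logit <- function(X, y) {
  y <- as.numeric(y)
  if (requireNamespace("fastglm", quietly = TRUE)) {
    fit <- fastglm::fastglm(x = X, y = y, family = binomial())
  } else {
    fit <- stats::glm.fit(x = X, y = y, family = binomial())
  }
  unname(fit$coefficients)
}

# Hypothetical helper: run independent Monte Carlo trials. Each trial returns
# a plain value, so the results come back as a list either way.
run_trials <- function(n_trials, trial_fun, cores = 1L) {
  can_fork <- cores > 1L && .Platform$OS.type != "windows"
  if (can_fork) {
    parallel::mclapply(seq_len(n_trials), trial_fun, mc.cores = cores)
  } else {
    lapply(seq_len(n_trials), trial_fun)  # serial fallback
  }
}

# Usage on simulated data (illustrative only)
set.seed(1)
X <- cbind(1, rnorm(200))
y <- rbinom(200, 1, plogis(0.5 * X[, 2]))
beta <- fit_logit(X, y)
draws <- run_trials(10, function(i) sum(rbinom(200, 1, plogis(drop(X %*% beta)))))
```

Having each trial return its result, rather than updating variables outside the loop, is what lets the same trial function be handed to either `lapply()` or `mclapply()`.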

zpneal commented 2 years ago

Thanks - this looks like a really helpful extension, especially for applying osdsm() to big graphs. It seems to be working well, but I probably won't pull it into the release version right away. I've been trying to keep the number of dependencies to a bare minimum, and at a glance I think this would nearly double the current count. In the meantime, I'll direct folks to this fork. You might want to edit the README.md to let potential users know the parallel/fastglm option exists.

reuning commented 2 years ago

Sounds good. I added something at the top of the readme explaining it.

zpneal commented 2 years ago

That looks great - thanks! I'll leave this issue open for now, since I'm hoping to eventually pull this into the main release. It just might take a while to get to it.

reuning commented 2 years ago

I realized one of the bottlenecks was the sample() via apply() call here.

I rewrote that line using Rcpp, although it does require RcppArmadillo.

For what it is worth, the R version is about 15 times slower:

> library(microbenchmark)
> K <- 5 # number of items
> Prob <- matrix(runif(1000*K,0,1),1000,K)
> Prob <- Prob/rowSums(Prob)
> bench <- microbenchmark(backbone:::sample_matrix(Prob), 
+                apply(X = Prob, MARGIN = 1,
+                      FUN = function(x) sample(c(1:(K-1),0), size = 1, replace= TRUE, prob = x)), 
+                      times=500)
> summary(bench, unit="relative")
                                                                                                             expr
1                                                                                  backbone:::sample_matrix(Prob)
2 apply(X = Prob, MARGIN = 1, FUN = function(x) sample(c(1:(K -      1), 0), size = 1, replace = TRUE, prob = x))
       min       lq     mean   median       uq      max neval
1  1.00000  1.00000  1.00000  1.00000  1.00000  1.00000   500
2 16.05441 16.26765 17.69711 16.34211 16.57831 17.91553   500

I don't do a lot of Rcpp, so I wrote a test to make sure it was pulling the same sample (with the same seeds).
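A same-seed check of this kind might look roughly like the sketch below. It is not the test from the fork: `sample_rows_apply`, `sample_rows_loop`, and `same_draws` are hypothetical names, and a plain R loop stands in for the compiled function so the example runs without the fork installed. The idea is to reset the seed before each implementation and require identical draws.

```r
# Reference: the apply()/sample() call from the benchmark above.
sample_rows_apply <- function(Prob) {
  K <- ncol(Prob)
  apply(X = Prob, MARGIN = 1,
        FUN = function(x) sample(c(1:(K - 1), 0), size = 1,
                                 replace = TRUE, prob = x))
}

# Hypothetical stand-in for the compiled version: same draws, explicit loop.
sample_rows_loop <- function(Prob) {
  K <- ncol(Prob)
  labels <- c(1:(K - 1), 0)
  out <- numeric(nrow(Prob))
  for (i in seq_len(nrow(Prob))) {
    out[i] <- sample(labels, size = 1, replace = TRUE, prob = Prob[i, ])
  }
  out
}

# Reset the seed before each call and compare the results element by element.
same_draws <- function(f, g, Prob, seed = 42) {
  set.seed(seed); a <- f(Prob)
  set.seed(seed); b <- g(Prob)
  isTRUE(all.equal(as.numeric(a), as.numeric(b)))
}

K <- 5
Prob <- matrix(runif(1000 * K), 1000, K)
Prob <- Prob / rowSums(Prob)
ok <- same_draws(sample_rows_apply, sample_rows_loop, Prob)
```

With the fork installed, the compiled function could be passed in place of `sample_rows_loop`. The check passes only if both implementations consume the random number stream in the same order.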

Hope this helps.

zpneal commented 2 years ago

This is really great - thanks! The package already depends on Rcpp for the fastball() function, so pulling this in wouldn't add any additional dependencies. I'll do a bit more testing on my end, but this looks promising.

zpneal / backbone

Parallel version #39