Open reuning opened 2 years ago
Thanks - this looks like a really helpful extension, especially for applying osdsm() to big graphs. It seems to be working well, but I probably won't pull it into the release version right away. I've been trying to keep the number of dependencies to a bare minimum, and at a glance I think this would nearly double the current count. In the meantime, I'll direct folks to this fork. You might want to edit the README.md to let potential users know the parallel/fastglm option exists.
Sounds good. I added something at the top of the readme explaining it.
That looks great - thanks! I'll leave this issue open for now, since I'm hoping to eventually pull this into the main release. It just might take a while to get to it.
I realized one of the bottlenecks was the sample() via apply() call here.
I rewrote that line using Rcpp, although it does require RcppArmadillo.
For what it is worth, the R version is about 16 times slower in the benchmark below:
> library(microbenchmark)
> K <- 5 # number of items
> Prob <- matrix(runif(1000*K,0,1),1000,K)
> Prob <- Prob/rowSums(Prob)
> bench <- microbenchmark(backbone:::sample_matrix(Prob),
+ apply(X = Prob, MARGIN = 1,
+ FUN = function(x) sample(c(1:(K-1),0), size = 1, replace= TRUE, prob = x)),
+ times=500)
> summary(bench, unit="relative")
expr
1 backbone:::sample_matrix(Prob)
2 apply(X = Prob, MARGIN = 1, FUN = function(x) sample(c(1:(K - 1), 0), size = 1, replace = TRUE, prob = x))
min lq mean median uq max neval
1 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 500
2 16.05441 16.26765 17.69711 16.34211 16.57831 17.91553 500
I don't do a lot of Rcpp, so I wrote a test to make sure it draws the same sample as the R version when given the same seed.
Hope this helps.
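For anyone who wants to try the row-wise sampling step without installing the fork, here is a minimal pure-R sketch. It is not the Rcpp rewrite described above: it swaps in inverse-CDF sampling, and `sample_rows_r` is a hypothetical name, not a backbone function. It draws from the same per-row distributions, but it will not reproduce the exact draws of sample() under the same seed, and it has not been benchmarked against either version.

```r
# Hypothetical helper (not part of backbone): draw one category per row of a
# probability matrix by inverse-CDF sampling, without a per-row sample() call.
sample_rows_r <- function(prob, labels = seq_len(ncol(prob))) {
  stopifnot(is.matrix(prob), length(labels) == ncol(prob))
  # Row-wise cumulative probabilities; t() restores the rows-by-K layout.
  cum <- t(apply(prob, 1, cumsum))
  u <- runif(nrow(prob))          # one uniform draw per row
  # The chosen column is the number of cumulative values below u, plus one.
  idx <- rowSums(cum < u) + 1L
  idx <- pmin(idx, ncol(prob))    # guard against rounding in the last column
  labels[idx]
}

# Same setup as the benchmark above.
set.seed(1)
K <- 5
Prob <- matrix(runif(1000 * K), 1000, K)
Prob <- Prob / rowSums(Prob)
draws <- sample_rows_r(Prob, labels = c(1:(K - 1), 0))
table(draws)
```

The `labels` argument mirrors the `c(1:(K-1), 0)` vector passed to sample() in the benchmark, so column j of `Prob` is the probability of `labels[j]`.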
This is really great - thanks! The package already depends on Rcpp for the fastball() function, so pulling this in wouldn't add any additional dependencies. I'll do a bit more testing on my end, but this looks promising.
Not sure if this is of interest to you all, but I put together a parallel version of the osdsm MC loop here. It also uses fastglm. Everything is put in with fallbacks. I had to rewrite a bit of the Monte Carlo loop to make the results easier to pass out of a function.
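The general pattern described here (a parallel loop, an optional fastglm fit, and fallbacks for both) can be sketched roughly as follows. This is not the fork's code. `run_mc`, `fit_logit`, and `one_trial` are hypothetical names, and the logistic regression on simulated data is only an assumed stand-in for whatever model the osdsm loop actually fits.

```r
# Hypothetical sketch (not the fork's code): run `trials` Monte Carlo
# replicates, in parallel when possible, with a serial fallback.
run_mc <- function(trials, one_trial, cores = 1L) {
  use_parallel <- cores > 1L &&
    requireNamespace("parallel", quietly = TRUE) &&
    .Platform$OS.type != "windows"   # mclapply forks, which Windows lacks
  if (use_parallel) {
    parallel::mclapply(seq_len(trials), one_trial, mc.cores = cores)
  } else {
    lapply(seq_len(trials), one_trial)
  }
}

# Fit a logistic regression with fastglm if installed, else base glm.fit().
fit_logit <- function(x, y) {
  if (requireNamespace("fastglm", quietly = TRUE)) {
    fastglm::fastglm(x, y, family = binomial())$coefficients
  } else {
    glm.fit(x, y, family = binomial())$coefficients
  }
}

# Each trial returns a plain numeric vector, so results are easy to collect.
one_trial <- function(i) {
  x <- cbind(1, rnorm(200))                   # intercept plus one covariate
  y <- rbinom(200, 1, plogis(0.5 * x[, 2]))   # simulated binary outcome
  fit_logit(x, y)
}

res <- do.call(rbind, run_mc(20, one_trial, cores = 1L))
dim(res)
```

Having each trial return its result, rather than writing into a shared object, is what lets the same function run under lapply() or mclapply(). For reproducible parallel runs, calling RNGkind("L'Ecuyer-CMRG") before set.seed() gives each forked worker its own random number stream.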