Closed hacktuarial closed 8 years ago
Wow! Thanks for this!
Rcpp and ROxygen seemed like overkill here, so I stripped that stuff out.
(I have a very strong aversion to C++)
I didn't know any other way to incorporate my C loop, so thanks for showing the way. I am super excited about the methodology in your paper, and glad to provide these performance tweaks.
Glad to hear it!
Just a heads up: I moved the "parallel" option to mhglm.control
, and I removed the verbose
option. The logging is all still there, but if you want to see it, you need to call basicConfig("INFO")
.
I made some other minor changes, including switching the whitespace to be consistent with the rest of the package (indent by 4 spaces, not 2). I had to make a few other changes to get R CMD check --as-cran
to pass.
I submitted version 0.5 of the package to CRAN; it will hopefully appear in a few days.
By the way, Ningshan Zhang and I are working on some extensions to allow for many more predictor variables. Stay tuned...
Good idea - the parallel option belongs more naturally in mhglm.control. Sorry about the 2 spaces vs 4 spaces inconsistency. I'm excited for it to be on CRAN soon!
I imagine the difficulty with more predictor variables is estimating the large number of terms in the covariance matrix - is this the problem you're working on? Some folks at Stitch Fix are thinking about hacking the method to enforce a diagonal covariance matrix. This would be similar to lme4's "double bar" notation, ||, but we don't really have much theoretical basis for it.
@hacktuarial I just looking through your code again, and I'm confused about the line
if(parallel) x <- bigmemory::as.big.matrix(x, shared=TRUE)
in rdglm.group.fit
. What's the reason for this? In the inner loop, when you call model <- rdglm.fit(x = x[j,,drop=FALSE], ...)
, the subsetting operation x[j,,drop=FALSE]
returns a vanilla R matrix.
oh wait, is this so the foreach
loop doesn't copy the object?
Right - without this, foreach
makes ncores copies of the original object which can blow up available memory. It's OK to copy subsets of the original object as matrices because they're fairly small, and since I don't need any other functionality from bigmemory
, they can be matrix
class instead of big.matrix
class.
FYI, ranef.mhglm
is still using sequential code (which means that predict.mhglm
is also sequential). Up for adding parallel support to these functions? To make this easier, I changed mhglm
to return the control
object as part of its result, so you can test object$control$parallel
in these functions.
I'll work on it!
Great! It's probably not worth it to change anything in the predict.mhglm
function; it may actually end up being slower if you do. Just focus your efforts on ranef.mhglm
.
This PR has two major modifications to the original mbest implementation.
I had some difficulty incorporating C++ into the package, and ultimately used Roxygen to get it done. Consequently I had to move much of the NAMESPACE import/export functionality into Roxygen format. Unfortunately, this destroyed the ability to conditionally import the sigma function from lme4 or stats depending on the R version.