paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0

Perform some data transformations inside the Stan code #50

Closed waynefoltaERI closed 7 years ago

waynefoltaERI commented 8 years ago

This is a specific use case that may not rank highly, but I hope to demo brms and Bayesian modeling in a lecture format, and I would love to have brm generate Stan code that I could show without having to wait for Stan/C++ to compile the model and then fit it (the fitting will actually take less time than the C++ compilation).

So I'm thinking something like a stancode_only=TRUE argument would be a nice thing to have for this situation and for general learning. (I'm not sure how it should be handled. On the one hand, brm outputs a structure of type brmsfit, which stancode takes as an argument, but a brmsfit that's not a fit doesn't make sense. Maybe stancode could accept either a brmsfit or a formula and, if given a formula, do what I'm suggesting.)

paul-buerkner commented 8 years ago

You could try out the make_stancode function, which takes the same arguments as brm but only returns the Stan code.

waynefoltaERI commented 8 years ago

Sorry, I'd missed that!

Now that I've tried it, I have a question:

make_stancode(Petal.Length ~ Petal.Width, data = ir)

functions {
}
data {
  int<lower=1> N;  // number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of fixed effects
  matrix[N, K] X;  // FE design matrix
  vector[K] X_means;  // column means of X
}
transformed data {
}
parameters {
  real temp_Intercept;  // temporary Intercept
  vector[K] b;  // fixed effects
  real<lower=0> sigma;  // residual SD
}
transformed parameters {
  vector[N] eta;  // linear predictor
  // compute linear predictor
  eta <- X * b + temp_Intercept;
  for (n in 1:N) {
    eta[n] <- (eta[n]);
  }
}
model {
  // prior specifications
  sigma ~ student_t(3, 0, 5);
  // likelihood contribution
  Y ~ normal(eta, sigma);
}
generated quantities {
  real b_Intercept;  // fixed effects intercept
  b_Intercept <- temp_Intercept - dot_product(X_means, b);
}

From make_standata, it appears that the X values are normalized (centered and scaled) in R before being passed to Stan. You account for the centering in generated quantities, but does there need to be any accounting for the scaling? (I guess not, but...) I was also wondering whether the centering/scaling might be done in the transformed data block, which appeals to me because it makes the Stan code more stand-alone -- I could feed my (non-normalized) data to it just as I feed it to brm.

Just a thought.

paul-buerkner commented 8 years ago

The X values are just centered, not scaled. I tried to make sure that the parameters returned by the brms Stan code can be interpreted without any further transformation (only the names are changed).
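
As a quick illustration of why only the intercept needs an adjustment (plain lm and the built-in iris data here, nothing brms-specific; the object names are just for this sketch):

# illustration only: centering changes the intercept but not the slope,
# and the original intercept is recovered as in the generated quantities block
X <- model.matrix(~ Petal.Width, data = iris)[, -1, drop = FALSE]
X_means <- colMeans(X)
Xc <- sweep(X, 2, X_means, FUN = "-")
fit_raw      <- lm(iris$Petal.Length ~ X)
fit_centered <- lm(iris$Petal.Length ~ Xc)
coef(fit_raw)[-1]        # slope from the uncentered fit
coef(fit_centered)[-1]   # identical slope from the centered fit
coef(fit_centered)[1] - sum(X_means * coef(fit_centered)[-1])  # equals coef(fit_raw)[1]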

The reason why I use the transformed data block so rarely is that I try to keep the data preparation in one place (the make_standata function). I currently don't want to split it between R and Stan, but I will consider it in the future.

The workflow when omitting brm could be as follows:

stancode <- make_stancode(<arguments>)
standata <- make_standata(<arguments>)
< some changes to stancode and standata if desired >
fit <- stan(model_code = stancode, data = standata, <more arguments>)
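
For example, a sketch of that workflow using the built-in iris data (the chains and iter values are arbitrary choices for illustration):

library(brms)
library(rstan)
stancode <- make_stancode(Petal.Length ~ Petal.Width, data = iris)
standata <- make_standata(Petal.Length ~ Petal.Width, data = iris)
# ... edit stancode and/or standata here if desired ...
fit <- stan(model_code = stancode, data = standata, chains = 2, iter = 2000)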

The code to center X is just

X_means <- colMeans(X)                 # column means of the design matrix
X <- sweep(X, 2, X_means, FUN = "-")   # subtract each column's mean from that column

Also, you can omit the centering by writing formula = <response> ~ -1 + intercept + <predictors> instead of just formula = <response> ~ <predictors>.
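
For instance, with the toy model from above (iris standing in for the data):

# the explicit 'intercept' term keeps brms from centering X (see above)
make_stancode(Petal.Length ~ -1 + intercept + Petal.Width, data = iris)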

In the Stan code you provided, there is an unnecessary loop

for (n in 1:N) {
eta[n] <- (eta[n]);
} 

that will be removed in the next version of brms (just in case you were wondering).

waynefoltaERI commented 8 years ago

Good points, and again thanks for such a great package!

I'd offer the viewpoint that there are two places where the centering could live: 1) in standata, and 2) in the Stan code. The first (current) option will always be asymmetrical: things are done in standata that need to be reversed in the Stan generated quantities block. So there will always be a bit of a surprise when looking at the Stan code, and using the Stan code on new data will require a pass through standata, which will require brms.

Also, including the (implicit) transformations of the data in the Stan code allows the code to serve as a lowest-common-denominator communication of the model: you could ship the Stan code to anyone capable of running Stan (command line, Python, rstan without brms, etc.) and it would work.

Obviously, explicit data transformations by the user (say, taking logs, squaring variables, or other feature engineering) should not go into the Stan model unless the user wants to insert them for some reason. But what I would call "implicit" transformations -- things done by you, as the brms author and Stan expert, to help Stan run better and give faster and more reliable results -- should, I'd argue, show up in the Stan code itself, since these transformations will usually be done (transformed data) and undone (generated quantities), and a brms user need not even know they are happening under the hood.

Just another perspective.

(P.S. I'd figured the for loop was an artifact of the compilation process.)

paul-buerkner commented 8 years ago

Makes sense. I hadn't thought about it this way.

Let's see what needs to be put into transformed data (besides centering X); maybe I can get this implemented in the next version.

jacksutton commented 7 years ago

I would like to second Wayne's suggestion, and to suggest further that some options for centering and scaling might be built into brms (easy for me to say -- perhaps a '' option to accommodate precentered data). I have two reasons.

One is, I guess, aesthetic: I generally follow Gelman's preference to center binary predictors and to standardize continuous predictors by dividing by two standard deviations. This puts everything on roughly the same scale, so coefficients are easily interpretable and make for tidy tables. Perhaps not all that important.

The second reason has to do with multilevel models with cross-level interactions. Say you have data on N individuals in J clusters. Best practice, as I understand it, is to center (or standardize) cluster-level predictors using the J cluster-level observations, and this is easy to do in plain Stan using two input datasets. But with brms (like lme4), those J observations are distributed among the N individuals. If the clusters are perfectly balanced, the mean of var[n] will be the same as the mean of var[j], I guess; if not, not. This will then shift the intercept, which with centered data is an estimate of the adjusted mean.
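
A small sketch of what I mean, with made-up numbers and plain R rather than brms:

# hypothetical cluster-level predictor repeated across unbalanced clusters
cluster_size <- c(2, 5, 10)    # J = 3 clusters of different sizes
cluster_x    <- c(10, 20, 30)  # one value of the predictor per cluster
dat <- data.frame(
  cluster = rep(1:3, times = cluster_size),
  x       = rep(cluster_x, times = cluster_size)
)
mean(cluster_x)  # mean over the J clusters: 20
mean(dat$x)      # mean over the N individual rows: ~24.7, pulled towards the big cluster
# centering at the cluster level instead of the row level:
dat$x_centered <- dat$x - mean(cluster_x)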

I must add: brms is a terrific package! Thanks!

paul-buerkner commented 7 years ago

In brms 1.0.0 (which was released today), I changed quite a few things, some of which previously kept me from performing the centering inside the Stan code. With these changes in place, this has a good chance of getting into the next release. I will also think about adding additional options for centering and scaling.

paul-buerkner commented 7 years ago

@waynefoltaERI After quite some time (and after some changes in brms 1.0.0 that made it possible), the design matrix is finally centered inside the Stan code. I leave this issue open for now to remind myself to look for other stuff that should be moved to the Stan code instead of being part of the data preparation in R.

jacksutton commented 7 years ago

Great! I look forward to taking brms 1.0.0 for a ride!

paul-buerkner commented 7 years ago

I made some additional minor changes to make the Stan code more self-contained, and I think this issue can be closed now.

@jacksutton I agree that the variable transformations you mentioned are often sensible. However, I hesitate to implement them in brms, as it adds another layer of complexity (i.e., changing the meaning of the data inside the package), which would cause all sorts of trouble, in particular when calling predict and related methods with new data. From my perspective, these kinds of transformations are better done by the user before passing the data to brms or other model fitting functions.
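
For example, a user could do something along these lines before fitting (just a sketch; standardize2 is an illustrative helper, not a brms function, and iris stands in for real data):

# user-side preprocessing sketch: center binary predictors, rescale continuous
# predictors by two standard deviations (Gelman-style), then fit as usual
standardize2 <- function(x) (x - mean(x)) / (2 * sd(x))  # illustrative helper
dat <- iris
dat$Petal.Width_z <- standardize2(dat$Petal.Width)
fit <- brm(Petal.Length ~ Petal.Width_z, data = dat)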