paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0
1.29k stars 187 forks source link

Rename the 'fitted' method #644

Closed paul-buerkner closed 4 years ago

paul-buerkner commented 5 years ago

I know that multiple people including @jgabry and @bgoodri dislike the name fitted used in brms to compute expected values of the posterior predictive distribution, and I agree. However, we still need a good new name to take the place of fitted. How does, for instance, posterior_expect sound to you? I am open to suggestions.

bgoodri commented 5 years ago

Do we want to put a S3 generic into rstantools?

On Mon, Apr 15, 2019 at 2:33 PM Paul-Christian Bürkner < notifications@github.com> wrote:

I know that multiple people including @jgabry https://github.com/jgabry and @bgoodri https://github.com/bgoodri dislike the name fitted used in brms to compute expected values of the posterior predictive distribution, and I agree. However, we still need a good new name to take the place of fitted. How does, for instance, posterior_expect sound to you? I am open to suggestions.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/paul-buerkner/brms/issues/644, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqnp0IuwcmzY3tieMDxBAOkCtF69eks5vhMYCgaJpZM4cwn87 .

paul-buerkner commented 5 years ago

That would be ideal, yes.

bgoodri commented 5 years ago

I guess my main concern is that if it is called posterior_expect then it should actually be an expectation as opposed to evaluating the predictions at the expectation of the coefficients (in nonlinear or generalized linear models).

On Mon, Apr 15, 2019 at 5:11 PM Paul-Christian Bürkner < notifications@github.com> wrote:

That would be ideal, yes.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/paul-buerkner/brms/issues/644#issuecomment-483420689, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqrtXBBszeDiReDSNWHA4YPrKDmNoks5vhOsWgaJpZM4cwn87 .

paul-buerkner commented 5 years ago

What do you mean by "expectation of the coefficients"? It's not that fitted.brmsfit took the means of the coeffiecients and transformed them somehow, but I guess this is not what you ment.

In any case, I am open to other names, as long as we get one that can be used consistently across packages.

bgoodri commented 5 years ago

Right, what rstanarm is doing with fitted is using the posterior medians to get a prediction, which doesn't make a lot of sense in the Bayesian context but we needed something if people had code that was calling fitted or predict after glm(). So, if we put posterior_expect into rstantools and wrote methods for it, it should be an actual expectation. But I don't know if posterior_expectation really conveys that it is the expectation of the posterior predictive distribution or different say than what get_posterior_mean does in rstan.

On Mon, Apr 15, 2019 at 5:25 PM Paul-Christian Bürkner < notifications@github.com> wrote:

What do you mean by "expectation of the coefficients"? It's not that fitted.brmsfit took the means of the coeffiecients and transformed them somehow, but I guess this is not what you ment.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/paul-buerkner/brms/issues/644#issuecomment-483424985, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqvlVzFTXb4hE0vIm6v1WvhZLkCLOks5vhO5FgaJpZM4cwn87 .

paul-buerkner commented 5 years ago

What about pp_expect or pp_expectation? We are using pp as abbreviation for posterior predictions in other places anyway.

bgoodri commented 5 years ago

That could work

On Mon, Apr 15, 2019 at 5:36 PM Paul-Christian Bürkner < notifications@github.com> wrote:

What about pp_expect or pp_expectation. We are using pp as abbreviation for posterior predictions in other places anyway.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/paul-buerkner/brms/issues/644#issuecomment-483428051, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqs1gkrLi_JIyid0SYytM9le6P3crks5vhPDBgaJpZM4cwn87 .

paul-buerkner commented 5 years ago

Great! Of those two, I would prefer pp_expect, I think. @jgabry what is your opinion on this?

jgabry commented 5 years ago

Is this just colMeans(posterior_predict(fit))? If so I’m not sure it needs a method itself, but I’m not opposed to it and I’m fine with that name.

paul-buerkner commented 5 years ago

It's like posterior_linpred(..., transform = TRUE) so it should be easy to make a method in rstanarm as well.

jgabry commented 5 years ago

Oh I see. Yeah I’m fine with that. Can you open an rstantools issue? Or feel free to just add it to rstantools if if you want.

paul-buerkner commented 5 years ago

Sure. I guess I just make a PR for the generic.

statwonk commented 5 years ago

I think the lognormal might be an example of what @bgoodri means. mu is the median not the expected value.

paul-buerkner commented 5 years ago

yes but fitted is designed to always return the mean (see the corresponding code on GitHub in fitted.R).

Christopher Peters notifications@github.com schrieb am So., 28. Apr. 2019, 23:04:

I think the lognormal might be an example of what @bgoodri https://github.com/bgoodri means. mu is the median not the expected value.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/paul-buerkner/brms/issues/644#issuecomment-487411011, or mute the thread https://github.com/notifications/unsubscribe-auth/ADCW2ADTCAWCDCYSGYM3SV3PSX7NPANCNFSM4HGCP45Q .

elliottmorris commented 4 years ago

Hi @paul-buerkner.

Sorry, but after these changes I cannot figure out how to recreate the old output from posterior_linpred using pp_expect.

Specifically, I used to run:

draws <- posterior_linpred(model,
    newdata=df,
    draws=1000,
    allow_new_levels=T,
    transform=T)

But the following is not yielding the same results:

draws <- brms::pp_expect(model,
    newdata=df,
    nsamples=1000,
    allow_new_levels =T,
    scale='linear')

I'd be grateful for any help you can provide.

E

paul-buerkner commented 4 years ago

You could still use posterior_linpred or not?

G. Elliott Morris notifications@github.com schrieb am Fr., 24. Jan. 2020, 19:36:

Hi @paul-buerkner https://github.com/paul-buerkner.

Sorry, but after these changes I cannot figure out how to recreate the old output from posterior_linpred using ppc_expect.

Specifically, I used to run:

draws <- posterior_linpred(model, newdata=df, draws=1000, allow_new_levels=T, transform=T)

But the following is not yielding the same results:

draws <- brms::pp_expect(model, newdata=df, nsamples=1000, allow_new_levels =T, scale='linear')

I'd be grateful for any help you can provide.

E

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/paul-buerkner/brms/issues/644?email_source=notifications&email_token=ADCW2AFJUTM3GR3XPNXKKBTQ7MRJ3A5CNFSM4HGCP452YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJ3QZZI#issuecomment-578227429, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADCW2AHR2DFFZ3V4UEEQEPDQ7MRJ3ANCNFSM4HGCP45Q .

elliottmorris commented 4 years ago

You're saying I shouldn't make the switch over? Warnings say it's deprecated right?

Either way, shouldn't I be able to recreate posterior predictions for the linear predictor with the new function?

paul-buerkner commented 4 years ago

Oh posterior_linpred shouldn't be deprecated. I will check it out later on. Do you have a reprex for the non equal behavior?

G. Elliott Morris notifications@github.com schrieb am Fr., 24. Jan. 2020, 19:43:

You're saying I shouldn't make the switch over? Warnings say it's deprecated right?

Either way, shouldn't I be able to recreate posterior predictions for the linear predictor with the new function?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/paul-buerkner/brms/issues/644?email_source=notifications&email_token=ADCW2ACRGKTHI2JDANMQOH3Q7MSFRA5CNFSM4HGCP452YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJ3RN3I#issuecomment-578229997, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADCW2AABA53ZZTSWSLRQTNLQ7MSFRANCNFSM4HGCP45Q .

elliottmorris commented 4 years ago

I will put one together if by next week I still can't replicate across models. Thanks Paul

elliottmorris commented 4 years ago

Yes it seems that the posterior_linpred function has changed pretty dramatically over the past few months.

It used to return a draw of the linear predictor for every specified run. Now it looks to be returning a draw from the distribution of the average of the linear predictor. That's pretty different functionality.

What am I missing? Are these changes documented somewhere?

paul-buerkner commented 4 years ago

If that is the case that this is unintended. It should have stayed the same. Once I am home tonight I will take look.

G. Elliott Morris notifications@github.com schrieb am Fr., 24. Jan. 2020, 20:44:

Yes it seems that the posterior_linpred function has changed pretty dramatically over the past few months.

It used to return a draw of the linear predictor for every specified run. Now it looks to be returning a draw from the distribution of the average of the linear predictor. That's pretty different functionality.

What am I missing? Are these changes documented somewhere?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/paul-buerkner/brms/issues/644?email_source=notifications&email_token=ADCW2ACHPBLZNUYAIAI4MJLQ7MZHFA5CNFSM4HGCP452YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJ3WYFQ#issuecomment-578251798, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADCW2AAUS45O4IWEVIIOJPDQ7MZHFANCNFSM4HGCP45Q .

elliottmorris commented 4 years ago

Excellent. Thanks again.

elliottmorris commented 4 years ago

Sorry just an FYI again. It seems like the behavior changed between 2.9 and 2.10, not 2.11.

paul-buerkner commented 4 years ago

pp_expect(..., scale = "linear") does not work because pp_expect is supposed to always return the expected value of the posterior predictive distribution and thus has no (user facing) scale argument. Existing functions contiue to work in the same way. For instance,

posterior_linpred(model)
fitted(model, scale = "linear", summary = FALSE)

yield the same results.

elliottmorris commented 4 years ago

Okay, that’s useful context.  Thanks.

The functionality of posterior_linpred() still seems to have changed significantly between brms 2.9 and 2.10-11. I’ll put together a reprex, unless you know why off the top of your head?

(Again, the difference I’m seeing looks to be a much smaller range of predictions for each observation in newdata, or something to that effect.) On Jan 25, 2020, 06:31 -0500, Paul-Christian Bürkner notifications@github.com, wrote:

pp_expect(..., scale = "linear") does not work because pp_expect is supposed to always return the expected value of the posterior predictive distribution and thus has no (user facing) scale argument. Existing functions contiue to work in the same way. For instance, posterior_linpred(model) fitted(model, scale = "linear", summary = FALSE) yield the same results. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

paul-buerkner commented 4 years ago

I am not entirely sure what you mean by the change of functionality unfortunately. It may have changed for ordinal and categorical models where is wasnt correct in some instances before. But for other models it should have stayed same. I dont want to speculate so please provide a minimal reproducible example

G. Elliott Morris notifications@github.com schrieb am Sa., 25. Jan. 2020, 19:33:

Okay, that’s useful context. Thanks.

The functionality of posterior_linpred() still seems to have changed significantly between brms 2.9 and 2.10-11. I’ll put together a reprex, unless you know why off the top of your head?

(Again, the difference I’m seeing looks to be a much smaller range of predictions for each observation in newdata, or something to that effect.) On Jan 25, 2020, 06:31 -0500, Paul-Christian Bürkner < notifications@github.com>, wrote:

pp_expect(..., scale = "linear") does not work because pp_expect is supposed to always return the expected value of the posterior predictive distribution and thus has no (user facing) scale argument. Existing functions contiue to work in the same way. For instance, posterior_linpred(model) fitted(model, scale = "linear", summary = FALSE) yield the same results. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/paul-buerkner/brms/issues/644?email_source=notifications&email_token=ADCW2AG357PUTWLPMQ3HY7LQ7RZV3A5CNFSM4HGCP452YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJ5BGYA#issuecomment-578425696, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADCW2AGBDLSZQJ5LNUH6L53Q7RZV3ANCNFSM4HGCP45Q .

elliottmorris commented 4 years ago

Yes, the issue I'm having is with categorical models.

I'm pasting some code below and attaching a zip for the example. (The typical reprex::reprex() approach was having issues loading different brms versions).

You can see that the outcome has changed. I just can't figure out how to recreate the old predictions for the linear predictor. These new ones are wrong.

Thanks again for all the help and dev!

Restarting R session...

> library(tidyverse)
── Attaching packages ────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.2.1     ✓ purrr   0.3.3
✓ tibble  2.1.3     ✓ dplyr   0.8.3
✓ tidyr   1.0.0     ✓ stringr 1.4.0
✓ readr   1.3.1     ✓ forcats 0.4.0
── Conflicts ───────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
> library(brms)
Loading required package: Rcpp
Loading 'brms' package (version 2.10.0). Useful instructions
can be found by typing help('brms'). A more detailed introduction
to the package is available through vignette('brms_overview').

> # read in generated data files 
> targets <- read_rds('targets.rds')
> cces <- read_rds('sample_cces.rds')
> # just read it in if already run
> vote_model <- read_rds("vote_model.rds")

> # draws
> pred <- posterior_linpred(object=vote_model,
+                           newdata=targets,
+                           allow_new_levels=TRUE,
+                           nsamples=500,
+                           transform = TRUE)

> # look at range of preds for clinton
> # they go from roughly 0 to 1 in brms 2.10.0 but much smaller range in brms 2.11.0
> summary(colMeans(pred[,,1]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.03755 0.33496 0.57316 0.56024 0.80095 0.99251 

> # now compare colmeans to 2.11 -----
> detach("package:brms", unload=TRUE)
> remove.packages('brms')
Removing package from ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library’
(as ‘lib’ is unspecified)
> devtools::install_version("brms", version = "2.11.0", repos = "http://cran.us.r-project.org")
Downloading package from url: http://cran.us.r-project.org/src/contrib/Archive/brms/brms_2.11.0.tar.gz
* installing *source* package ‘brms’ ...
* DONE (brms)
> library(brms)
Loading required package: Rcpp
Loading 'brms' package (version 2.11.0). Useful instructions
can be found by typing help('brms'). A more detailed introduction
to the package is available through vignette('brms_overview').

> # draws
> pred <- posterior_linpred(object=vote_model,
+                           newdata=targets,
+                           allow_new_levels=TRUE,
+                           nsamples=500,
+                           transform = TRUE)
Warning message:
posterior_linpred(transform = TRUE) is deprecated.Please use pp_expect() instead. 

> # look at range of preds for clinton
> # they go from roughly 0 to 1 in brms 2.10.0 but much smaller range in brms 2.11.0
> summary(colMeans(pred[,,1]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2838  0.4708  0.5115  0.4965  0.5371  0.5515 

EXAMPLE ZIP: paul_reprex.zip

paul-buerkner commented 4 years ago

Thank you for providing a reproducible example. I don't know what caused the issue but it seems to be working correctly again in the latest github version of brms (2.11.4).

elliottmorris commented 4 years ago

How interesting. Well, thanks! On Jan 27, 2020, 04:18 -0500, Paul-Christian Bürkner notifications@github.com, wrote:

Thank you for providing a reproducible example. I don't know what caused the issue but it seems to be working correctly again in the latest github version of brms (2.11.4). — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.