stan-dev / rstanarm

rstanarm R package for Bayesian applied regression modeling
https://mc-stan.org/rstanarm
GNU General Public License v3.0
382 stars 132 forks source link

allow sigma to vary by group #34

Open jgabry opened 8 years ago

jgabry commented 8 years ago

For models with a Gaussian outcome and some form of grouping structure, allow sigma to vary by level of a group.

avehtari commented 8 years ago

Why only a Gaussian outcome? Why not all observation models with scale parameters?

jgabry commented 8 years ago

At the time the issue was opened I think only Gaussian was enabled. But yeah, we could do this for other distributions with scale parameters too.

On Wednesday, February 3, 2016, Aki Vehtari notifications@github.com wrote:

Why only a Gaussian outcome? Why not all observation models with scale parameters?

— Reply to this email directly or view it on GitHub https://github.com/stan-dev/rstanarm/issues/34#issuecomment-179359633.

jgabry commented 8 years ago

Before we can get this in rstanarm we need a new syntax for specifying this feature of a model.

What about an optional formula argument for sigma? For instance,

stan_glm(y ~ x1 + x2, sigma = ~ group)

On Wednesday, February 3, 2016, Jonah Sol Gabry jgabry@gmail.com wrote:

At the time the issue was opened I think only Gaussian was enabled. But yeah, we could do this for other distributions with scale parameters too.

On Wednesday, February 3, 2016, Aki Vehtari <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

Why only a Gaussian outcome? Why not all observation models with scale parameters?

— Reply to this email directly or view it on GitHub https://github.com/stan-dev/rstanarm/issues/34#issuecomment-179359633.

andrewgelman commented 8 years ago

Are you saying here that the variance parameter will vary by group? In which case maybe we want something like:

stan_glm(formula1, formula2, …) so that it would look like stan_glm(y ~ x1 + x2, sigma ~ (1 | group))

On Feb 5, 2016, at 1:18 AM, Jonah Gabry notifications@github.com wrote:

Before we can get this in rstanarm we need a new syntax for specifying this feature of a model.

What about an optional formula argument for sigma? For instance,

stan_glm(y ~ x1 + x2, sigma = ~ group)

On Wednesday, February 3, 2016, Jonah Sol Gabry jgabry@gmail.com wrote:

At the time the issue was opened I think only Gaussian was enabled. But yeah, we could do this for other distributions with scale parameters too.

On Wednesday, February 3, 2016, Aki Vehtari <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

Why only a Gaussian outcome? Why not all observation models with scale parameters?

— Reply to this email directly or view it on GitHub https://github.com/stan-dev/rstanarm/issues/34#issuecomment-179359633.

— Reply to this email directly or view it on GitHub https://github.com/stan-dev/rstanarm/issues/34#issuecomment-180217805.

jgabry commented 8 years ago

Yeah exactly. Something like that.

On Fri, Feb 5, 2016 at 1:20 AM, Andrew Gelman notifications@github.com wrote:

Are you saying here that the variance parameter will vary by group? In which case maybe we want something like:

stan_glm(formula1, formula2, …) so that it would look like stan_glm(y ~ x1

  • x2, sigma ~ (1 | group))

On Feb 5, 2016, at 1:18 AM, Jonah Gabry notifications@github.com wrote:

Before we can get this in rstanarm we need a new syntax for specifying this feature of a model.

What about an optional formula argument for sigma? For instance,

stan_glm(y ~ x1 + x2, sigma = ~ group)

On Wednesday, February 3, 2016, Jonah Sol Gabry jgabry@gmail.com wrote:

At the time the issue was opened I think only Gaussian was enabled. But yeah, we could do this for other distributions with scale parameters too.

On Wednesday, February 3, 2016, Aki Vehtari <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

Why only a Gaussian outcome? Why not all observation models with scale parameters?

— Reply to this email directly or view it on GitHub < https://github.com/stan-dev/rstanarm/issues/34#issuecomment-179359633>.

— Reply to this email directly or view it on GitHub < https://github.com/stan-dev/rstanarm/issues/34#issuecomment-180217805>.

— Reply to this email directly or view it on GitHub https://github.com/stan-dev/rstanarm/issues/34#issuecomment-180218064.

avehtari commented 8 years ago

To prepare for the future, it's good to also think that there might be more output distribution families with other parameters than sigma. For example, people making survival analysis would love Weibull, then the parameters would be lambda (scale parameter) and k (shape parameter). Is is then easy to replace sigma with k and have stan_glm(y ~ x1 + x2, k ~ (1 | group), family=Weibull) ?

jgabry commented 8 years ago

Because the variable on the LHS of this optional formula will depend on the family, I think we should include it as an argument to rstanarm_family() rather than the modeling function (e.g. stan_glm itself).

We will be introducing rstanarm_family() in the next release as an alternative way of specifying the family that also allows the user to tweak family-specific hyperparameters in a cleaner way than in now possible. For example,

stan_glm(y ~ x, family = "gaussian")

will still be allowed and will use the default hyperparameters, whereas

stan_glm(y ~ x, family = rstanarm_family("gaussian", prior_scale_for_dispersion = 2))

allows the user to set the scale for the half-Cauchy prior on the dispersion (scale of the regression SE).

So we could also allow

rstanarm_family("gaussian", scale ~ (1|group)) 
rstanarm_family("t_family", scale ~ (1|group)) 
rstanarm_family("gamma", shape ~ (1|group))  
rstanarm_family("weibull", shape ~ (1|group))

etcetera.

flaxter commented 6 years ago

Just checking back--no updates on this, are there? I can of course just code it myself for my model in Stan, but I'm curious whether there are papers on this type of model and I'm not sure what to google (and eventually, cite for a paper). Do models with sigma ~ (1 | group) have a name?

paul-buerkner commented 6 years ago

They are so called distributional models or models of "location, scale, and shape". You may fit them with brms which also uses Stan at the backend (see https://cran.r-project.org/web/packages/brms/vignettes/brms_distreg.html for examples).

Jaapro commented 4 years ago

Has this feature been implemented in rstanarm yet?