n-kall / priorsense

priorsense: an R package for prior diagnostics and sensitivity
https://n-kall.github.io/priorsense/

Scaled prior partition for `brms::horseshoe()` and `brms::R2D2()` #21


fweber144 commented 1 year ago

I'm not sure whether this is more of a brms issue/question or a priorsense one. If preferred, I can move it to brms's issue tracker.

The following refers to brms v2.20.1 (from CRAN).

When using the brms::horseshoe() prior, the priors for $\tau$ and $\tilde{c}^2$ appear to be power-scaled, i.e., included in the lprior partition that priorsense scales (by $\tau$, I mean the same $\tau$ as in https://doi.org/10.1214/17-EJS1337SI; by $\tilde{c}^2$, I mean the unscaled $c^2$ from the same paper), because brms generates the Stan code lines

lprior += student_t_lpdf(hs_global | hs_df_global, 0, hs_scale_global)
          - 1 * log(0.5);
lprior += inv_gamma_lpdf(hs_slab | 0.5 * hs_df_slab, 0.5 * hs_df_slab);

where hs_global is $\tau$ and hs_slab is $\tilde{c}^2$. In contrast, the $\lambda_j$ do not seem to be power-scaled, because their prior is added directly to target in the brms-generated Stan code:

target += student_t_lpdf(hs_local | hs_df, 0, 1)
          - rows(hs_local) * log(0.5);

where hs_local is the vector of the $\lambda_j$.
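
For context, the quoted Stan code can be reproduced without fitting anything via brms::make_stancode(). The following is a minimal sketch; the toy data, formula, and df = 1 are illustrative assumptions, not taken from this issue:

```r
library(brms)

## Toy data, purely for generating Stan code (assumed for illustration).
dat <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

## Generate (without compiling or fitting) the Stan code for a Gaussian
## model with a horseshoe prior on the coefficients, then inspect which
## prior terms end up in `lprior` versus directly in `target`.
cat(make_stancode(
  y ~ x1 + x2,
  data = dat,
  prior = set_prior(horseshoe(df = 1))
))
```

Swapping in set_prior(R2D2()) shows the corresponding partitioning for the R2D2 prior discussed below.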

When using the brms::R2D2() prior (by this, I mean the "ordinary" R2D2 prior, not the R2D2M2 prior), only the $R^2$ prior (not the one for $\phi$) seems to be power-scaled, because the brms-generated Stan code contains

lprior += beta_lpdf(R2D2_R2 | R2D2_mean_R2 * R2D2_prec_R2,
                    (1 - R2D2_mean_R2) * R2D2_prec_R2);

and

target += dirichlet_lpdf(R2D2_phi | R2D2_cons_D2);

(For $\phi$, R2D2_cons_D2 = [1, ..., 1] would make the $\phi$ Dirichlet prior invariant to power-scaling. In general, however, the $\phi$ Dirichlet prior does not need to be flat; in particular, brms v2.20.1 changed the default of brms::R2D2()'s argument cons_D2 from 1 to 0.5.)
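
To make the invariance claim concrete, here is a short derivation (standard Dirichlet algebra, not specific to brms). Power-scaling a Dirichlet density by $\gamma$ gives

$$
p(\phi \mid \alpha)^\gamma \propto \prod_j \phi_j^{\gamma (\alpha_j - 1)} = \prod_j \phi_j^{\left(\gamma (\alpha_j - 1) + 1\right) - 1},
$$

i.e., another Dirichlet with concentrations $\gamma (\alpha_j - 1) + 1$. For $\alpha_j = 1$, these stay at $1$ for every $\gamma$, so the flat Dirichlet is invariant, whereas for cons_D2 = 0.5 the power-scaled concentrations $1 - \gamma/2$ do depend on $\gamma$.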

My question is whether these partitionings are intentional and, if so, what their motivation is.

n-kall commented 1 year ago

Good questions, thanks for looking into this! I don't know the motivation for the horseshoe partitioning, but we did discuss the R2D2 one. I think we concluded that the $R^2$ prior is the most intuitive one to check for power-scaling sensitivity and should therefore be the default. Power-scaling both the Dirichlet and the beta prior at the same time might also be confusing to interpret. However, with the separate-scaling feature in the separate_scaling branch, we could likely avoid having to choose the partitioning manually: all relevant log prior density evaluations would be saved in an array, and the user (or some automated method) could then selectively power-scale them (see the sketch below).
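
To illustrate the idea, here is a hypothetical sketch of selectively power-scaling a single prior component via importance weighting; this is not priorsense's actual API. It assumes fit is a brmsfit using horseshoe(df = 1) with global scale 1 (so hs_global has a half-Student-t(1, 0, 1) prior) and that the draws contain hs_global and a coefficient b_x1:

```r
library(posterior)

## fit: assumed brmsfit, see above; internal parameters such as
## hs_global may need save_pars(all = TRUE) when fitting.
draws <- as_draws_df(fit)

## Log prior density of tau (= hs_global) alone; the + log(2) makes it
## a half-t density, matching `- 1 * log(0.5)` in the quoted Stan code.
log_p_tau <- dt(draws$hs_global, df = 1, log = TRUE) + log(2)

gamma <- 0.5                      # power-scaling exponent
log_w <- (gamma - 1) * log_p_tau  # log importance weights for p(tau)^gamma
w <- exp(log_w - max(log_w))      # stabilise before exponentiating

## Power-scaled posterior mean of one coefficient, to compare against
## the unscaled mean(draws$b_x1).
weighted.mean(draws$b_x1, w)
```

In practice the weights would be smoothed (e.g., with PSIS, as priorsense does for whole-prior scaling); the point here is only that storing each prior term separately makes such selective scaling possible.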

fweber144 commented 1 year ago

Sounds good, thank you :)