saudiwin / ordbetareg_pack

Repository for R package ordbetareg, used to fit continuous data with lower and upper bounds.

Default prior on phi #5

Closed bgall closed 2 years ago

bgall commented 2 years ago

Following up on a comment elsewhere, the default prior for $\phi$ seems informative in typical situations. Take the survey-research case (other uses of the model obviously exist): researchers rarely field survey items on which respondents choose only the extremes of the response scale. Usually they would redesign such an item to gain more information, either by adding more response options or by turning the extremes into middling values. Of course, sometimes that doesn't happen and you get extremely bimodal distributions. Yet even in unusual cases, there is still a fair amount of density in the center of the response scale. Perhaps a minimum of 30% of responses falling in (.1, .9) is a reasonable lower bound for what we'd expect given the typical mean response.

We don't get even close to that unless we move the prior from 0.1 to at least 0.4. Does this suggest 0.1 is an extremely informative prior rather than an uninformative one? Although $\phi$ formally governs the variance/clustering, the practical issue is that the current prior strongly favors bimodal distributions over concentrated unimodal ones, in effect assuming that people use almost only the extreme tails of the scale.

library(FlexReg)
library(dplyr)
library(tidyr)
library(purrr)
set.seed(123)

# simulation parameters
phi <- c(0.1, 0.4, 3)
mu <- seq(0.1, 0.5, by = 0.1)
sims_n <- 10000
lower <- 0.1
upper <- 0.9

# simulate from mean-precision parameterized beta distributions
grid <- expand_grid(phi, mu)

sims <- map2(.y = grid$phi,
             .x = grid$mu,
             ~ rBeta_mu(n = sims_n, mu = .x, phi = .y))

# share of draws falling strictly inside (lower, upper)
grid %>%
  mutate(pct = map_dbl(sims, ~ mean(.x > lower & .x < upper))) %>%
  round(2)
#> # A tibble: 15 x 3
#>      phi    mu   pct
#>    <dbl> <dbl> <dbl>
#>  1   0.1   0.1  0.04
#>  2   0.1   0.2  0.06
#>  3   0.1   0.3  0.08
#>  4   0.1   0.4  0.1 
#>  5   0.1   0.5  0.1 
#>  6   0.4   0.1  0.12
#>  7   0.4   0.2  0.2 
#>  8   0.4   0.3  0.27
#>  9   0.4   0.4  0.32
#> 10   0.4   0.5  0.32
#> 11   3     0.1  0.31
#> 12   3     0.2  0.57
#> 13   3     0.3  0.76
#> 14   3     0.4  0.86
#> 15   3     0.5  0.9
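
The simulated shares above can be cross-checked without simulation or FlexReg. This is a minimal sketch, assuming that rBeta_mu uses the usual mean-precision parameterization, i.e. shape1 = mu * phi and shape2 = (1 - mu) * phi. That mapping is an assumption here, not something taken from the FlexReg documentation. Under it, the share in (lower, upper) is a difference of two base-R pbeta calls.

```r
# Same grid as the simulation above
phi <- c(0.1, 0.4, 3)
mu <- seq(0.1, 0.5, by = 0.1)
lower <- 0.1
upper <- 0.9

# expand.grid varies its first argument fastest, so rows are ordered
# by phi and then by mu, matching the simulated table
grid <- expand.grid(mu = mu, phi = phi)

# Assumed mean-precision mapping: shape1 = mu * phi, shape2 = (1 - mu) * phi
shape1 <- grid$mu * grid$phi
shape2 <- (1 - grid$mu) * grid$phi

# Exact probability mass in (lower, upper)
grid$pct <- round(pbeta(upper, shape1, shape2) - pbeta(lower, shape1, shape2), 2)

grid[, c("phi", "mu", "pct")]
```

For phi = 3 and mu = 0.5 this gives about 0.90, in line with the simulated value, which suggests the assumed mapping matches what the simulation used.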

saudiwin commented 2 years ago

This comes down to how Stan parameterizes the exponential distribution: the argument is a rate, so the mean is 1/rate. The expected value of the prior for $\phi$ is therefore 10, not 0.1. That gives a 5% - 95% interval of (0.5, 30.8), which I think is much less informative. If you use the sample_prior="only" option you can see that the prior samples are approx (0.28, 35.9), or virtually identical to R's rexp function.
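
The rate parameterization can be checked directly in base R. This is a minimal sketch, assuming the default prior is an exponential with rate 0.1 as discussed in this thread. The variable names are illustrative only.

```r
rate <- 0.1

# Mean of an exponential distribution is 1 / rate
prior_mean <- 1 / rate

# Exact central 90% and 95% intervals from the quantile function
q90 <- qexp(c(0.05, 0.95), rate = rate)
q95 <- qexp(c(0.025, 0.975), rate = rate)

# Monte Carlo version, comparable to drawing from the prior
set.seed(123)
draws <- rexp(1e5, rate = rate)
q90_mc <- quantile(draws, c(0.05, 0.95))

prior_mean
round(q90, 2)
round(q95, 2)
round(q90_mc, 2)
```

The exact 90% interval is roughly (0.51, 29.96) and the exact 95% interval roughly (0.25, 36.89). Sampled intervals will differ slightly from these depending on the number of draws.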

I will update the documentation to make this clearer. I don't think this default needs to be changed, though, as it is quite wide. As I note elsewhere in the docs, there isn't really such a thing as a weakly informative prior in all cases, as it is possible for $\phi$ to be quite large.