mfasiolo / qgam

Additive quantile regression R package
http://mfasiolo.github.io/qgam/

weights in qgam #39

Open LisaHuelsmann opened 3 years ago

LisaHuelsmann commented 3 years ago

Hi Matteo, I am running a two-step analysis:

  1. I estimate average marginal effects (i.e. derivatives on the response scale) from several binomial models (fitted with mgcv::gam()); a sketch of this step follows the list.
  2. I want to estimate how these average marginal effects are affected by two predictors. The distribution of the average marginal effects has very heavy tails, so I thought quantile regression (i.e. qgam with qu = 0.5) may be a good option.
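
For context, a minimal sketch of roughly what step 1 looks like (the data, the smooths and the finite-difference step are made up for illustration; the posterior draws use the usual mvrnorm on coef()/vcov() approach with mgcv):

library(mgcv)
library(MASS)  # mvrnorm for posterior draws of the coefficients

set.seed(1)
n   <- 500
sim <- data.frame(x1 = runif(n), x2 = runif(n))
sim$y <- rbinom(n, 1, plogis(-1 + 2 * sim$x1 + sim$x2))

fit <- gam(y ~ s(x1) + s(x2), family = binomial, data = sim)

# posterior draws of the coefficient vector
nsamples <- 200
beta <- mvrnorm(nsamples, coef(fit), vcov(fit))

# finite-difference derivative of P(y = 1) w.r.t. x1, averaged over the data,
# once per posterior draw
eps <- 1e-3
X0  <- predict(fit, sim, type = "lpmatrix")
X1  <- predict(fit, transform(sim, x1 = x1 + eps), type = "lpmatrix")
ame_draws <- colMeans((plogis(X1 %*% t(beta)) - plogis(X0 %*% t(beta))) / eps)

mean(ame_draws)  # one average marginal effect ...
var(ame_draws)   # ... and its posterior variance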

Now I am struggling with how to accurately propagate the uncertainty in these estimates. Option 1 would be to weight by 1/var, but I don't think this is appropriate in a quantile setting. Option 2 would be to directly use the posterior samples of the average marginal effects that I simulated to get the variances. I tested this with qgam, but then I realized that qgam doesn't down-weight the repeated samples per average marginal effect when using weights = 1/nsamples. I tested the same in glmmTMB, where the weights act directly on the likelihood, which is what I would need for qgam too.
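
(To make explicit what I mean by weights acting on the loss: for quantile regression the objective I have in mind is the weighted check/pinball loss sketched below; qgam itself minimises a smoothed version of this loss, but the role of the weights would be the same.)

# weighted check (pinball) loss, only to illustrate where the weights should enter
check_loss <- function(r, qu) r * (qu - (r < 0))
weighted_obj <- function(y, mu, w, qu) sum(w * check_loss(y - mu, qu))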

Any idea what I could do instead? Thanks for your help!

Here is an illustrative example of what I am looking for:

# data generation ---------------------------------------------------------
set.seed(1)
dat = mgcv::gamSim(1, n = 30)
dat1 = dat[rep(1:nrow(dat), each = 10), ]

I simply repeat each observation 10 times in the second dataset.

# qgam --------------------------------------------------------------------
library(qgam)

mod = qgam(y ~ x0 + x1 + x2
           , data = dat
           , qu = 0.5
           )
mod1 = qgam(y ~ x0 + x1 + x2
            , data = dat1
            , qu = 0.5
            , argGam = list(weights = rep(1/10, nrow(dat1)))
            )
summary(mod)
summary(mod1)

My hope was that both models would result in the same effective sample size, but apparently they don't, and the confidence intervals are narrower for mod1.
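
For concreteness, the narrowing shows up directly in the standard errors of the parametric coefficients (a quick check; p.table is the parametric coefficient table returned by summary.gam, and its second column holds the standard errors):

# standard errors for (Intercept), x0, x1, x2, side by side
cbind(single   = summary(mod)$p.table[, 2],
      repeated = summary(mod1)$p.table[, 2])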

# glmmTMB -----------------------------------------------------------------
library(glmmTMB)

mod = glmmTMB(y ~ x0 + x1 + x2
              , data = dat
              )
mod1 = glmmTMB(y ~ x0 + x1 + x2
               , data = dat1
               , weights = rep(1/10, nrow(dat1))
               )
summary(mod)
summary(mod1)

For glmmTMB, the downweighting works and both models produce the same result.
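
A quick check of this (sketch; summary.glmmTMB stores the conditional-model coefficient table under coefficients$cond):

# estimates and standard errors of the conditional model, side by side
cbind(summary(mod)$coefficients$cond[, 1:2],
      summary(mod1)$coefficients$cond[, 1:2])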

mfasiolo commented 3 years ago

Hi Lisa,

I am not entirely getting what you are trying to do. Note that qgam behaves in the same way as mgcv::gam here, so this would probably require changing mgcv. I don't understand why you want to use a data set in which every observation is repeated several times and then down-weight the repeated observations. Can't you just remove the duplicates? Obviously I am missing something!

Waschoi commented 1 year ago

I have this problem as well: I can use weights in mgcv but not in qgam. In my case I am using survey weights. If helpful, I can give you a reprex.

mfasiolo commented 1 year ago

Hi, yes please, a (minimal) reprex would be helpful.

Waschoi commented 1 year ago

Hello, sorry it took me so long to reply. Here is an example. Data simulation is not my strong suit, but it should show the problem. Normally we work with survey weights: for example, younger people get higher weights than older people because they participate in surveys less often. Accordingly, the weighted and unweighted fits give different results.

library(tidyverse)
library(mgcv)
library(qgam)

n <- 3000
data <- tribble(~price, ~group,
                rbeta(n, 20, 2), "middle",
                rbeta(n, 40, 1), "new",
                rbeta(n, 10, 7), "old"
                ) %>%
  # simulate an age distribution per group
  mutate(age = map(.x = group, ~case_when(
    .x == "old" ~ rbinom(n, size = 10, prob = .5),
    .x == "middle" ~ rbinom(n, size = 5, prob = .3),
    .x == "new" ~ rbinom(n, size = 4, prob = .2)
  ))) %>%
  mutate(price = map(.x = price, ~sort(.x))) %>%
  mutate(age = map(.x = age, ~sort(.x, decreasing = TRUE))) %>%
  unnest(cols = c(price, age)) %>%
  mutate(price = round(price * 10, 2)) %>%
  # simulate survey weights that differ between groups ...
  mutate(weight = map_dbl(.x = group, ~case_when(
    .x == "old" ~ abs(rnorm(1, mean = 2, sd = .5)),
    .x == "middle" ~ 1,
    .x == "new" ~ abs(rnorm(1, mean = 1, sd = .5))
  ))) %>%
  # ... and also depend on price and age
  mutate(weight = case_when(
    price < 4 & age > 8 ~ weight * 2,
    price > 9 & age > 3 ~ weight / 2,
    price > 5 & age > 3 ~ weight * 2,
    price > 10 & age > 2 ~ weight * 2,
    TRUE ~ weight
  )) %>%
  mutate(group = as.factor(group))

data%>%
  ggplot(aes(x = age, y = price, col = group, group = group))+
  geom_smooth(method = "loess")

gam(price ~ s(age, by = group) + group, data = data) -> gam_wo_weight
gam(price ~ s(age, by = group) + group, data = data, weights = weight) -> gam_w_weight

qgam(price ~ s(age, by = group) + group, data = data, qu = .5) -> qgam_wo_weight
#not working
#qgam(price ~ s(age, by = group) + group, data = data, weights = weight, .5) -> qgam_w_weight
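
One route that at least runs is forwarding the weights to the underlying gam() call through argGam, as in the example earlier in this thread; whether that treats survey weights the way one would expect is exactly what this issue is about. A sketch:

# sketch: pass the survey weights on to gam() via argGam;
# unclear whether qgam then handles them as intended (that is the open question)
qgam(price ~ s(age, by = group) + group, data = data, qu = .5,
     argGam = list(weights = data$weight)) -> qgam_argGam_weight
summary(qgam_argGam_weight)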

library(ggeffects)
#library(gratia)
library(patchwork)

ggpredict(gam_wo_weight, terms = c("age", "group"))%>%
  plot()+
  scale_y_continuous(limits = c(-2, 12))+
  labs(title = "gam without weights")+
ggpredict(gam_w_weight, terms = c("age", "group"))%>%
  plot()+
  scale_y_continuous(limits = c(-2, 12))+
  labs(title = "gam with weights")+
ggpredict(qgam_wo_weight, terms = c("age", "group"))%>%
  plot()+
  scale_y_continuous(limits = c(-2, 12))+
  labs(title = "qgam without weights")+
  plot_layout(guides = 'collect') &
  theme(legend.position = "bottom")

sjPlot::tab_model(gam_wo_weight, gam_w_weight, qgam_wo_weight,
                  dv.labels = c("gam without weights", "gam with weights", "qgam without weights"))