Open LisaHuelsmann opened 3 years ago
Hi Lisa,
I am not entirely getting what you are trying to do. Note that here qgam behaves in the same way as mgcv::gam, so probably this would require changing mgcv. I don't understand why you want to use a data set where an observation is repeated several times and then you want to down weight the repeated observations. Can't you just remove the duplicates? Obviously I am missing something!
I have the problem as well, that I can use weights in mgcv but not in qgam. In my case I am using survey weights. If helpfull, I can give yu a reprex
Hi, yes please, a (minimal) reprex would be helpful.
Hello, sorry it took me so long to reply. Here is an example. Data simulation is not my strong suit, but it should show the problem. Normally we have survey weights, for example younger people are weighted higher there than older people, because they participate less often in surveys. Accordingly, there are different results.
library(tidyverse)
library(mgcv)
library(qgam)
n <- 3000
data <- tribble(~price, ~group,
rbeta(n,20,2), "middle",
rbeta(n,40,1), "new",
rbeta(n,10,7), "old"
)%>%
mutate(age = map(.x = group, ~case_when(
.x == "old" ~ rbinom(n, size = 10, prob = .5),
.x == "middle" ~ rbinom(n, size = 5, prob = .3),
.x == "new" ~ rbinom(n, size = 4, prob = .2)
)))%>%
mutate(price = map(.x = price, ~sort(.x, )))%>%
mutate(age = map(.x = age, ~sort(.x, decreasing = T)))%>%
unnest()%>%
mutate(price = round(price*10, 2))%>%
mutate(weight = map_dbl(.x = group, ~case_when(
.x == "old" ~ abs(rnorm(1, mean = 2, sd = .5)),
.x == "middle" ~ 1,
.x == "new" ~ abs(rnorm(1, mean = 1, sd = .5))
)))%>%
mutate(weight = case_when(
price < 4 & age > 8 ~ weight*2,
price > 9 & age > 3 ~ weight/2,
price > 5 & age > 3 ~ weight*2,
price > 10 & age > 2 ~ weight*2,
T ~ weight
))%>%
mutate(group = as.factor(group))
data%>%
ggplot(aes(x = age, y = price, col = group, group = group))+
geom_smooth(method = "loess")
gam(price ~ s(age, by = group) + group, data = data) -> gam_wo_weight
gam(price ~ s(age, by = group) + group, data = data, weights = weight) -> gam_w_weight
qgam(price ~ s(age, by = group) + group, data = data, qu = .5) -> qgam_wo_weight
#not working
#qgam(price ~ s(age, by = group) + group, data = data, weights = weight, .5) -> qgam_w_weight
library(ggeffects)
#library(gratia)
library(patchwork)
ggpredict(gam_wo_weight, terms = c("age", "group"))%>%
plot()+
scale_y_continuous(limits = c(-2, 12))+
labs(title = "gam without weights")+
ggpredict(gam_w_weight, terms = c("age", "group"))%>%
plot()+
scale_y_continuous(limits = c(-2, 12))+
labs(title = "gam with weights")+
ggpredict(qgam_wo_weight, terms = c("age", "group"))%>%
plot()+
scale_y_continuous(limits = c(-2, 12))+
labs(title = "qgam without weights")+
plot_layout(guides = 'collect') &
theme(legend.position = "bottom")
sjPlot::tab_model(gam_wo_weight, gam_w_weight, qgam_wo_weight,
dv.labels = c("gam without weights", "gam with weights", "qgam without weights"))
Hi Matteo, I am running a two-step analysis:
Now, I am struggling with how I can accurately propagate the uncertainty in these estimates. Option 1 would be to weight with 1/var, but I don't think this is accurate in a quantile setting. Option 2 would be to use the distribution of the posterior samples of the average marginal effects directly that I simulated to get the variances. So I tested this with qgam but then I realized that qgam doesnt downweight the repeated samples per average marginal effect when using weight = 1/nsamples. I tested this in glmmTMB, where the weights are directly acting on the likelihood, which would be what I would need for qgam too.
Any idea what I could do instead? Thanks for you help...
Here an illustrative example of what I am looking for:
I simply repeat each observation 10 times in the second dataset.
My hope was that both models result in the same effective sample size, but apperently they don't and the CI are more narrow for mod1.
For glmmTMB, the downweighting works and both models produce the same result.