jacob-long / panelr

Regression models and utilities for repeated measures and panel data

Weights argument not working consistently in wbm; not at all in wbm_stan #52

Open ahcombs opened 1 year ago

ahcombs commented 1 year ago

I need to model some data with survey weights, and it seems like the weights arguments in wbm and (especially) wbm_stan are not consistently working correctly. Different errors are thrown when estimating models depending on the function and on how weights are provided. Providing weights as a column name in string form does not work for wbm. Nothing that I've tried so far works for wbm_stan. I have reproduced the issue in the code below.

I am using R version 4.1.3 and panelr version 0.7.6.

I've been loving this package otherwise--the usage is straightforward and documentation is great. Thanks for your work!

library(tidyverse)
library(panelr)
set.seed(1)

# R.version
# packageVersion("panelr")

# generate a data set with two variables over two waves
df <- tibble(
  id = 1:100,
  v1_w1 = rnorm(100, 10, 1),
  v1_w2 = v1_w1 + rnorm(100, 0, 1),
  v2_w1 = v1_w1 + rnorm(100, 0, 1),
  v2_w2 = v2_w1/2 + v1_w2/2 + rnorm(100, 0, 1),
  # weights are positive values with mean of close to but not exactly 1 (mimicking my real data)
  wt = runif(100, .1, 1.9)
) %>% 
  pivot_longer(cols = starts_with("v"), names_pattern = "(v\\d)_w(.)", names_to = c(".value", "wave")) %>% 
  mutate(wave = factor(wave, ordered = TRUE))

# a normal wbm model without weights works
m1 <- wbm(v2 ~ v1, 
          data = df, 
          id = "id",
          wave = "wave")
# summary(m1)

# according to documentation:   
# weights: If using weights, either the name of the column in the data 
# that contains the weights or a vector of the weights.

### in wbm:

# name of column as a string does not work
# Error in is.nloptr(ret) : objective in x0 returns NA
# In addition: Warning message:
#   In callSuper(...) : NAs introduced by coercion
m2 <- wbm(v2 ~ v1, 
          data = df, 
          id = "id",
          wave = "wave", 
          weights = "wt")

# name of column as an object DOES work
# but with a warning in summary output:
# Warning messages:
#   1: In class(object) <- "environment" :
#   Setting class(x) to "environment" sets attribute to NULL; result will no longer be an S4 object
# 2: In class(object) <- "environment" :
#   Setting class(x) to "environment" sets attribute to NULL; result will no longer be an S4 object
m3 <- wbm(v2 ~ v1, 
          data = df, 
          id = "id",
          wave = "wave", 
          weights = wt)

# summary(m3)

# vector does work
m4 <- wbm(v2 ~ v1, 
          data = df, 
          id = "id",
          wave = "wave", 
          weights = df$wt)

# summary(m4)

### in wbm_stan:

# name of column as a string does not work
# Error: The following variables can neither be found in 'data' nor in 'data2':
#   'wt'
# In addition: Warning message:
#   In sub(lhs, new_lhs, as.character(deparse(fin_formula)), fixed = TRUE) :
#   argument 'replacement' has length > 1 and only the first element will be used
m5 <- wbm_stan(v2 ~ v1, 
               data = df, 
               id = "id",
               wave = "wave", 
               weights = "wt")

# name of column as an object does not work
# Error: The following variables can neither be found in 'data' nor in 'data2':
#   'X1.64563741534948'
# In addition: Warning message:
#   In sub(lhs, new_lhs, as.character(deparse(fin_formula)), fixed = TRUE) :
#   argument 'replacement' has length > 1 and only the first element will be used
m6 <- wbm_stan(v2 ~ v1, 
               data = df, 
               id = "id",
               wave = "wave", 
               weights = wt)

# vector does not work
# the value after X in the error message is the weight value in row 1 of the data set
# Error: The following variables can neither be found in 'data' nor in 'data2':
#   'X1.64563741534948'
# In addition: Warning message:
#   In sub(lhs, new_lhs, as.character(deparse(fin_formula)), fixed = TRUE) :
#   argument 'replacement' has length > 1 and only the first element will be used
m7 <- wbm_stan(v2 ~ v1, 
               data = df, 
               id = "id",
               wave = "wave", 
               weights = df$wt)
jacob-long commented 1 year ago

Thanks for the report. For the moment, as I suppose is obvious given the behavior of the package, survey weights are not "officially" supported. There are some complexities with regard to the best way to use sampling weights with multilevel models, both computationally and conceptually. Right now, the weights are being passed on to lme4 but errors may be coming from various origins like the pre-processing functions in panelr removing or changing the weights column. I'll look into this again because I wanted to offer at least some level of support for survey weights and at the time panelr was first developed, there was an in-progress package being made for fitting multilevel models with survey weights. I'll check in on that or otherwise give users better feedback when they try to use weights.
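Since the weights are described above as being passed on to lme4, one possible stopgap is to build the within and between components by hand and call `lme4::lmer()` directly. The following is only a sketch under stated assumptions: the simulated data, the column names `v1_within` and `v1_between`, and the random-intercept specification are illustrative and are not taken from panelr's internals. Note also that lme4 treats `weights` as prior (precision) weights, which is not the same as a design-based treatment of sampling weights.

```r
library(lme4)

set.seed(1)

# Hypothetical long-format panel: 100 ids observed in 2 waves,
# with one weight per id (repeated across that id's rows).
n_id <- 100
long <- data.frame(
  id   = rep(seq_len(n_id), each = 2),
  wave = rep(1:2, times = n_id),
  wt   = rep(runif(n_id, 0.1, 1.9), each = 2)
)
u <- rnorm(n_id)                       # id-level random intercepts
long$v1 <- rnorm(2 * n_id, 10, 1)
long$v2 <- long$v1 / 2 + u[long$id] + rnorm(2 * n_id)

# Between component: each id's mean of v1.
# Within component: deviation of each observation from that mean.
long$v1_between <- ave(long$v1, long$id)
long$v1_within  <- long$v1 - long$v1_between

# Random-intercept model with the weights column passed straight to lme4.
fit <- lmer(v2 ~ v1_within + v1_between + (1 | id),
            data = long, weights = wt)
summary(fit)
```

Whether precision weights are an acceptable substitute for survey weights depends on the sampling design, so results from a model like this should be interpreted with that caveat in mind.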

ahcombs commented 1 year ago

Hi Jacob,

Thanks so much for your response, and for the heads up about the complexities involved. I looked into it a little further after posting this and it seems to me like the error is arising in the panelr preprocessing steps. Specifically, the quoting/unquoting of the column name appears to not be working correctly, yielding in some cases a column where all the values are the column name string (I don’t have line numbers in front of me at the moment, apologies). I couldn’t quickly find the fix, but it seemed like something that would probably be simple for someone more familiar with quosures etc. than me.

Best wishes,
Aidan
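To illustrate the kind of handling the three documented input forms would need, here is a minimal base R sketch. `resolve_weights()` is a hypothetical helper written for this thread, not a panelr function, and it makes no claim about how panelr actually processes the argument. It only shows one way a string, a bare column name, and a numeric vector could all be reduced to the same vector before fitting.

```r
# Hypothetical helper (not part of panelr): turn `weights` into a numeric
# vector whether it is given as a column-name string, a bare column name,
# or a vector with one value per row of `data`.
resolve_weights <- function(data, weights) {
  expr <- substitute(weights)

  # Case 1: bare column name, e.g. weights = wt
  if (is.symbol(expr) && as.character(expr) %in% names(data)) {
    return(data[[as.character(expr)]])
  }

  # Otherwise evaluate the argument in the caller's environment
  val <- eval(expr, parent.frame())

  # Case 2: column name as a string, e.g. weights = "wt"
  if (is.character(val) && length(val) == 1) {
    if (!val %in% names(data)) {
      stop("Column '", val, "' not found in data.")
    }
    return(data[[val]])
  }

  # Case 3: vector of weights, e.g. weights = df$wt
  if (is.numeric(val) && length(val) == nrow(data)) {
    return(val)
  }

  stop("`weights` must be a column name or a numeric vector with one value per row.")
}

# Small illustrative data frame (values are made up)
d <- data.frame(id = 1:3, wt = c(0.5, 1, 1.5))

w_string <- resolve_weights(d, "wt")
w_bare   <- resolve_weights(d, wt)
w_vector <- resolve_weights(d, d$wt)

# All three forms resolve to the same numeric vector
stopifnot(identical(w_string, w_bare), identical(w_bare, w_vector))
```

Resolving the argument once, up front, means a column-name string never gets written into the data in place of the values, which is the symptom described above.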
