.se_type_1, .se_type_2 hard coded?

acoppock commented 10 months ago

Thanks to the team for all the work on this project and package. In projoint_level, I think that the values of .se_type_1 and .se_type_2 aren't being passed to pj_estimate, which seems to hard code both values as "classical".

As a result, the two calls below yield the same standard error estimates.

library(projoint)

data("exampleData1")

outcomes <- paste0("choice", seq(from = 1, to = 8, by = 1))
outcomes <- c(outcomes, "choice1_repeated_flipped")

reshaped_data <- reshape_projoint(
  .dataframe = exampleData1, 
  .outcomes = outcomes)

summary(projoint(reshaped_data, .se_type_2 = "classical"))
summary(projoint(reshaped_data, .se_type_2 = "CR2"))

yhoriuchi commented 10 months ago

Thank you, @acoppock, for noting this issue. This is due to my lack of sufficient experience in package development. I set defaults in some internal functions used within projoint(). I believe I fixed this problem. See: https://github.com/yhoriuchi/projoint/commit/5f0ad3718b06120deaab380ddca585eea54d4101 https://github.com/yhoriuchi/projoint/commit/1ac1818a582c557552d45be307dbb1191edf17d1

However, I still see the same standard errors when I run the following code:

library(projoint)

data("exampleData1")

outcomes <- paste0("choice", seq(from = 1, to = 8, by = 1))
outcomes <- c(outcomes, "choice1_repeated_flipped")

reshaped_data <- reshape_projoint(
  .dataframe = exampleData1, 
  .outcomes = outcomes)

summary(projoint(reshaped_data))
summary(projoint(reshaped_data, .se_type_2 = "stata"))

If you set .se_type_2 = "CR2" instead, you get an error message:

> summary(projoint(reshaped_data, .se_type_2 = "CR2"))
Error in check_se_type(se_type, clustered) : 
  `se_type` must be either 'HC0', 'HC1', 'stata', 'HC2', 'HC3', 'classical' or 'none' with no `clusters`.
You passed: CR2 which is reserved for a case with clusters.

This is because the default for .clusters_2 is NULL, so no clusters are passed. The error message also suggests that the hard-coding issue has been resolved, since the requested se_type now reaches the check.
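
The same check can be reproduced directly with estimatr, outside projoint. This is a minimal sketch on simulated data; the data frame `d` and its columns are made up for illustration and are not part of projoint. Requesting "CR2" without `clusters` should raise the error quoted above.

```r
library(estimatr)

# Simulated long-format data (illustrative only).
set.seed(1)
d <- data.frame(
  id = rep(1:50, each = 4),
  selected = rbinom(200, size = 1, prob = 0.5)
)

# "CR2" is a clustered estimator, so asking for it without `clusters`
# fails in estimatr's se_type check. Capture the message instead of stopping.
msg <- tryCatch(
  {
    lm_robust(selected ~ 1, data = d, se_type = "CR2")
    NA_character_  # only reached if no error was raised
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```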

yhoriuchi commented 10 months ago

@acoppock, can you also try the following script?

library(projoint)
library(tidyverse)
library(estimatr)

data("exampleData1")

outcomes <- paste0("choice", seq(from = 1, to = 8, by = 1))
outcomes <- c(outcomes, "choice1_repeated_flipped")

reshaped_data <- reshape_projoint(
  .dataframe = exampleData1, 
  .outcomes = outcomes)

cjdata <- reshaped_data@data %>% 
  select(id, task, profile, contains("att"), selected) %>% 
  pivot_longer(cols = contains("att"), 
               names_to = "attribute", 
               values_to = "attribute_level") %>% 
  mutate(attribute_level = as.character(attribute_level))

out1a <- projoint(reshaped_data, .remove_ties = FALSE) %>% 
  summary() %>% 
  filter(estimand == "mm_uncorrected") %>% 
  select("attribute_level" = att_level_choose, 
         "projoint_classical" = se)

out1b <- projoint(reshaped_data, .remove_ties = FALSE, .se_type_2 = "stata") %>% 
  summary() %>% 
  filter(estimand == "mm_uncorrected") %>% 
  select("attribute_level" = att_level_choose, 
         "projoint_stata" = se) 

out2a <- cjdata %>% 
  group_by(attribute_level) %>% 
  reframe(tidy(lm_robust(selected ~ 1, se_type = "classical", data = pick(everything())))) %>% 
  select(attribute_level, 
         "lm_robust_classical" = std.error) 

out2b <- cjdata %>% 
  group_by(attribute_level) %>% 
  reframe(tidy(lm_robust(selected ~ 1, se_type = "stata", data = pick(everything())))) %>% 
  select(attribute_level, 
         "lm_robust_stata" = std.error) 

out <- out1a %>% 
  left_join(out1b, by = "attribute_level") %>% 
  left_join(out2a, by = "attribute_level") %>% 
  left_join(out2b, by = "attribute_level")

All four columns in out (projoint and lm_robust, each with "classical" and "stata") show the same standard errors.
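
One possible explanation, assuming the marginal means are estimated with an intercept-only regression as in the lm_robust(selected ~ 1, ...) calls above: with a single constant regressor, the classical and HC1 ("stata") variance formulas both reduce to sum(e^2) / (n * (n - 1)), so the two options cannot differ. A minimal base-R sketch on simulated data (variable names are illustrative, not taken from projoint):

```r
# Simulated binary outcome; stands in for `selected` within one attribute level.
set.seed(1)
y <- rbinom(200, size = 1, prob = 0.4)
n <- length(y)
k <- 1             # one coefficient: the intercept
e <- y - mean(y)   # residuals from the intercept-only fit

# Classical: s^2 * (X'X)^-1, with s^2 = sum(e^2) / (n - k) and X'X = n.
se_classical <- sqrt(sum(e^2) / (n - k) / n)

# HC1 ("stata"): sandwich estimator scaled by n / (n - k).
se_hc1 <- sqrt(sum(e^2) / n^2 * n / (n - k))

# HC0: the same sandwich without the small-sample scaling.
se_hc0 <- sqrt(sum(e^2) / n^2)

# Cross-check the classical value against lm().
se_lm <- summary(lm(y ~ 1))$coefficients[1, "Std. Error"]

all.equal(se_classical, se_lm)   # TRUE
all.equal(se_classical, se_hc1)  # TRUE
se_hc0 < se_classical            # TRUE
```

Under this assumption, only an option such as "HC0" or a clustered type would be expected to change the reported values.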

acoppock commented 10 months ago

Thank you for looking into this, @yhoriuchi!

On the clusters issue, I had assumed you would be clustering standard errors at the respondent level ("id") -- I believe that's standard practice? I have heard the argument that we don't need to cluster because the random assignment is at the profile level. However, following the rule to "cluster at the level of sampling or assignment, whichever is higher" would suggest clustering at the respondent level.

All that to say, I would have thought the default for .clusters_2 would have been to pass the id variable.

By the way, I'm not sure how to correctly pass the id variable to .clusters_2. None of the following work yet:

summary(projoint(reshaped_data, .clusters_2 = id, .se_type_2 = "CR2"))
summary(projoint(reshaped_data, .clusters_2 = "id", .se_type_2 = "CR2"))
summary(projoint(reshaped_data, .clusters_2 = reshaped_data@data$id, .se_type_2 = "CR2"))
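
For reference, this is roughly what respondent-level clustering looks like when calling estimatr directly. It is a sketch on simulated data, not a statement about how projoint expects .clusters_2 to be supplied; the data frame `d` and the respondent-level probabilities are assumptions made for illustration.

```r
library(estimatr)

# Simulated long-format data: 50 respondents, 16 profile rows each.
# Respondent-specific choice probabilities induce within-respondent correlation.
set.seed(1)
n_id <- 50
n_rows <- 16
p_i <- runif(n_id, min = 0.2, max = 0.8)
d <- data.frame(
  id = rep(seq_len(n_id), each = n_rows),
  selected = rbinom(n_id * n_rows, size = 1, prob = rep(p_i, each = n_rows))
)

# No clustering: classical standard error of the mean.
fit_classical <- lm_robust(selected ~ 1, data = d, se_type = "classical")

# Clustering on respondent id with the CR2 estimator.
# In lm_robust(), `clusters` takes the bare column name.
fit_cr2 <- lm_robust(selected ~ 1, data = d, clusters = id, se_type = "CR2")

# Compare the two standard errors side by side.
c(classical = unname(fit_classical$std.error),
  CR2 = unname(fit_cr2$std.error))
```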

.se_type_1, .se_type_2 hard coded? #32