Open acoppock opened 10 months ago
Thank you, @acoppock, for noting this issue. This is due to my lack of sufficient experience in package development. I set defaults in some internal functions used within projoint(). I believe I fixed this problem. See:
https://github.com/yhoriuchi/projoint/commit/5f0ad3718b06120deaab380ddca585eea54d4101
https://github.com/yhoriuchi/projoint/commit/1ac1818a582c557552d45be307dbb1191edf17d1
But I still see the same standard errors by running the following code:
library(projoint)
data("exampleData1")
outcomes <- paste0("choice", seq(from = 1, to = 8, by = 1))
outcomes <- c(outcomes, "choice1_repeated_flipped")
reshaped_data <- reshape_projoint(
  .dataframe = exampleData1,
  .outcomes = outcomes)
summary(projoint(reshaped_data))
summary(projoint(reshaped_data, .se_type_2 = "stata"))
If you added .se_type_2 = "CR2", then you would see an error message:
> summary(projoint(reshaped_data, .se_type_2 = "CR2"))
Error in check_se_type(se_type, clustered) :
`se_type` must be either 'HC0', 'HC1', 'stata', 'HC2', 'HC3', 'classical' or 'none' with no `clusters`.
You passed: CR2 which is reserved for a case with clusters.
This is because the default for .clusters_2 is NULL. This error message also suggests that the hard-coding issue was resolved.
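The same behavior can be reproduced outside projoint. The sketch below assumes that projoint delegates to estimatr::lm_robust(), which the check_se_type() error above suggests; the toy data frame and its column names are hypothetical.

```r
library(estimatr)

# Toy data (hypothetical): three clusters of two observations each.
toy <- data.frame(y = c(0, 1, 1, 0, 1, 1), g = c(1, 1, 2, 2, 3, 3))

# Without `clusters`, se_type = "CR2" is rejected; capture the message.
err <- tryCatch(
  {
    lm_robust(y ~ 1, data = toy, se_type = "CR2")
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(err)

# With `clusters` supplied, the same se_type is accepted.
fit <- lm_robust(y ~ 1, data = toy, clusters = g, se_type = "CR2")
print(fit$se_type)
```

So "CR2" only becomes valid once a cluster variable reaches the underlying estimator.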
@acoppock, can you also try the following script?
library(projoint)
library(tidyverse)
library(estimatr)
data("exampleData1")
outcomes <- paste0("choice", seq(from = 1, to = 8, by = 1))
outcomes <- c(outcomes, "choice1_repeated_flipped")
reshaped_data <- reshape_projoint(
  .dataframe = exampleData1,
  .outcomes = outcomes)

# Long format: one row per respondent x task x profile x attribute
cjdata <- reshaped_data@data %>%
  select(id, task, profile, contains("att"), selected) %>%
  pivot_longer(cols = contains("att"),
               names_to = "attribute",
               values_to = "attribute_level") %>%
  mutate(attribute_level = as.character(attribute_level))

# projoint standard errors with the default se type
out1a <- projoint(reshaped_data, .remove_ties = FALSE) %>%
  summary() %>%
  filter(estimand == "mm_uncorrected") %>%
  select("attribute_level" = att_level_choose,
         "projoint_classical" = se)

# projoint standard errors with .se_type_2 = "stata"
out1b <- projoint(reshaped_data, .remove_ties = FALSE, .se_type_2 = "stata") %>%
  summary() %>%
  filter(estimand == "mm_uncorrected") %>%
  select("attribute_level" = att_level_choose,
         "projoint_stata" = se)

# Direct lm_robust() fits per attribute level, classical standard errors
out2a <- cjdata %>%
  group_by(attribute_level) %>%
  reframe(tidy(lm_robust(selected ~ 1, se_type = "classical", data = pick(everything())))) %>%
  select(attribute_level,
         "lm_robust_classical" = std.error)

# Direct lm_robust() fits per attribute level, "stata" standard errors
out2b <- cjdata %>%
  group_by(attribute_level) %>%
  reframe(tidy(lm_robust(selected ~ 1, se_type = "stata", data = pick(everything())))) %>%
  select(attribute_level,
         "lm_robust_stata" = std.error)

# Side-by-side comparison of the four sets of standard errors
out <- out1a %>%
  left_join(out1b, by = "attribute_level") %>%
  left_join(out2a, by = "attribute_level") %>%
  left_join(out2b, by = "attribute_level")
All of these methods produce the same standard errors.
Thank you for looking into this, @yhoriuchi!
On the clusters issue, I had assumed you would be clustering standard errors at the respondent level ("id") -- I believe that's standard practice? I have heard the argument that we don't need to cluster because the random assignment is at the profile level. However, following the rule to "cluster at the level of sampling or assignment, whichever is higher" would suggest clustering at the respondent level.
All that to say, I would have thought the default for .clusters_2 would have been to pass the id variable.
By the way, I'm not sure how to correctly pass the id variable to .clusters_2; none of the following work just yet.
summary(projoint(reshaped_data, .clusters_2 = id, .se_type_2 = "CR2"))
summary(projoint(reshaped_data, .clusters_2 = "id", .se_type_2 = "CR2"))
summary(projoint(reshaped_data, .clusters_2 = reshaped_data@data$id, .se_type_2 = "CR2"))
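Independent of how projoint forwards .clusters_2, respondent-level clustering can be sketched directly with estimatr::lm_robust(). This is a minimal illustration on simulated data, not projoint's own code path: the data frame sim, its columns, and the data-generating process are assumptions made for the example.

```r
library(estimatr)

set.seed(1)

# Simulated stand-in for conjoint data (hypothetical):
# 50 respondents with 16 profile rows each.
n_id <- 50
n_rows <- 16
sim <- data.frame(id = rep(seq_len(n_id), each = n_rows))

# A respondent-level shift induces within-respondent correlation.
p <- plogis(rep(rnorm(n_id), each = n_rows))
sim$selected <- rbinom(nrow(sim), size = 1, prob = p)

# Same point estimate, two variance estimators.
fit_classical <- lm_robust(selected ~ 1, data = sim, se_type = "classical")
fit_cr2 <- lm_robust(selected ~ 1, data = sim, clusters = id, se_type = "CR2")

print(c(classical = fit_classical$std.error[[1]],
        CR2 = fit_cr2$std.error[[1]]))
```

Here clusters takes the bare column name from data, which is how lm_robust() expects a cluster variable; whether projoint passes .clusters_2 through in the same form is exactly the open question in this thread.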
Thanks to the team for all the work on this project and package. In projoint_level, I think that the values of .se_type_1 and .se_type_2 aren't being passed to pj_estimate, which seems to hard code both values as "classical". As a result, the two calls below yield the same standard error estimates.