Open Asquidy opened 4 years ago
Hi @Asquidy, can you post a reproducible example that illustrates the issue?
Sure thing! Here it is:
library(data.table) library(ggplot2) library(eventStudy) num_periods <- 10 treatment_time <- 5
sim_data <- ES_simulate_data(units = 1000, min_cal_time = 1, max_cal_time = num_periods)[["observed"]] sim_data[individual %% 2 == 0, treatment_year := treatment_time] sim_data[individual %% 2 == 1, treatment_year := NA] sim_data[, ever_treated := as.numeric(treatment_year == 5 & !is.na(treatment_year))] sim_data[, is_treated := ifelse(!is.na(treatment_year), as.numeric(year >= treatment_year), 0)] mean_outcome <- mean(sim_data$outcome)
unit_trend_avg <- -.1*mean_outcome error_sd_trend <- .5
sim_data[, unit_trend := unit_trend_avg + rnorm(mean = 0, sd = abs(error_sd_trend*unit_trend_avg), n = .N)]
sim_data[, outcome_plus_trend := unit_trend year ever_treated + outcome]
sim_data2 <- copy(sim_data) results <- ES(long_data=sim_data2, outcomevar="outcome_plus_trend", unit_var="individual", cal_time_var="year", onset_time_var="treatment_year", cluster_vars="individual", never_treat_action="only", linearize_pretrends = TRUE, homogeneous_ATT = TRUE, omitted_event_time = -1)
all.equal(sim_data2, sim_data) " "Column 'treatment_year': 'is.NA' value mismatch: 5000 in current 0 in target""
Hi @Asquidy, you are correct. This has to do with how ES deals with the cases with no onset_time_var
-- it essentially assumes that they are treated, but beyond the extent of the calendar times + anticipation period in the panel, and so assigns them a number.
Short answer, if you are running things in a sequence using the same data set (say, ES with several specifications, etc.) and are not memory restricted, I would advise explicitly copying the data within ES, e.g.:
ES(long_data=copy(sim_data2), outcomevar="outcome_plus_trend", unit_var="individual", cal_time_var="year", onset_time_var="treatment_year", cluster_vars="individual", never_treat_action="only", linearize_pretrends = TRUE, homogeneous_ATT = TRUE, omitted_event_time = -1)
Thanks a lot for sharing this code! I did indeed copy the data structure. I'd suggest avoiding side-effects in the package if you continue working on it. It was an unexpected behavior for me and would be for other users as well.
When calling the ES function, the underlying data gets modified. Specifically, the calendar time variable with NAs get remapped.
results <- ES(long_data=sim_data2, outcomevar="outcome_plus_dynamic", unit_var="individual", cal_time_var="year", onset_time_var="treatment_year", cluster_vars="individual", never_treat_action="only", linearize_pretrends = TRUE, homogeneous_ATT = TRUE, omitted_event_time = -1)
changes the 'year' column in sim_data2