Open shoonlee opened 3 years ago
@wbinzhe Can you try to produce tables for the foot traffic data by this Friday? It will be very helpful to talk to Justin at NUS.
@wbinzhe
I did something similar (for the non-parametric analysis) in my paper, and the code below might be useful for creating the temperature bin dummies (feel free to ignore it if you already know how to do this). In my case, I created flood size dummies from a continuous measure of flood size.
library(dplyr)
library(tidyr)

data <- tmp %>%
  mutate(
    # fsize is a continuous measure of flood size
    f_cat = cut(fsize, breaks = c(1, 2, 10, 20, 30, 40, 50),
                labels = FALSE, include.lowest = TRUE),
    vals = 1,
    f_cat = ifelse(is.na(f_cat), 0, f_cat)  # values outside the breaks go to bin 0
  ) %>%
  pivot_wider(names_from = f_cat, values_from = vals,
              values_fill = list(vals = 0), names_prefix = "fl_")
@wbinzhe
Run two separate foot traffic regressions: (1) the entire sample and (2) the subset of stores located within RCA zip codes (namely, zip codes with RCA transaction observations).
With the RCA data, run the following:
(1) price_{it} = b*T_{it} + a*P_{it} + \lambda_i + \gamma_t + \epsilon_{it} where i is an individual building and t is a year. T is the temperature measure and P is the precipitation measure. With the RCA data, in addition to the T measures used for the foot traffic data, try the average of the annual number of days above 90F in the past 5 years.
Also, try a non-parametric damage function: (2) price_{it} = b^k*T^k_{it} + a*P_{it} + \lambda_i + \gamma_t + \epsilon_{it}. Similar to (1), also try T^k_{it} with the average of the past 5 years.
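In case it helps, here is a minimal sketch of the "average of the annual number of days above 90F in the past 5 years" measure. The data and column names (`tmax`, `days_above_90`, `avg_days_above_90_5y`) are made up for illustration, and it assumes "past 5 years" means years t-5 through t-1 (excluding the current year) with no gaps in the yearly series; adjust if the current year should be included.

```r
# Toy daily data for one location (hypothetical column names)
set.seed(1)
dates <- seq(as.Date("2010-01-01"), as.Date("2019-12-31"), by = "day")
daily <- data.frame(
  year = as.integer(format(dates, "%Y")),
  tmax = round(rnorm(length(dates), mean = 75, sd = 15))
)

# Step 1: annual number of days above 90F
daily$above_90 <- daily$tmax > 90
annual <- aggregate(above_90 ~ year, data = daily, FUN = sum)
names(annual)[2] <- "days_above_90"
annual <- annual[order(annual$year), ]

# Step 2: average over the previous 5 years (t-5, ..., t-1).
# Assumption: the current year is excluded and years are consecutive.
n <- nrow(annual)
annual$avg_days_above_90_5y <- sapply(seq_len(n), function(j) {
  if (j <= 5) return(NA_real_)  # fewer than 5 prior years available
  mean(annual$days_above_90[(j - 5):(j - 1)])
})

print(annual)
```

With several locations, the same two steps would be applied within each location before merging onto the building-year panel.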
@wbinzhe
For the foot traffic data, can you try aggregating the data at the month level and repeat the regression with the month fixed effect (instead of the week fixed effect) as well? It'd be helpful to distinguish overall decline vs. intertemporal substitution.
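A rough sketch of the monthly aggregation, in case it is useful. The toy data and column names (`store_id`, `week_start`, `visits`, `temp`) are hypothetical, and it assumes each week is assigned to the month of its start date; weeks that straddle two months would need a rule of your choosing.

```r
library(dplyr)

# Toy store-week panel (hypothetical column names)
weekly <- data.frame(
  store_id   = rep(c("A", "B"), each = 4),
  week_start = rep(as.Date(c("2019-01-07", "2019-01-14",
                             "2019-02-04", "2019-02-11")), times = 2),
  visits     = c(100, 120, 90, 95, 40, 50, 45, 30),
  temp       = c(30, 35, 40, 42, 50, 52, 55, 58)
)

monthly <- weekly %>%
  mutate(year  = as.integer(format(week_start, "%Y")),
         month = as.integer(format(week_start, "%m"))) %>%
  group_by(store_id, year, month) %>%
  summarise(visits = sum(visits),  # total customers in the month
            temp   = mean(temp),   # average of the weekly temperature measure
            .groups = "drop")

print(monthly)
```

The monthly regression would then use a month-of-year fixed effect in place of the week-of-year one, e.g. a formula along the lines of `visits ~ temp | store_id + year + month` in `felm`.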
@wbinzhe
The earlier code creates the bins. To run the regression:
library(lfe)

fl <- paste0("fl_", 2:6)  # names of the bin dummies to include
initial <- paste(fl, collapse = " + ")
# felm formula parts: outcome ~ regressors | fixed effects | IV (none) | cluster variable
reg_formula <- as.formula(paste("ln_dmg_cpi ~", initial, "| trans_year + comm_id | 0 | state"))
regpop <- felm(reg_formula, data = dmg)  # run the regression
To collect terms and plot for multiple regression models:
library(tidyverse)  # dplyr, stringr, ggplot2, purrr
library(broom)      # tidy()

get_foodplot <- function(result) {
  # result is the name (a string) of a fitted model object
  tidy_dat <- tidy(get(result)) %>%
    filter(str_detect(term, ":")) %>%             # keep the event-time interaction terms
    filter(!str_detect(term, "eventyr3|-4")) %>%  # drop event years 3 and -4
    mutate(conf.low = estimate - 1.96 * std.error,
           conf.high = estimate + 1.96 * std.error) %>%
    select(term, estimate, conf.low, conf.high)

  # add the omitted reference period (event year -1) as a row of zeros
  reference <- tibble(term = "eventyr-1:ups_group", estimate = 0, conf.low = 0, conf.high = 0)
  desired_order <- c("eventyr-3:ups_group", "eventyr-2:ups_group", "eventyr-1:ups_group",
                     "eventyr0:ups_group", "eventyr1:ups_group", "eventyr2:ups_group")
  tidy_dat <- bind_rows(tidy_dat, reference) %>%
    mutate(term = factor(term, levels = desired_order)) %>%
    arrange(term) %>%
    mutate(term = -3:2,              # event time for the x-axis
           across(2:4, ~ .x * 100))  # convert to percent

  p1 <- ggplot(data = tidy_dat, aes(x = term, y = estimate))
  p2 <- p1 +
    geom_point(shape = 16, size = 2) +
    geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                  width = 0.1, size = 0.5) +
    labs(x = "Years Since UPS", y = "(%) Per HH Grocery Purchase") +
    theme_bw() +
    theme(panel.grid.minor = element_blank(), legend.position = "bottom",
          text = element_text(size = 14))
  # overlay colored layers so that the legend distinguishes estimate and CI
  p3 <- p2 +
    geom_point(aes(color = "Estimate"), shape = 16, size = 2) +
    geom_errorbar(aes(ymin = conf.low, ymax = conf.high, color = "95% CI"),
                  width = 0.05, size = 0.5) +
    scale_color_manual(name = " ", values = c("cornflowerblue", "blue")) +
    guides(colour = guide_legend(reverse = TRUE,
                                 override.aes = list(linetype = c("blank", "solid"),
                                                     shape = c(16, NA)))) +
    ylim(c(-28, 10))
  print(p3)
}

results <- c("event_overall_kg_stack", "event_overall_exp_stack",
             "event_perish_kg_stack", "event_store_kg_stack")
map(.x = results, .f = get_foodplot)
@wbinzhe
For the Aug 12 meeting,
- Can you run the regression following Andres's suggestion? Namely, including zcta3 by month FE as opposed to two separate FEs? For this, make sure to do it using the entire sample (not 10%). If needed, please run it on the server.
@shoonlee Results are updated in slide #18, using the full sample. I used state-month FE to preserve the variable of interest, temp_{county, month}. I think state-month FE are sufficient to capture local holiday seasons and state-specific macroeconomic shocks.
Ah you’re right. Our temperature variable varies at the zcta3 by month level so we cannot include zcta3 by month FE. Can you update the earlier result including state by month FE? Thanks!
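To make the identification point concrete, here is a small toy illustration (not our data; all names and numbers are invented). When the temperature variable only varies at the area-by-month level, area-by-month dummies absorb it completely, so its coefficient cannot be estimated. A coarser set of fixed effects leaves variation to estimate it from. The sketch uses additive area and month dummies as a stand-in for the coarser FE.

```r
set.seed(42)

# Toy panel: 5 stores in each of 3 areas, observed over 4 months
panel <- expand.grid(store = 1:5, area = c("a1", "a2", "a3"), month = 1:4)
panel$cell <- interaction(panel$area, panel$month)  # area-by-month cell

# Temperature varies only across area-by-month cells (made-up values)
cell_temp <- setNames(c(35, 48, 52, 41, 60, 44, 70, 55, 66, 81, 59, 77),
                      levels(panel$cell))
panel$temp <- unname(cell_temp[as.character(panel$cell)])
panel$y <- 100 - 0.5 * panel$temp + rnorm(nrow(panel))

# Area-by-month FE: temp is a linear combination of the cell dummies,
# so lm() reports NA for its coefficient
fit_cell <- lm(y ~ cell + temp, data = panel)
print(coef(fit_cell)["temp"])

# Coarser FE (additive area and month dummies): temp is identified
fit_coarse <- lm(y ~ area + factor(month) + temp, data = panel)
print(coef(fit_coarse)["temp"])
```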
We need to explore the Safegraph data more thoroughly...
@wbinzhe
(1) y_{iwt} = b*T_{iwt} + a*P_{iwt} + \lambda_i + \gamma_t + \omega_w + \epsilon_{iwt} where i is an individual store, w is a week of the year (taking integer values between 1 and 52), and t is a year. y is the number of customers, T is the temperature measure, and P is the precipitation measure.
Also, try a non-parametric damage function as follows:
(2) y_{iwt} = b^k*T^k_{iwt} + a*P_{iwt} + \lambda_i + \gamma_t + \omega_w + \epsilon_{iwt}
where T^k_{iwt} is the number of days in the week whose daily average temperature falls in bin k, with bins (in Fahrenheit) of below 10, 10-20, 20-30, ..., over 90. For instance, suppose that in the week of Jan 5, 2019, the daily averages were 50, 60, 65, 14, 10, 84, 12. Then T^{50-60}=1, T^{60-70}=2, T^{10-20}=3, and T^{80-90}=1.
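The bin counts in the example week can be reproduced with `cut()`. This is only a sketch for a single store-week; the bin labels are my own naming, and it assumes bins are closed on the left (so 60 falls in 60-70 and 10 falls in 10-20), which is what the example counts imply.

```r
# Daily average temperatures (F) for the example week of Jan 5, 2019
daily_temp <- c(50, 60, 65, 14, 10, 84, 12)

# Bin edges: below 10, 10-20, ..., 80-90, over 90
breaks <- c(-Inf, seq(10, 90, by = 10), Inf)
labels <- c("below10",
            paste0(seq(10, 80, by = 10), "-", seq(20, 90, by = 10)),
            "over90")

# right = FALSE makes each bin closed on the left: [10, 20), [20, 30), ...
bins <- cut(daily_temp, breaks = breaks, labels = labels, right = FALSE)

# T^k for this store-week: number of days in each bin k
t_k <- table(bins)
print(t_k)
```

For the full panel, the same `cut()` call can be applied to the daily data and the days counted within each store-week.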