wbinzhe / Climate_Retail


Foot traffic regression #9

Open shoonlee opened 3 years ago

shoonlee commented 3 years ago

@wbinzhe

(1) y_{iwt} = b \cdot T_{iwt} + a \cdot P_{iwt} + \lambda_i + \gamma_t + \omega_w + \epsilon_{iwt}, where i is an individual store, w is a week of the year (taking integer values from 1 to 52), and t is a year. y is the number of customers, T is a temperature measure, and P is a precipitation measure.
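
A minimal estimation sketch for (1), using felm from the lfe package (the same function used later in this thread); the data frame foot and its columns visits, temp, precip, store_id, week, and year are hypothetical placeholders:

library(lfe)

# (1): store, year, and week-of-year fixed effects,
# with standard errors clustered by store (all names are placeholders)
m1 <- felm(visits ~ temp + precip | store_id + year + week | 0 | store_id,
           data = foot)
summary(m1)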

Also, try a non-parametric damage function as the following:

(2) y_{iwt} = \sum_k b^k T^k_{iwt} + a \cdot P_{iwt} + \lambda_i + \gamma_t + \omega_w + \epsilon_{iwt}

where T^k_{iwt} is the number of days in week w whose daily average temperature falls in bin k, and k runs over the (Fahrenheit) bins below 10, 10-20, 20-30, ..., over 90. For instance, suppose that in the week of Jan 5, 2019, the daily averages were 50, 60, 65, 14, 10, 84, 12. Then T^{50-60} = 1, T^{60-70} = 2, T^{10-20} = 3, and T^{80-90} = 1.
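
A sketch of one way to build these weekly bin counts, assuming a daily data frame daily with hypothetical columns store_id, date, and tavg (daily average temperature in °F):

library(dplyr)
library(tidyr)
library(lubridate)   # isoweek(), year()

bin_counts <- daily %>%
  mutate(yr = year(date), week = isoweek(date),
         # left-closed bins so that e.g. 10F falls in the 10-20 bin;
         # -Inf and Inf give the open-ended "below 10" and "over 90" bins
         t_bin = cut(tavg, breaks = c(-Inf, seq(10, 90, by = 10), Inf),
                     right = FALSE)) %>%
  count(store_id, yr, week, t_bin) %>%                 # days per bin per store-week
  pivot_wider(names_from = t_bin, values_from = n,
              values_fill = 0, names_prefix = "T_")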

shoonlee commented 3 years ago

@wbinzhe Can you try to produce tables for the foot traffic data by this Friday? They would be very helpful for the conversation with Justin at NUS.

shoonlee commented 3 years ago

@wbinzhe

I did something similar (for the non-parametric analysis) in my paper, and the code below might be useful for creating the temperature bin dummies (feel free to ignore it if you already know how to do this). In my case, I created flood size dummies from a continuous measure of flood size.

library(dplyr)
library(tidyr)

data <- tmp %>%
    mutate(f_cat = cut(fsize, breaks = c(1, 2, 10, 20, 30, 40, 50),   # fsize is the continuous measure of flood size
                       labels = FALSE, include.lowest = TRUE),
           vals = 1,
           f_cat = ifelse(is.na(f_cat), 0, f_cat)) %>%                # observations outside the breaks go to bin 0
    pivot_wider(names_from = f_cat, values_from = vals,
                values_fill = list(vals = 0), names_prefix = "fl_")

shoonlee commented 3 years ago

@wbinzhe

Run two separate foot traffic regressions: (1) the entire sample and (2) the subset of stores located in RCA zip codes (namely, zip codes with RCA transaction observations).

With the RCA data, run the following:

(1) price_{it} = b \cdot T_{it} + a \cdot P_{it} + \lambda_i + \gamma_t + \epsilon_{it}, where i is an individual building and t is a year. T is a temperature measure and P is a precipitation measure. In addition to the T measures used for the foot traffic data, try the average annual number of days above 90F over the past 5 years.

Also, try a non-parametric damage function: (2) price_{it} = \sum_k b^k T^k_{it} + a \cdot P_{it} + \lambda_i + \gamma_t + \epsilon_{it}. As in (1), also try T^k_{it} averaged over the past 5 years.
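
A sketch for the trailing 5-year average, assuming an annual building-level frame rca with hypothetical columns building_id, year, and days_above_90; rollmeanr from the zoo package computes a right-aligned rolling mean:

library(dplyr)
library(zoo)   # rollmeanr()

rca <- rca %>%
  arrange(building_id, year) %>%
  group_by(building_id) %>%
  # mean of days above 90F over the current and previous 4 years;
  # wrap days_above_90 in lag() if the window should end in t-1
  mutate(days90_5yr = rollmeanr(days_above_90, k = 5, fill = NA)) %>%
  ungroup()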

shoonlee commented 3 years ago

@wbinzhe

For the foot traffic data, can you try aggregating the data at the month level and repeating the regression with month fixed effects (instead of week fixed effects)? It would help distinguish an overall decline from intertemporal substitution.
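
A sketch of the monthly aggregation, reusing the hypothetical names from the sketches above and assuming a daily visits frame:

library(dplyr)
library(lubridate)
library(lfe)

monthly <- daily_visits %>%
  mutate(yr = year(date), month = month(date)) %>%
  group_by(store_id, yr, month) %>%
  summarise(visits = sum(visits), temp = mean(tavg),
            precip = sum(prcp), .groups = "drop")

# month-of-year FE in place of the week-of-year FE
m_month <- felm(visits ~ temp + precip | store_id + yr + month | 0 | store_id,
                data = monthly)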

shoonlee commented 3 years ago

@wbinzhe

The earlier code creates the bins. To run the regression,

library(lfe)   # felm()

fl <- paste0("fl_", 2:6)             # bin dummies fl_2-fl_6 (fl_0 and fl_1 are the omitted reference)
initial <- paste(fl, collapse = " + ")

# outcome ~ bin dummies | fixed effects | instruments (none) | cluster
reg_formula <- as.formula(paste("ln_dmg_cpi ~", initial, "| trans_year + comm_id | 0 | state"))
regpop <- felm(reg_formula, data = dmg)   # run the regression

To collect the estimates and plot them for multiple regression models,

library(broom)     # tidy()
library(dplyr)
library(stringr)
library(ggplot2)
library(purrr)     # map()

get_foodplot <- function(result){

  # collect the interaction coefficients from the model object named by `result`
  # and build 95% confidence intervals
  tidy_dat <- tidy(get(result)) %>%
    filter(str_detect(term, ":")) %>%
    filter(!str_detect(term, paste(c("eventyr3", "-4"), collapse = '|'))) %>%
    mutate(conf.low = estimate - 1.96*std.error,
           conf.high = estimate + 1.96*std.error) %>%
    select(term, estimate, conf.low, conf.high)

  # add the omitted base year (-1), normalized to zero
  initial <- c("eventyr-1:ups_group", 0, 0, 0)
  tidy_dat <- rbind(tidy_dat, initial)

  desired_order <- c("eventyr-3:ups_group", "eventyr-2:ups_group", "eventyr-1:ups_group",
                     "eventyr0:ups_group", "eventyr1:ups_group", "eventyr2:ups_group")

  tidy_dat$term <- factor(as.character(tidy_dat$term), levels = desired_order)
  tidy_dat <- tidy_dat[order(tidy_dat$term), ]

  # relabel terms as event years and rescale estimates to percent
  tidy_dat <- tidy_dat %>%
    mutate(term = c(-3:2),
           across(2:4, ~ as.numeric(.) * 100))

  p1 <- ggplot(data = tidy_dat, aes(x = term, y = estimate))

  p2 <- p1 +
    geom_point(shape = 16, size = 2) +
    geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                  width = 0.1, size = 0.5) +
    labs(x = "Years Since UPS", y = "(%) Per HH Grocery Purchase") +
    theme_bw() + theme(panel.grid.minor = element_blank(), legend.position = "bottom",
                       text = element_text(size = 14))

  # redraw the points and error bars with a manual color legend
  p3 <- p2 +
    geom_point(aes(color = "Estimate"), shape = 16, size = 2) +
    geom_errorbar(aes(ymin = conf.low, ymax = conf.high,
                      color = "95% CI"), width = 0.05, size = 0.5) +
    scale_color_manual(name = " ", values = c("cornflowerblue", "blue")) +
    guides(colour = guide_legend(reverse = TRUE,
                                 override.aes = list(linetype = c("blank", "solid"), shape = c(16, NA)))) +
    ylim(c(-28, 10))

  print(p3)

}

results <- c("event_overall_kg_stack", "event_overall_exp_stack", "event_perish_kg_stack", "event_store_kg_stack")

map(.x = results, .f = get_foodplot)

shoonlee commented 3 years ago

@wbinzhe

For the Aug 12 meeting,

  • Can you run the regression following Andres's suggestion? Namely, including zcta3 by month FE as opposed to two separate FEs? For this, make sure to do it using the entire sample (not 10%). If needed, please run it on the server.

wbinzhe commented 3 years ago

@wbinzhe

For the Aug 12 meeting,

  • Can you run the regression following Andres's suggestion? Namely, including zcta3 by month FE as opposed to two separate FEs? For this, make sure to do it using the entire sample (not 10%). If needed, please run it on the server.

@shoonlee Results are updated in slide #18, with the full sample. I used state-month FEs instead, to preserve the variable of interest, temp_{county, month}; I think state-month FEs are sufficient to capture local holiday seasons and state-specific macroeconomic shocks.

shoonlee commented 3 years ago

Ah, you're right. Our temperature variable varies at the zcta3-by-month level, so we cannot include zcta3-by-month FEs. Can you update the earlier result with state-by-month FEs? Thanks!
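
A sketch of the state-by-month specification, again with hypothetical names: build one combined state x calendar-month factor and absorb it alongside the store FE.

library(dplyr)
library(lfe)

monthly <- monthly %>%
  mutate(state_month = interaction(state, month))   # one FE level per state-month cell

m_sm <- felm(visits ~ temp + precip | store_id + state_month | 0 | store_id,
             data = monthly)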

shoonlee commented 3 years ago

We need to explore the SafeGraph data more thoroughly...