wbinzhe / Climate_Retail

RCA Regressions #11

Open shoonlee opened 3 years ago

shoonlee commented 3 years ago

@wbinzhe

shoonlee commented 3 years ago

@wbinzhe

For the foot traffic regression:

shoonlee commented 3 years ago

@wbinzhe

For both RCA and foot traffic, create a graph similar to Figure 2 in the attached paper: temperature bins on the x-axis, and on the y-axis the estimated effect on the logged transaction price (RCA) or the logged number of visitors (SafeGraph). You can find sample code in the sample code thread (#12). The code does not include confidence intervals, but adding them shouldn't be too hard.
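A minimal sketch of how the confidence intervals could be added to a bin-coefficient plot, using simulated data and base R `lm()` rather than the project's actual model. All names here (`sim`, `ln_y`, `temp_bin`, `plot_df`, `plot_bins`) are hypothetical, and the 1.96 multiplier assumes a normal approximation for a 95% interval.

```r
# Simulated data for illustration only; nothing here comes from the RCA or SafeGraph files.
set.seed(42)
n <- 700
bins <- paste0("t", seq(30, 90, by = 10))
sim <- data.frame(temp_bin = factor(rep(bins, each = n / length(bins)), levels = bins))
sim$ln_y <- rnorm(n, mean = 5, sd = 0.5)
sim$temp_bin <- relevel(sim$temp_bin, ref = "t60")  # 60-69F is the omitted bin

fit <- lm(ln_y ~ temp_bin, data = sim)
est <- summary(fit)$coefficients[-1, , drop = FALSE]  # drop the intercept row

plot_df <- data.frame(
  bin = sub("^temp_bin", "", rownames(est)),
  estimate = est[, "Estimate"],
  se = est[, "Std. Error"]
)
# The omitted bin is the reference, so its effect is zero by construction.
plot_df <- rbind(plot_df, data.frame(bin = "t60", estimate = 0, se = 0))
plot_df <- plot_df[order(plot_df$bin), ]
plot_df$lower <- plot_df$estimate - 1.96 * plot_df$se
plot_df$upper <- plot_df$estimate + 1.96 * plot_df$se

# Point estimates with vertical 95% interval bars, one per temperature bin.
plot_bins <- function(df) {
  x <- seq_len(nrow(df))
  plot(x, df$estimate, pch = 19, xaxt = "n", ylim = range(df$lower, df$upper),
       xlab = "Temperature bin", ylab = "Estimated effect on log outcome")
  axis(1, at = x, labels = df$bin)
  segments(x, df$lower, x, df$upper)
  abline(h = 0, lty = 2)
}
if (interactive()) plot_bins(plot_df)
```

With a model fitted by another function, the same plotting step should apply once the estimates and standard errors are pulled from that model's own coefficient table.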

Barreca et al. - 2016 - Adapting to Climate Change The Remarkable Decline.pdf

shoonlee commented 3 years ago

Comments on the Jul 22 updates

shoonlee commented 3 years ago

@wbinzhe

Post-meeting summary Jul 23: let me know if anything needs to be clarified.

wbinzhe commented 3 years ago

@wbinzhe

Post-meeting summary Jul 23: let me know if anything needs to be clarified.

  • Upload the code and cleaned data (Please don't wait until you finish everything. Upload multiple times as you progress)
  • Extend the RCA analysis to start the analysis in 2006
  • Create the map for 2002-2006 vs 2015-2019
  • Non-parametric estimation for RCA and foot traffic data

    • For RCA, create a temperature bin based on the 5-year average temperature and regress using it. For instance, if a 5-year average temperature is 75F, it will belong to 70-79 bin. Omit the 60-69 category and run the regression.
    • For the foot traffic, repeat the same but at the monthly level. So calculate an average of daily mean temperature for each month and create a bin using it. Then run the regression.

@shoonlee I don't understand "Omit the 60-69 category and run the regression." This is the code I would use for a specific bin. Is anything wrong here? lm_bin_i <- felm(formula = hedonic_1_5year, data = rca_retail %>% filter(annual_temperature == bins[i]))

shoonlee commented 3 years ago

@wbinzhe

No, that's not what I mean. It's something like this:

felm(ln_price ~ t30 + t40 + t50 + t70 + t80 + t90 | xxxx | xxxx, data = data), where t30 == 1 when the average temperature is between 30 and 40F, and the 60-69F bin is the omitted category. For foot traffic, average temperature means the average of daily mean temperature for a given month, and for RCA it is the 5-year average. Does that make sense?
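As a rough illustration of the bin dummies described here, the sketch below builds the 0/1 indicators from an average temperature and leaves out the 60-69F bin. The data frame `toy` and its columns are made up for the example; only the bin names `t30` ... `t90` follow the thread.

```r
# Hypothetical toy data: one row per observation, with an average temperature in F.
toy <- data.frame(
  ln_price = c(5.1, 5.4, 5.0, 5.6, 5.3, 5.2),
  avg_temp = c(34, 47, 52, 63, 75, 68)
)

# Assign each observation to a 10F-wide bin; right = FALSE gives [30,40), [40,50), ...
breaks <- seq(30, 100, by = 10)
toy$temp_bin <- cut(toy$avg_temp, breaks = breaks, right = FALSE,
                    labels = paste0("t", head(breaks, -1)))

# Make 60-69F the omitted (reference) category.
toy$temp_bin <- relevel(toy$temp_bin, ref = "t60")

# One 0/1 column per non-reference bin; these play the role of t30, t40, ...
dummies <- model.matrix(~ temp_bin, data = toy)[, -1, drop = FALSE]
colnames(dummies) <- sub("^temp_bin", "", colnames(dummies))
toy <- cbind(toy, dummies)

toy[, c("avg_temp", colnames(dummies))]
```

Each row has at most one dummy equal to 1, and rows in the 60-69F bin have all dummies equal to 0, so every coefficient is read relative to that bin. Passing `temp_bin` directly as a factor in the formula should give the same columns without building them by hand.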

shoonlee commented 3 years ago

@wbinzhe

Take a look at the paper I attached to this thread (Barreca et al. - 2016 - Adapting to Climate Change The Remarkable Decline.pdf). It's a very similar specification.

wbinzhe commented 3 years ago

@shoonlee Sure got it!

wbinzhe commented 3 years ago

@shoonlee Please see slides #31 and #32 for the RCA non-parametric estimates: https://docs.google.com/presentation/d/14_aDxt2O_Le4mCJj4lBfuK-rG9gI6WA8U69lhPJajis/edit?usp=sharing. For annual/5-year mean temp, I adjusted the reference level to 52F instead of 60F, based on the patterns in the data. For the 5-year or annual average temp, the range is 40-80F.
For annual/5-year max temp, the 5-year results are very noisy compared with all the other plots.

Also, for the parametric regression using # of days t>90F, adding the observations from 2006-2009 makes the negative effect non-significant (i.e., sample period 2006-2019). But excluding observations from 2006-2008 recovers the negative effect. I did not find any clear heterogeneity between earlier and later years, so it looks like a sample size issue (only three states).

If you want to look into these problems this weekend, the code is in RCA_retail_ca_tx_ny.R.

shoonlee commented 3 years ago

@wbinzhe

Thanks for the update. It sort of makes sense that we have a temperature range of 40-80 only for yearly data. As opposed to the monthly data where we have summer and winter months as separate observations, in yearly data things will be averaged out so we wouldn't really have observations above 90F or below 40F.
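A quick way to see the averaging effect is to compare the spread of daily, monthly-mean, and annual-mean temperatures for one simulated location. The seasonal curve and noise level below are invented for illustration and are not taken from the PRISM data.

```r
# Hypothetical daily mean temperatures (F) for one location over a non-leap year.
set.seed(7)
dates <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")
doy <- as.integer(format(dates, "%j"))
daily <- 60 + 30 * sin(2 * pi * (doy - 105) / 365) + rnorm(length(dates), sd = 4)

# Average within each calendar month, then over the whole year.
monthly_mean <- tapply(daily, format(dates, "%m"), mean)
annual_mean <- mean(daily)

range(daily)         # widest spread: individual hot and cold days
range(monthly_mean)  # narrower: summer and winter months still differ
annual_mean          # a single value near the middle of the seasonal cycle
```

The monthly means always sit inside the daily range, and the annual mean collapses the whole seasonal cycle into one number, which is why yearly averages cover fewer bins than monthly ones.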

Can we go back to the number of days above a certain temperature as the definition of the temperature bin here? So basically run something like

felm(ln_price ~ t20 + t30 + t40 + t50 + t70 + t80 + t90 | xxxx | xxxx, data = data)

but t30 here is defined as the number of days where the average daily temperature is between 30-40F? The interpretation might change slightly from the foot traffic data, but we should try this.

Also, can we try this definition with the foot traffic data as well?

In summary, let's repeat the analysis with a different temperature bin definition (the number of days with daily mean temperature in a certain temperature bin).

wbinzhe commented 3 years ago

@shoonlee Let me know if I am understanding this correctly: in the RCA case, when we use the annual mean temp, each observation is assigned to exactly one bin and all the others are zero. If we change the bins to the # of days when the average daily temperature is within some F-range, then we are distributing the 365 days of the year across the bins (so we still need to drop one bin, 60F). And the interpretation is that in locations with 1 more day in a specific temperature range, property value is xx higher/lower.

shoonlee commented 3 years ago

@wbinzhe

I think your understanding is correct. Suppose that in 2015 we had 24 days with a daily mean temperature over 90F; then t90 == 24 for that year. Also, t20 + t30 + t40 + ... + t90 = 365.

Barreca et al (2016) - the paper we've repeatedly talked about - defined the variable in this way. See figure 1 and their econometric model section to make the variable definition more clear.
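A small sketch of the number-of-days definition, assuming 10F bins with open-ended bottom and top bins (`t20` for below 30F, `t90` for 90F and above). The temperatures are simulated and the object names are hypothetical.

```r
# Hypothetical daily mean temperatures (F) for one location in a non-leap year.
set.seed(1)
day_of_year <- 1:365
daily_temp <- 60 + 25 * sin(2 * pi * (day_of_year - 105) / 365) + rnorm(365, sd = 5)

# Bin edges: (-Inf,30), [30,40), ..., [80,90), [90,Inf); labels follow the thread's names.
breaks <- c(-Inf, seq(30, 90, by = 10), Inf)
labels <- paste0("t", seq(20, 90, by = 10))
bin <- cut(daily_temp, breaks = breaks, right = FALSE, labels = labels)

# Number of days in each bin for this location-year.
day_counts <- table(bin)
sum(day_counts)  # the counts add up to the 365 days of the year

# The counts are collinear with the intercept, so one bin (here 60-69F) is left out.
regressors <- day_counts[names(day_counts) != "t60"]
regressors
```

Because the counts sum to 365, each remaining coefficient reads as the effect of moving one day from the omitted 60-69F bin into that bin.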

wbinzhe commented 3 years ago

@shoonlee Thanks, I will double-check the interpretation in the paper!

wbinzhe commented 3 years ago

@shoonlee The maps are under replication_folder/maps. I created multiple versions because the maps of # of days above 90F are very similar. Unless it is necessary, do not run "prism_daily_assemble.R" to reproduce the maps; each map took 1-2 hours to plot. I also pasted these maps into the shared Google Slides so you can take a quick look.

shoonlee commented 3 years ago

Binzhe,

Great! I think I’m going to use the difference version. Also, can we push back the time a bit for tmr’s meeting? Anytime after 5 pm should work for me. Thanks!

shoonlee commented 3 years ago

@wbinzhe

wbinzhe commented 3 years ago

@wbinzhe

  • Histogram of the number of days variable (like figure 1 in Barreca et al 2016)
  • RCA entire sample analysis with the number of days in temperature bin variables
  • RCA centers vs shops
  • RCA current year temperature
  • Foot traffic (entire sample) with the number of days temperature definition
  • Foot traffic for (roughly) centers vs shops

@shoonlee Hi Seunghoon, I am going to move to Safegraph. Please double check that we have everything needed for RCA analysis.

shoonlee commented 3 years ago

Hi Binzhe,

It looks good except for one thing: the x-axis. Did you really use a bin size of 2, or is that just a typo?

shoonlee commented 3 years ago

Did you use the number of days or average temperature for shops vs centers?

wbinzhe commented 3 years ago

@shoonlee I used a width of 2 for each temp bin. For the annual average, we only have 40-80F. I also tried 5-degree bins, but 2-degree bins yield a cleaner shape.

shoonlee commented 3 years ago

Try the number of days for shops vs centers please.

wbinzhe commented 3 years ago

@shoonlee Monthly foot traffic part also updated

wbinzhe commented 3 years ago

@shoonlee Slide #18: non-parametric estimates for different store types using # of days in each bin. Slides #17 and #18 use all observations (store*month) rather than the 10% sample.