Data - Githubissues

wbinzhe commented 3 years ago

Nielsen:

Homescan: unit of observation and category of purchases (grocery only?)

shoonlee commented 3 years ago

PRISM vs Daymet. How those grid data are collected (e.g., interpolation).

wbinzhe commented 3 years ago

PRISM vs Daymet. How those grid data are collected (e.g., interpolation).

@wbinzhe Thanks for the Nielsen application & approval. While we set up the computer, did you have a chance to explore how each gridded dataset is constructed? Specifically:

How many weather stations were used to produce the 1 by 1 grid temperature data? If the data have indeed been interpolated, can you explore how many grid cells would have missing values without interpolation? How is the 4 by 4 grid data constructed (do we have a unique observation at every 4 by 4 grid cell, or is this interpolated as well)?

See p. 8 on the Daymet Version 4 and PRISM datasets. I am not processing them yet, but the information is available from their authors. https://docs.google.com/document/d/1mevpwyHhSPoBbTQiSB6S0lFa3b8XI8UJQUjoaDCmC6A/edit?usp=sharing
Also, see this article from NASA Earthdata: https://earthdata.nasa.gov/learn/articles/daymet-version4-data

shoonlee commented 3 years ago

@wbinzhe

Nielsen:

Homescan: unit of observation and category of purchases (grocery only?)

When the Nielsen data is ready, please list the places (e.g., cities, counties, or zip codes) that have enough observations. Siqi's proposal asked for three specific cities, and I think our chances will be higher if we can target our RCA application to the geographic areas that Nielsen covers.

shoonlee commented 3 years ago

@wbinzhe Ok, the Daymet data is interpolated. Before working on data cleaning with Daymet, can you first check if the same is true with 4 by 4 PRISM data?

wbinzhe commented 3 years ago

Yes, the two datasets are developed by two different teams but serve the same purpose: producing continuous gridded data and offering estimates where weather stations do not exist. https://climatedataguide.ucar.edu/climate-data/prism-high-resolution-spatial-climate-data-united-states-maxmin-temp-dewpoint
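To make "offering estimates where weather stations do not exist" concrete, here is a minimal sketch of spatial interpolation using inverse distance weighting. This is a generic stand-in for illustration only; it is not the method Daymet or PRISM actually uses, and the function name, station coordinates, and temperatures are all hypothetical.

```python
import math


def idw_estimate(stations, target, power=2.0):
    """Estimate a value at `target` from nearby stations.

    stations: list of (x, y, value) tuples (hypothetical coordinates and readings)
    target:   (x, y) location of the grid cell center
    power:    distance exponent; larger values favor the nearest stations
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            # The grid cell sits exactly on a station: use its reading directly.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two made-up stations; the target grid cell lies halfway between them.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_estimate = idw_estimate(stations, (1.0, 0.0))
```

Every grid cell gets a value this way, which is why an interpolated product has no missing cells even where no station exists.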

wbinzhe commented 3 years ago

For Daymet data, each variable takes about 11 GB per year, which I think exceeds the storage on my laptop. I'll start by getting familiar with processing nc4 data using a single file, and then start the bulk processing once the lab computer is set up.
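A rough back-of-the-envelope storage estimate, assuming the 11 GB per variable per year figure above holds across variables and years. The variable and year counts in the example are hypothetical placeholders, not a decision about which data we will download.

```python
GB_PER_VARIABLE_YEAR = 11  # figure quoted in the comment above


def total_storage_gb(n_variables, n_years, gb_per_variable_year=GB_PER_VARIABLE_YEAR):
    """Total disk space (in GB) needed for the requested variables and years."""
    return n_variables * n_years * gb_per_variable_year


# Hypothetical request: 2 variables (e.g., daily max and min temperature), 10 years.
example_gb = total_storage_gb(n_variables=2, n_years=10)
print(f"{example_gb} GB")
```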

wbinzhe commented 3 years ago

Grid size:

There are roughly 33,000 zip codes in the United States. The average land area of a zip code is around 233 sqkm.
The smallest zip code is 00906, which is only 0.008 sqkm. In contrast, the largest zip code is 99557, with an area of 34,786 sqkm.
Much like land area, population size varies widely. For example, the most populated zip code is 00725 in Puerto Rico (over 144,000 residents), whereas the least populated zip code is 59921 in Lake McDonald, Montana, with a population of just 1 resident.
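To relate these areas to the grid size, here is a small sketch that counts how many grid cells fit in a zip code. It assumes the "1 by 1" and "4 by 4" grids discussed earlier are measured in kilometers; the function name is hypothetical.

```python
def cells_per_area(area_sqkm, cell_side_km):
    """Number of square grid cells of side `cell_side_km` that fit in `area_sqkm`."""
    return area_sqkm / (cell_side_km ** 2)


# Zip code areas quoted above, in sqkm.
AVERAGE_ZIP = 233
SMALLEST_ZIP = 0.008
LARGEST_ZIP = 34786

for label, area in [("average", AVERAGE_ZIP), ("smallest", SMALLEST_ZIP), ("largest", LARGEST_ZIP)]:
    one_km = cells_per_area(area, 1)   # assumed 1 km x 1 km grid
    four_km = cells_per_area(area, 4)  # assumed 4 km x 4 km grid
    print(f"{label}: {one_km:g} cells at 1 km, {four_km:g} cells at 4 km")
```

Under these assumptions the average zip code spans many cells on either grid, while the smallest zip code is far smaller than a single cell, so its weather value would come entirely from the one cell that contains it.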

wbinzhe commented 3 years ago

Practical Note on Nielsen Access

Apply to the Nielsen center (both PI and students; prepare the research proposal ahead of time with basic elements such as research question, literature, contribution, and hypothesis); approval takes 1-2 days.
Register at Globus and request access to kilts-panel and kilts-scanner.
1. Note that the system is not very robust. My request was on hold for several days; I emailed several times and then called the Help Desk at 773.702.7414. It turned out they had missed my request for some unknown reason, and the staff approved it in 5 minutes.
2. The Scanner datasets are extremely large. You will get a separate username and password for the Kilts File Selection System (https://kiltsfiles.chicagobooth.edu/Requests/Create-New-Request.aspx), where you can customize the data request; the customized dataset will be sent to your endpoint.
Install Globus Connect Personal on a local computer; the selected data are then synced to your computer (i.e., your personal endpoint). It works similarly to Dropbox.

wbinzhe commented 3 years ago

Note on Data Limitations of Nielsen Scanner

No shelving information. Each individual store reports weekly data for every UPC code that had any sales volume during the week. Nielsen observes Features and Displays only in a subset of stores (called “audited stores”), and those stores can vary on a weekly basis. Where Features and Displays aren’t observed by Nielsen: if one store for a retailer has a feature in a given week, it is reasonable to assume that all other stores for that retailer in the same DMA have that same item featured. However, researchers should not assume that an item on Display is also displayed at all stores in that retail chain within the same DMA, because display execution can vary by store within a chain or market.
Retail chain names are not provided. If a researcher somehow determines a retailer’s name or identifies an individual store, the researcher agrees not to disclose retailer names or identify individual stores in any publications, working papers, presentations, or any other forums.
Geography: only down to the 3-digit ZIP code (i.e., 929 3-digit ZIP codes in the US).
10 departments, ~125 product groups, ~1,100 product modules, then UPC codes.
Covers 50% of drug stores, ~25% of food stores, ~25% of dollar stores, and 20% of mass merchandising. Cannot be used for regional market projection.
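Because store geography stops at the 3-digit ZIP code, any zip-level weather data would need to be collapsed to that level before merging. A minimal sketch of the mapping, assuming 5-digit ZIP codes stored as strings or integers; the function name is hypothetical and the sample codes are just the ones mentioned earlier in this thread.

```python
from collections import Counter


def zip3(zip_code):
    """Return the 3-digit prefix of a 5-digit ZIP code, restoring leading zeros."""
    digits = str(zip_code).strip().zfill(5)
    if len(digits) != 5 or not digits.isdigit():
        raise ValueError(f"not a 5-digit ZIP code: {zip_code!r}")
    return digits[:3]


# ZIP codes mentioned earlier in this thread, used only as sample input.
sample_zips = ["00906", "00725", "59921", "99557"]
zips_per_prefix = Counter(zip3(z) for z in sample_zips)
```

Padding with `zfill` matters because ZIP codes read as integers lose their leading zeros (e.g., 00906 becomes 906).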

wbinzhe commented 3 years ago

Hi @shoonlee, just one very quick note: shall we restrict our sample period to post-2008? I'm noticing some very abrupt changes around the financial crisis in both the REITs and RCA data.

shoonlee commented 3 years ago

@wbinzhe We probably want to keep it, for two reasons. First, we can show both results, one as the main result and the other as a robustness check. Second, as long as those changes are orthogonal to the weather variation, they shouldn't, in principle, be too much of a problem.
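The orthogonality point can be written out with the standard omitted-variable formula. This is a stylized sketch, not our actual specification: the symbols are hypothetical, with $y_{it}$ the outcome, $W_{it}$ the weather variable, and $C_t$ the crisis-period shock left out of the regression.

```latex
y_{it} = \beta W_{it} + \gamma C_t + \varepsilon_{it}

% Regressing y on W alone (omitting C) gives
\operatorname{plim} \hat{\beta}
  = \beta + \gamma \, \frac{\operatorname{Cov}(W_{it}, C_t)}{\operatorname{Var}(W_{it})}

% If the crisis shock is orthogonal to weather, Cov(W, C) = 0, so
\operatorname{plim} \hat{\beta} = \beta
```

In that case the crisis years add noise to the estimate but do not bias it, which is why comparing the full sample with the post-2008 sample works as a robustness check.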

wbinzhe / Climate_Retail

Data #4