yixinsun1216 / covertoperations_manual


Download, Organize and Document the RECS data #6

Open tcovert opened 6 years ago

tcovert commented 6 years ago

Hi Rohan,

We'd like you to finish the RECS "microdata" data collection process. This entails five things:

  1. Download and read in the microdata files for the RECS surveys of 1993, 1997, 2001, 2005 and 2009. We'll do the 2015 data once it's finalized.
    • "Download" means saving the raw data to the Box folder with a consistent naming scheme.
    • "Read in" means writing an R program (probably just one) that reads these files into an R data frame and ensures that each column has the proper type (date vs. string vs. integer vs. floating point number, etc.).
    • This program should be "platform independent", in the sense that if I run the code on my computer, it should work just as well as it does on your computer. Crucially, this means that none of the file path references should be absolute (i.e., they shouldn't refer to your hard drive or username directly).
  2. Figure out which survey responses we actually want to focus on in a first pass. Obviously, this includes basic identifying information, but it will also include survey data on the characteristics of the household, information on the household's capital stock, and records of the household's energy consumption from the utility bills. The goal here is to make sure we don't have to keep track of hundreds of variables in each survey.
  3. Once we have identified the survey variables to keep, we need to figure out whether the same variables appear in every survey. So:
    • Figure this out: are the same capital stock and consumption variables recorded in every survey?
    • Presumably the answer is "it depends", so make a table that documents which variables are recorded in which surveys.
  4. Among the variables that appear in more than one survey, come up with a set of consistent variable names so that we can find, say, "gas consumption" in all surveys under the same name.
  5. Make the table we discussed: for each survey, tabulate the frequency of missing values, imputed values and real data across each of the variables that we care about.
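For step 1, here is a minimal sketch of the two requirements (relative paths and explicit column types), written in Python only to show the logic; the issue asks for R, and the same structure carries over. The folder layout `data/raw/`, the file naming scheme `recs<year>_public.csv`, and the column-to-type map are all hypothetical placeholders, not the real RECS file or variable layout.

```python
import csv
from pathlib import Path

# Survey years listed in the issue (2015 to be added once finalized).
SURVEY_YEARS = [1993, 1997, 2001, 2005, 2009]

# Hypothetical column -> type map; the real columns and types should come
# from each year's RECS codebook.
COLUMN_TYPES = {"DOEID": int, "NWEIGHT": float, "KWH": float, "REGIONC": str}


def raw_path(root, year):
    """Path to one year's raw file, built relative to a project root.

    No drive letters or usernames appear; the naming scheme is an assumption.
    """
    return Path(root) / "data" / "raw" / f"recs{year}_public.csv"


def parse_rows(lines, types):
    """Parse CSV lines into dicts, casting each listed column; blanks become None."""
    rows = []
    for record in csv.DictReader(lines):
        rows.append({
            col: (cast(record[col]) if record[col] != "" else None)
            for col, cast in types.items()
            if col in record
        })
    return rows


def read_survey(root, year, types=COLUMN_TYPES):
    """Read one survey year from disk, keeping only the typed columns."""
    with raw_path(root, year).open(newline="") as f:
        return parse_rows(f, types)
```

Run from the project root, `read_survey(Path("."), 2009)` would load one year on any machine. Because only the columns in the type map are kept, the same map also doubles as the short list of variables from step 2.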
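For step 3, the "which variables are recorded in which surveys" table can be built from each year's column list. A minimal Python sketch of the logic (the issue asks for R; the idea is the same), assuming the column names for each year have already been collected into a set; the variable names in the usage note are illustrative only.

```python
def presence_table(columns_by_year):
    """Map each variable to {year: True/False} for whether that survey records it."""
    years = sorted(columns_by_year)
    all_vars = sorted(set().union(*columns_by_year.values()))
    return {var: {year: var in columns_by_year[year] for year in years}
            for var in all_vars}


def format_table(table, years):
    """Render the presence table as CSV lines, marking recorded variables with 'x'."""
    lines = ["variable," + ",".join(str(y) for y in years)]
    for var, row in table.items():
        lines.append(var + "," + ",".join("x" if row[y] else "" for y in years))
    return lines
```

For example, with `{1993: {"KWH", "CUFEETNG"}, 2009: {"KWH", "TOTSQFT"}}`, the rendered table has one row per variable and an `x` under each year that records it, which makes gaps across surveys easy to spot.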
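For step 4, one way to get consistent names is a per-year crosswalk from each survey's original column name to a single harmonized name. The Python sketch below (the issue asks for R; the logic carries over) assumes such a crosswalk is maintained by hand; every original name in it is a hypothetical placeholder, not an actual RECS variable.

```python
# Hypothetical crosswalk: survey year -> {original column name: harmonized name}.
# The original names below are placeholders, not actual RECS variable names.
CROSSWALK = {
    1993: {"GAS_1993": "gas_consumption", "ELEC_1993": "elec_consumption"},
    2009: {"GAS_2009": "gas_consumption", "ELEC_2009": "elec_consumption"},
}


def check_crosswalk(crosswalk):
    """Raise if two original columns in one year map to the same harmonized name."""
    for year, mapping in crosswalk.items():
        seen = {}
        for original, target in mapping.items():
            if target in seen:
                raise ValueError(
                    f"{year}: {seen[target]!r} and {original!r} both map to {target!r}")
            seen[target] = original


def harmonize(row, year, crosswalk=CROSSWALK):
    """Rename one record's columns to harmonized names; unmapped columns pass through."""
    mapping = crosswalk.get(year, {})
    return {mapping.get(col, col): value for col, value in row.items()}
```

Keeping the crosswalk as data rather than scattering renames through the code means the same mapping can also be exported as part of the documentation table.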
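For step 5, the missing/imputed/real tabulation reduces to classifying each value and counting. A minimal Python sketch (the issue asks for R; the logic is the same), under two loudly stated assumptions that must be checked against each year's codebook: the set of missing-value codes, and that each variable `X` has an imputation flag column named `Z` + `X` where a nonzero flag means imputed.

```python
from collections import Counter

# Hypothetical missing-value codes; confirm the actual codes per survey year.
MISSING_CODES = {None, "", -2, -8, -9}


def classify(row, var, flag_prefix="Z", missing_codes=MISSING_CODES):
    """Label one value as 'missing', 'imputed', or 'reported'.

    Assumes an imputation flag column named flag_prefix + var, nonzero if imputed.
    """
    value = row.get(var)
    if value in missing_codes:
        return "missing"
    flag = row.get(flag_prefix + var)
    if flag not in (None, "", 0, "0"):
        return "imputed"
    return "reported"


def tabulate(rows, variables, **kwargs):
    """Count missing/imputed/reported values for each variable in one survey."""
    table = {var: Counter() for var in variables}
    for row in rows:
        for var in variables:
            table[var][classify(row, var, **kwargs)] += 1
    return table
```

Calling `tabulate` once per survey year gives one frequency table per survey, as the step asks.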