hadley / data-housing-crisis

Clean data related to the housing crisis

Project Overview

The US housing crisis has undermined the world economy in far-reaching and poorly understood ways. Although there is a lot of speculation about the causes and effects of the housing crisis, most hypotheses are not backed up by data. We hope to promote well-informed policy and discussion, and to aid exploration and analysis, by creating an accessible and reproducible repository of data and analysis.

Data related to the housing crisis exists in large (up to 10 GB), independent, and often messy data sets. The variety and inconsistency of the data creates an obstacle for analysis, and this summer we have been working to provide views of this data that are consistent, concise, and complete. To ensure that all manipulation is transparent, both data cleaning and analysis have been carried out with the open source statistical software R. Both code and data are freely licensed and available on GitHub. To date we have cleaned and organised 13 data sets related to the housing crisis, and by keeping the code transparent and reproducible, we hope to inspire others to contribute their data and ideas.

This research project is a collaboration between Rice University undergraduate and graduate students and Hadley Wickham, an Assistant Professor of Statistics. It is funded by the NSF's Vertically Integrated Grants for Research and Education in Mathematical Sciences (VIGRE) program, NSF grant DMS-0739420.

Data Set Overview

Terms

Locations

Future plans

We would like to develop a website that lets users easily access the data they are interested in, a task that would otherwise be daunting given the size of the data. Because our analysis and findings also involve large amounts of information (such as construction price time series for each US metropolitan area), we are exploring interactive graphical methods for displaying this information.