ECMA-31330-Project
Econometrics and Machine Learning Group Project
Principal Component Regression as a Solution to Measurement Error Bias
Isaac Liu, Nicolas Martorell & Paul Opheim
The paper can be found here.
Contents
Here is the structure of this repository, along with links to relevant folders.
Contains all data files which are inputs to analysis for the application and simulations.
Contains tables, figures, and simulation results.
Contains the abstract and paper pdfs and associated LaTeX files and bibliography material.
Contains the project source code.
Contains code for the life expectancy and government health share application.
Contains code to run the Monte Carlo simulations and produce relevant tables.
Replication Instructions
- Download or clone the repository. The directory structure should not need to be modified, but the required Python packages must be installed.
- The simulation files are structured so that they can be run on a computing cluster. Each of the three steps below has its own .sh script, and Parallel_Simulations.sh runs them all sequentially while requesting the appropriate amount of computing resources.
  - Setup_Parallel_Sims.py writes a CSV of the parameter values and combinations to simulate.
  - Run_Parallel_Sim.py is executed in parallel. It performs 1,000 simulations for each parameter combination and writes out the results.
  - Compile_Parallel_Sims.py combines the results from the previous step into a single file.
- To produce the statistics for the tables from the simulation results, run Produce_Tables_Parallel.ipynb.
- To download the World Bank data used for the application, run Get_WB_Data.py. This step is optional. Note that the World Bank may update its data, so results based on a fresh download may differ slightly.
- To run the primary empirical analysis for the application, run Application_Gov_Health_Spending_Share_LE.py.
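
For readers unfamiliar with the setup / run / compile pattern used by the simulation scripts, the following is a minimal, self-contained sketch of the idea. It is not the repository's code. The function names, the parameter names (`n`, `beta`, `noise_sd`), the file names, and the toy simulation (an OLS slope on a regressor with classical measurement error, rather than principal component regression) are all assumptions chosen for illustration.

```python
import csv
import itertools
import random
import statistics
import tempfile
from pathlib import Path


def write_param_grid(path, grid):
    """Stage 1 (hypothetical): write one CSV row per parameter combination."""
    names = list(grid)
    combos = itertools.product(*grid.values())
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["task_id"] + names)
        writer.writeheader()
        for task_id, combo in enumerate(combos):
            writer.writerow({"task_id": task_id, **dict(zip(names, combo))})
    return task_id + 1  # number of parameter combinations written


def run_task(param_path, task_id, out_dir, n_sims=1000):
    """Stage 2 (hypothetical): run n_sims replications for one grid row."""
    with open(param_path, newline="") as f:
        row = next(r for r in csv.DictReader(f) if int(r["task_id"]) == task_id)
    n, beta, noise_sd = int(row["n"]), float(row["beta"]), float(row["noise_sd"])
    rng = random.Random(task_id)  # separate seed per task, so tasks are reproducible
    estimates = []
    for _ in range(n_sims):
        x_true = [rng.gauss(0, 1) for _ in range(n)]
        y = [beta * x + rng.gauss(0, 1) for x in x_true]
        x_obs = [x + rng.gauss(0, noise_sd) for x in x_true]  # mismeasured regressor
        # Toy estimator: OLS slope of y on the mismeasured regressor.
        mx, my = statistics.fmean(x_obs), statistics.fmean(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x_obs, y))
        sxx = sum((a - mx) ** 2 for a in x_obs)
        estimates.append(sxy / sxx)
    out_path = Path(out_dir) / f"result_{task_id}.csv"
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["task_id", "n", "beta", "noise_sd", "mean_estimate"])
        writer.writerow([task_id, n, beta, noise_sd, statistics.fmean(estimates)])
    return out_path


def compile_results(out_dir, combined_path):
    """Stage 3 (hypothetical): stack the per-task result files into one CSV."""
    rows = []
    for path in Path(out_dir).glob("result_*.csv"):
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))
    rows.sort(key=lambda r: int(r["task_id"]))
    with open(combined_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work_dir:
        params = Path(work_dir) / "params.csv"
        n_tasks = write_param_grid(
            params, {"n": [200], "beta": [1.0], "noise_sd": [0.0, 1.0]}
        )
        # On a cluster, each task_id would typically be a separate job.
        for task_id in range(n_tasks):
            run_task(params, task_id, work_dir, n_sims=50)
        for row in compile_results(work_dir, Path(work_dir) / "combined.csv"):
            print(row)
```

Splitting the work this way means each parameter combination can be submitted as an independent job, and the compile step only needs to read whatever result files were produced. In this toy example, the row with `noise_sd` of 1.0 should show a mean slope estimate well below the true `beta`, which illustrates the attenuation bias the paper is concerned with.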