Jupyter Notebook for scraping the JS files from URLs listed in the dataset, along with URLs listed in the Princeton survey. Some potential optimizations to the code are noted within the file.
Note: at the moment this only works with a sample of the dataset that pandas can hold in memory. Handling the entire dataset will require reworking the notebook to use Spark.
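
As a rough illustration of the scraping step, here is a minimal sketch of pulling external script URLs out of a page's HTML using only the Python standard library. It is not taken from the notebook: the names `ScriptSrcParser` and `extract_script_urls` are hypothetical, and it assumes the page HTML has already been downloaded as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptSrcParser(HTMLParser):
    """Collects the src attribute of every <script> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Inline scripts have no src attribute and are skipped.
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_script_urls(html, base_url):
    """Return absolute URLs of the external JS files referenced in html."""
    parser = ScriptSrcParser()
    parser.feed(html)
    # Resolve relative paths against the URL the page was fetched from.
    return [urljoin(base_url, src) for src in parser.sources]
```

Each returned URL could then be fetched and saved as a JS file. Resolving against the page URL matters because many sites reference their scripts with relative paths.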