This GitHub repository contains all of the data, code, and analysis from our capstone project in partnership with MTV. In this study we looked at voting access on college campuses in 2012, 2016, 2018, and 2020. We used distance from the college campus to the nearest polling place as a proxy for access. The source data and code files can be used to recreate our intermediate files and consequently run our analysis files. All code files should be run in numerical order within a folder. Our final presentation slides and report can be found in the 40_docs folder.
Our code and analysis were done using GeoPandas; installation instructions can be found at the link below:
https://www.practicaldatascience.org/html/gis_setup_geopandas.html
Our long-term planning can be found in this Gantt Chart: https://docs.google.com/spreadsheets/d/1HWH5hh6ox8acLfQ1_6jpGFs9mc5kYe1hEaEqWNq4oJg/edit?usp=sharing
Our short-term task management is tracked in the GitHub Issues tab of this repository.
In this folder we have our original data files. Our polling data for 2012, 2016, 2018, and 2020 comes from the Center for Public Integrity; it was added to this folder after being geocoded with Geocodio. Our 2018 and 2020 early voting data from Ballot Ready is also in this folder. This folder also holds our original college campus, college demographic, and college enrollment data.
In this folder we have code files that prepare our source data, merge data sources together, and conduct our analysis. The most important files in each sub-folder are described below.
This sub-folder contains all of the code used to clean and prepare college campus data and polling place data. There are files that clean the polling data for each year, the early voting data, and our college polygons. The details of the college polygon file are given below as an example:
File Name: 10_import_HIFLD_College_Polygons.py
Description: Loading and Cleaning HIFLD data.
Input: 00_source_data/HIFLD_CollegeUniversityCampuses/HIFLD_CollegeUniversityCampuses.shp
Output: /20_intermediate_files/10_HIFLD_campus_polygons.csv
This sub-folder contains all of the code used to merge college campus data, college demographic data, and polling place data. There are files that merge college campuses with college demographic information, calculate the distance to the nearest polling place for each college, and merge college enrollment data with the nearest polling place information. The details of the college demographic and enrollment files are given below as an example:
File Name: 10_merging_HIFLD_w_demographics.py
Description: Merging the HIFLD and demographics data through an inner join and manual matching
Input: 20_intermediate_files/10_HIFLD_campus_polygons.geojson and 00_source_data/PlusOne_College_Demographic_Data/SLSV Master Campus Sheet - Master Sheet.csv
Output: 20_intermediate_files/20_campus_polygons_w_demographics.geojson
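The inner join plus manual matching described for this file could look roughly like the minimal pandas sketch below. It is not the project's code: the column names `NAME` and `campus_name`, the helper functions, and the entries in `MANUAL_NAME_MATCHES` are hypothetical, and the real script works on GeoJSON through GeoPandas rather than plain DataFrames.

```python
import pandas as pd

# Hypothetical manual fixes for campus names that differ between the two
# sources (illustrative values only, not taken from the project data).
MANUAL_NAME_MATCHES = {
    "univ of example": "university of example",
}


def normalize_name(name) -> str:
    """Lowercase and collapse whitespace so trivially different names match."""
    return " ".join(str(name).lower().split())


def merge_campuses_with_demographics(campuses: pd.DataFrame,
                                     demographics: pd.DataFrame) -> pd.DataFrame:
    # Build a normalized join key on each side (column names are assumptions).
    campuses = campuses.assign(match_key=campuses["NAME"].map(normalize_name))
    demographics = demographics.assign(
        match_key=demographics["campus_name"].map(normalize_name)
    )
    # Apply the manual matches before joining so known mismatches line up.
    demographics["match_key"] = demographics["match_key"].replace(MANUAL_NAME_MATCHES)
    # Inner join: keep only campuses that appear in both sources.
    return campuses.merge(demographics, on="match_key", how="inner")


# Usage with toy data.
campuses = pd.DataFrame({"NAME": ["University of Example", "Sample College"],
                         "STATE": ["NC", "VA"]})
demographics = pd.DataFrame({"campus_name": ["univ of example", "Other College"],
                             "pct_pell": [0.3, 0.4]})
merged = merge_campuses_with_demographics(campuses, demographics)
print(merged[["NAME", "pct_pell"]])
```

An inner join silently drops campuses that fail to match, so comparing `len(merged)` against the input sizes is a cheap way to see how many names still need manual entries.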
File Name: college_enrollment_data.py
Description: Merging the college name and the enrollment data (HD2020 and EF2020B)
Input: /00_source_data/college_enrollment_data/college_name.csv and 00_source_data/college_enrollment_data/enrollment_data.csv
Output: /20_intermediate_files/college_enrollment_data_merge.csv
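HD2020 and EF2020B are IPEDS survey files, and IPEDS identifies each institution by `UNITID`, so a merge like the one in this file can be sketched as a keyed join. This is a minimal sketch, not the project's script: it assumes the enrollment table has already been reduced to one total row per institution (EF2020B itself has several rows per `UNITID`), and the `total_enrollment` column and the ID values are hypothetical.

```python
import pandas as pd


def merge_names_with_enrollment(names: pd.DataFrame,
                                enrollment: pd.DataFrame) -> pd.DataFrame:
    """Attach enrollment totals to institution names using the IPEDS UNITID key.

    Assumes `enrollment` has exactly one row per UNITID; validate="one_to_one"
    raises a MergeError if either side has duplicate keys.
    """
    # Left join keeps every named institution, even when enrollment is missing.
    return names.merge(enrollment, on="UNITID", how="left", validate="one_to_one")


# Usage with toy data (IDs and column names are illustrative only).
names = pd.DataFrame({"UNITID": [111111, 222222],
                      "INSTNM": ["Example University", "Sample College"]})
enrollment = pd.DataFrame({"UNITID": [111111], "total_enrollment": [12000]})
merged = merge_names_with_enrollment(names, enrollment)
print(merged)
```

The `validate` argument is a guard against accidentally joining on an un-aggregated enrollment file, which would otherwise multiply rows without any error.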
This sub-folder contains all of our analysis notebooks, along with code that adds relevant analysis details (such as region or urban/rural census designation). There are files that connect our data to the Google distance API to obtain travel times, add regions, and add an urban/rural census designation. Our analysis files cover our temporal analysis, urban/rural analysis, dropbox analysis, and other analysis areas. The details of the distance API, regions, and 2012-2018 results files are given below as an example:
File Name: 112_distance_api.py
Description: Calculates driving, walking, and transit travel times between each college and its nearest polling location using the Google Maps API
Input: 30_campuses_w_dist_to_nearest_pp.geojson
Output: /20_intermediate_files/30_campuses_w_dist_to_nearest_pp.geojson
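A travel-time lookup of this kind can be sketched against the Google Maps Distance Matrix web service, which takes `origins`, `destinations`, `mode`, and `key` parameters and reports each duration in seconds. This is a hedged sketch: whether the project called this endpoint directly or through a client library is an assumption, the helper names are hypothetical, and the code only builds the request URL and parses a sample response rather than making a network call.

```python
import json
from urllib.parse import urlencode

DISTANCE_MATRIX_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"
MODES = ("driving", "walking", "transit")  # the three modes named in this file


def build_request_url(origin, destination, mode, api_key):
    """Build a Distance Matrix request; origin/destination are (lat, lon) tuples."""
    params = {
        "origins": f"{origin[0]},{origin[1]}",
        "destinations": f"{destination[0]},{destination[1]}",
        "mode": mode,
        "key": api_key,
    }
    return f"{DISTANCE_MATRIX_URL}?{urlencode(params)}"


def parse_duration_seconds(response_text):
    """Return travel time in seconds for a single origin/destination pair, or None."""
    data = json.loads(response_text)
    if data.get("status") != "OK":
        return None  # e.g. REQUEST_DENIED or OVER_QUERY_LIMIT
    element = data["rows"][0]["elements"][0]
    if element.get("status") != "OK":
        return None  # e.g. ZERO_RESULTS when no transit route exists
    return element["duration"]["value"]


# Usage: build a URL (placeholder key) and parse a hand-written sample response.
url = build_request_url((36.0, -78.9), (36.01, -78.91), "walking", "YOUR_API_KEY")
sample = ('{"status": "OK", "rows": [{"elements": [{"status": "OK", '
          '"distance": {"text": "1.2 km", "value": 1200}, '
          '"duration": {"text": "15 mins", "value": 900}}]}]}')
print(parse_duration_seconds(sample))
```

In a real run, the response text would come from something like `urllib.request.urlopen(url).read()`, repeated once per campus and per mode in `MODES`, which is why keeping the request and parsing steps separate makes the logic easy to test offline.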
File Name: 120_regions.py
Description: Appending a region to each state
Input: /20_intermediate_files/30_campuses_w_dist_to_nearest_pp.csv
Output: /20_intermediate_files/30_campuses_w_dist_to_nearest_pp.csv
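One plausible way to append regions is a lookup from state abbreviation to U.S. Census Bureau region. Note that this is an assumption: the description above does not say which regional scheme the project used, and the `STATE` column name and `add_region` helper are hypothetical.

```python
import pandas as pd

# The four U.S. Census Bureau regions (50 states plus DC).
CENSUS_REGIONS = {
    "Northeast": ["CT", "ME", "MA", "NH", "RI", "VT", "NJ", "NY", "PA"],
    "Midwest": ["IL", "IN", "MI", "OH", "WI", "IA", "KS", "MN", "MO", "NE", "ND", "SD"],
    "South": ["DE", "DC", "FL", "GA", "MD", "NC", "SC", "VA", "WV",
              "AL", "KY", "MS", "TN", "AR", "LA", "OK", "TX"],
    "West": ["AZ", "CO", "ID", "MT", "NV", "NM", "UT", "WY",
             "AK", "CA", "HI", "OR", "WA"],
}

# Invert to a state -> region lookup table.
STATE_TO_REGION = {state: region
                   for region, states in CENSUS_REGIONS.items()
                   for state in states}


def add_region(df: pd.DataFrame, state_col: str = "STATE") -> pd.DataFrame:
    """Return a copy of df with a 'region' column; unknown codes become NaN."""
    out = df.copy()
    out["region"] = out[state_col].str.upper().map(STATE_TO_REGION)
    return out


# Usage: territories such as PR fall outside the four regions and map to NaN.
campuses = pd.DataFrame({"STATE": ["NC", "ca", "NY", "PR"]})
print(add_region(campuses))
```

Leaving unmatched codes as NaN, rather than inventing a catch-all region, makes it easy to spot territories or typos with `isna()` before any regional comparison.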
File Name: 2012_2016_2018_results.ipynb
Description: Yearly results using the Google API
This folder contains the files output by the scripts in our 10_code folder, including our cleaned polling place files and the results of our campus merges. One of our most important intermediate files is '30_campuses_w_dist_to_nearest_pp.geojson', because it contains every campus with its nearest polling place and is the basis for much of our analysis.
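The core idea behind '30_campuses_w_dist_to_nearest_pp.geojson', pairing each campus with its nearest polling place and the distance to it, can be illustrated with a brute-force great-circle (haversine) search. This is a simplified sketch, not the project's method: it treats each campus as a single point (for example, a polygon centroid) instead of a polygon, and the function names and coordinates are hypothetical. With GeoPandas, `geopandas.sjoin_nearest` with a `distance_col` argument, run in a projected CRS, is the more typical tool.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_polling_place(campus, polling_places):
    """Return (name, miles) of the closest polling place to a campus point.

    campus: (lat, lon); polling_places: dict mapping name -> (lat, lon).
    Brute force, O(number of polling places) per campus.
    """
    return min(
        ((name, haversine_miles(*campus, *coords))
         for name, coords in polling_places.items()),
        key=lambda pair: pair[1],
    )


# Usage with made-up coordinates.
campus = (36.0, -79.0)
polling = {"Library": (36.01, -79.0), "Church": (36.10, -79.0)}
name, miles = nearest_polling_place(campus, polling)
print(name, round(miles, 2))
```

Brute force is fine for a handful of points, but across every campus and every polling place nationwide, a spatial index (as used by `sjoin_nearest`) avoids comparing every pair.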
This folder contains the important documents produced over the course of the project. Our backwards design and team charter date from the beginning of the project. Our major semester milestone and end-of-semester report were intermediate deliverables to our capstone client. Finally, our final report and symposium slides are our final capstone deliverables.