Run `pip install -r requirements.txt` and `pip install -e .` if you are not in Docker (not recommended for development). The main components of the package are broken up into subpackages that can be imported and used in external code. To run pipelines directly, use the scripts in the `scripts` directory. These scripts are already dockerized and can be run with simple make commands.
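For a local (non-Docker) setup, that amounts to the following; a minimal sketch, where the virtual-environment step is an assumption rather than something this repo requires:

```bash
# Optional: isolate dependencies in a virtual environment (an assumption,
# not mandated by this repo).
python -m venv .venv
source .venv/bin/activate

# Install pinned dependencies, then the package itself in editable mode so
# changes to the local subpackages are picked up without reinstalling.
pip install -r requirements.txt
pip install -e .
```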
- `make run-transform-pipeline`: Runs the pipeline that transforms raw data from each state into the appropriate schema. Requires raw data in the `data/raw` folder; follow the setup instructions to get the data.
- `make run-clean-classify-graph-pipeline`: Runs the pipeline that cleans, classifies, and graphs data that is already in the correct schema. Requires `inds_mini.csv`, `orgs_mini.csv`, and `trans_mini.csv` in a `data/transformed` directory (included in git by default). See the usage sketch after this list.

For development, please use either a Docker dev container or the Slurm compute cluster. See CONTRIBUTING.md for more details.
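A minimal usage sketch, assuming the input files described above are in place:

```bash
# Transform raw state data in data/raw/ into the common schema.
make run-transform-pipeline

# Clean, classify, and graph data already in the correct schema; expects
# inds_mini.csv, orgs_mini.csv, and trans_mini.csv in data/transformed/.
make run-clean-classify-graph-pipeline
```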
Repository layout:

- Project Python code.
- Notebooks: short, clean notebooks that demonstrate the analysis.
- `data`: details of acquiring all raw data used in the repository. If the data is small (<50MB), it is okay to save it to the repo, making sure to clearly document how the data is obtained. If the data is larger than 50MB, do not add it to the repo; instead, document how to get it in the README.md file in the `data` directory, and keep that README.md up to date. A size-check sketch follows this list.
- Output folder: empty by default; the final outputs of make commands are placed here.
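A quick way to check file sizes against the 50MB limit before committing (a sketch using standard coreutils):

```bash
# Show the size of everything under data/, smallest to largest, to spot
# files over the 50MB limit described above.
du -sh data/* | sort -h
```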
Student Name: Nicolas Posner Student Email: nrposner@uchicago.edu
Student Name: Alan Kagiri Student Email: alankagiri@uchicago.edu
Student Name: Adil Kassim Student Email: adilk@uchicago.edu
Student Name: Nayna Pashilkar Student Email: npashilkar@uchicago.edu
Student Name: Yangge Xu Student Email: yanggexu@uchicago.edu
Student Name: Bhavya Pandey Student Email: bhavyapandey@uchicago.edu
Student Name: Kaya Lee Student Email: klee2024@uchicago.edu