mshenfield / chtl-data-pipeline

A pipeline for data from the Capitol Hill Tool Library's MyTurn dashboard.

Capitol Hill Tool Library Data Pipeline

Scripts that download MyTurn data, massage it, and display it as reports.

Running

Make sure you have Pipenv installed. In this directory, run:

pipenv shell
# Installs the myturn_bot package, plus JupyterLab and pandas for hacking.
pipenv install

myturn_bot is installed as an editable package, so any edits you make to it take effect the next time you run it.

Pipeline

To fetch and process data, use the newly installed myturn_bot pipeline tool in your pipenv.

# Download and process all your library's data
myturn_bot pipeline --output <somedir> --subdomain capitolhill

WARNING: Running myturn_bot pipeline requires Super Admin access.

See myturn_bot pipeline --help for more info on options. Besides a full run, you can run only specific stages (download/process), files (users/loans), or years (e.g. only 2024 loans and transactions).
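If you want to trigger a full run from another script (for example a scheduled job), a small wrapper can assemble the command shown above. This is a minimal sketch, not part of the repository: the function names build_pipeline_command and run_pipeline are hypothetical, and it uses only the --output and --subdomain options documented in this README. Check myturn_bot pipeline --help before adding any others.

```python
import shlex
import subprocess
from typing import List


def build_pipeline_command(output_dir: str, subdomain: str) -> List[str]:
    """Return the argument list for a full `myturn_bot pipeline` run.

    Only the two options shown in this README are used.
    """
    if not output_dir or not subdomain:
        raise ValueError("output_dir and subdomain are both required")
    return [
        "myturn_bot", "pipeline",
        "--output", output_dir,
        "--subdomain", subdomain,
    ]


def run_pipeline(output_dir: str, subdomain: str) -> None:
    """Run the pipeline and raise CalledProcessError if it exits non-zero.

    Assumes the myturn_bot command is on PATH, e.g. inside `pipenv shell`.
    """
    command = build_pipeline_command(output_dir, subdomain)
    print("Running:", shlex.join(command))
    subprocess.run(command, check=True)
```

Calling run_pipeline("data/output", "capitolhill") would then perform the same full run as the command above. The data/output path is only an example. Passing the arguments as a list instead of a single shell string avoids quoting problems if the output directory contains spaces.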

Notebooks

See the notebooks/ dir for some interesting reports built on the processed MyTurn data. To run the notebooks:

pipenv shell
jupyter lab

Data Sources

The process for manually exporting data from capitolhill.myturn.com is documented here. It is kept up to date because it makes clear where the data comes from, and because it is used to grab the URL used by the programmatic download script.

When done, run ./lib/cli.py anonymize --input_directory data/input_with_personal_info/ --output_directory data/input/ to remove any personal info from the downloads and copy them to the input folder, where they can be committed to source control and used by scripts.
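To give a rough idea of what an anonymization step like this involves, here is a minimal sketch. It is not the repository's implementation: it assumes the exports are CSV files, and the column names in PERSONAL_COLUMNS are hypothetical placeholders that would need to match the real MyTurn export headers.

```python
import csv
from pathlib import Path
from typing import Iterable

# Hypothetical column names; the real MyTurn export headers may differ.
PERSONAL_COLUMNS = frozenset(
    {"First Name", "Last Name", "Email", "Phone", "Address"}
)


def anonymize_csv(
    source: Path,
    destination: Path,
    personal_columns: Iterable[str] = PERSONAL_COLUMNS,
) -> None:
    """Copy one CSV file, leaving out every column in personal_columns."""
    drop = set(personal_columns)
    with source.open(newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        kept = [name for name in (reader.fieldnames or []) if name not in drop]
        with destination.open("w", newline="", encoding="utf-8") as dst:
            # extrasaction="ignore" silently skips the dropped columns.
            writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
            writer.writeheader()
            for row in reader:
                writer.writerow(row)


def anonymize_directory(input_directory: Path, output_directory: Path) -> None:
    """Anonymize every *.csv file in input_directory into output_directory."""
    output_directory.mkdir(parents=True, exist_ok=True)
    for source in sorted(input_directory.glob("*.csv")):
        anonymize_csv(source, output_directory / source.name)
```

Dropping whole columns is the simplest approach. It does not catch personal details embedded in free-text fields such as notes, so review the output before committing it to source control.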

Deployment

Copy setup.py and the myturn_bot folder to a folder on the server where you'd like to run the pipeline:

cd <project-folder>
# At least Python 3.8. Create a virtualenv in the "venv/" folder
python3 -m venv venv
. ./venv/bin/activate
pip install -e .
# Now you can run the command line

License

MIT