open-innovations / leeds-2023

Data repo for Leeds 2023
https://data.leeds2023.co.uk
MIT License

leeds-2023

Data processing and microsite for Leeds 2023


Scripts

New: this repo contains a velociraptor.yaml file that captures its scripts. Take a look at https://velociraptor.run/ for details. More documentation to come...

The repo contains a series of pipelines that are used to collect and process data.

If you are running the Python scripts, you will need to install the dependencies listed in requirements.txt. You will also need to set PYTHONPATH in your environment to include the scripts directory. On a Mac (or any system with a POSIX shell), this can be achieved by running the following command from the repository root: export PYTHONPATH=scripts. Without that, the scripts will not run, and will throw an error similar to this:

ModuleNotFoundError: No module named 'metrics'
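If exporting PYTHONPATH is inconvenient (for example, when launching a script from an IDE), one alternative is to put the scripts directory on sys.path before importing. This is a minimal sketch, not part of the repo: the helper name add_scripts_to_path is hypothetical, and it assumes the calling file sits in the repository root next to scripts/.

```python
import sys
from pathlib import Path


def add_scripts_to_path(repo_root: Path) -> bool:
    """Prepend <repo_root>/scripts to sys.path, mirroring PYTHONPATH=scripts.

    Returns True if the directory was added, False if it was already present.
    """
    scripts_dir = str((repo_root / "scripts").resolve())
    if scripts_dir in sys.path:
        return False
    # Insert at the front so modules in scripts/ (e.g. metrics) are found first.
    sys.path.insert(0, scripts_dir)
    return True
```

Usage, assuming a file in the repository root: call add_scripts_to_path(Path(__file__).parent) before import metrics. Setting PYTHONPATH remains the simpler option when running from a shell.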

Pipelines

Some of the scripts and data are managed in a DVC pipeline. DVC has been added to the requirements.txt file, so ensure that your Python environment has the required dependencies installed. This could be as simple as running pip3 install -r requirements.txt. It's recommended to use a virtual environment tool such as virtualenv to avoid clashing dependencies between projects.
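Before running the pipeline, it can help to confirm that Python is running inside a virtual environment and that the dvc package from requirements.txt is importable. A minimal sketch under those assumptions; check_environment and in_virtualenv are hypothetical helpers, not part of the repo.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv/virtualenv."""
    # venv and modern virtualenv make sys.prefix differ from sys.base_prefix;
    # legacy virtualenv (< 20) set sys.real_prefix instead.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix) or hasattr(sys, "real_prefix")


def check_environment(packages=("dvc",)) -> dict:
    """Report virtualenv status and whether each top-level package is importable."""
    return {
        "virtualenv": in_virtualenv(),
        "packages": {name: importlib.util.find_spec(name) is not None for name in packages},
    }


if __name__ == "__main__":
    report = check_environment()
    if not report["virtualenv"]:
        print("Warning: not running inside a virtual environment")
    missing = [name for name, ok in report["packages"].items() if not ok]
    if missing:
        print("Missing packages (try pip3 install -r requirements.txt):", ", ".join(missing))
```

Passing a different tuple to check_environment lets you check other packages from requirements.txt in the same way.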

The repo uses data held in AWS S3 buckets. To access them, make sure the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are set.
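A small preflight check can fail fast with a clear message rather than an S3 authentication error partway through a pipeline run. This is a sketch only: missing_aws_credentials is a hypothetical helper, and it checks just the two variables named above, ignoring other credential sources such as profiles in ~/.aws/credentials.

```python
import os
from typing import List, Mapping

# The two variables this README asks you to set.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required AWS variables that are unset or empty."""
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

For example, a script could call missing_aws_credentials() at startup and raise SystemExit listing the returned names if the list is non-empty.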

Here are some useful DVC commands:

Known issues