The automated steps in this repo are roughly as follows:
/core-project/abc123
) to PDF reports.

/app
- Dashboard webapp made with Vue. Also used for generating PDF reports.

/public/pdfs
- Outputted PDF reports.

/data
- All other functionality involving data (e.g. ingesting, collating).

/api
- Types and functions for getting raw data from external APIs.

/raw
- Raw data gathered from external sources. Primarily for provenance, but also acts as an ingest cache (delete files to re-fetch from external providers).

/ingest
- Functions for scraping webpages and calling APIs, and collating that data into a common format.

/output
- Collated data in the format needed to make the desired reports.

/print
- Functions specific to making printed reports.

/util
- Small-scope general purpose functions.

The ingest pipeline is optimized wherever possible and appropriate. Things like network requests and rendering are parallelized (e.g. PDF reports are printed simultaneously in separate tabs of the same Playwright browser instance). External resources are cached in their raw format to speed up subsequent runs, and to avoid being rate-limited or blocked by those providers.
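As a rough illustration of the caching and parallelization described above, here is a minimal sketch, not the repo's actual code. The names `cached`, `ingestAll`, and `Source` are hypothetical, and it assumes each raw resource can be stored as a text file: a resource is read from its raw file when present, otherwise it is fetched and saved, and all sources are processed concurrently.

```typescript
import { mkdir, readFile, writeFile } from "node:fs/promises";
import { dirname } from "node:path";

/**
 * Return the contents of `cachePath` if the file exists. Otherwise call
 * `fetchRaw`, save its result to `cachePath`, and return it.
 * Deleting the file forces a re-fetch on the next run.
 * (Hypothetical helper, for illustration only.)
 */
export async function cached(
  cachePath: string,
  fetchRaw: () => Promise<string>,
): Promise<string> {
  try {
    return await readFile(cachePath, "utf8");
  } catch (error) {
    // Only a missing file counts as a cache miss; re-throw anything else.
    if ((error as { code?: string }).code !== "ENOENT") throw error;
  }
  const raw = await fetchRaw();
  await mkdir(dirname(cachePath), { recursive: true });
  await writeFile(cachePath, raw, "utf8");
  return raw;
}

/** Hypothetical description of one external source. */
export type Source = {
  cachePath: string;
  fetchRaw: () => Promise<string>;
};

/** Process all sources concurrently, each one going through the cache. */
export function ingestAll(sources: Source[]): Promise<string[]> {
  return Promise.all(sources.map((s) => cached(s.cachePath, s.fetchRaw)));
}
```

The same `Promise.all` pattern would plausibly apply to the print step, with one page per report opened in a shared browser instance, though the details of that step are not shown here.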
Use ./run.sh with flags to run specific steps or tasks in this repo:
Flag | Description |
---|---|
--install | Install packages and dependencies |
--ingest | Run "ingest" pipeline step |
--print | Run "print" pipeline step |
no flag | Run pipeline steps in order |
--app | Run webapp in dev mode |
--test | Run all tests (type-checking, linting/formatting checks, etc.) |
--lint | Auto-fix linting/formatting |