name: Make db and parquet
on:
  schedule:
    # run daily rather than on every push
    - cron: '0 0 * * *'
jobs:
  build_and_upload:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: pip install and run python script
        run: |
          pip install -r requirements.in
          python ./fetch_data_from_balancing_authorities.py
      - name: Upload generated data file
        uses: actions/upload-artifact@v3
        with:
          name: hourly-co2-usa.ztd.parquet
          path: output/hourly-co2-usa.ztd.parquet
Until we know where we want to make data available to download from, it's probably best to have the github action upload just the generated parquet file: with 83 balancing authorities, I'm guessing that the generated sqlite database file would be between 500 and 800 GB uncompressed.
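If the script builds the readings with pandas, a rough sketch along these lines shows how the parquet file can be written compactly enough to ship as an artifact. The readings DataFrame and its column names are made-up placeholders (the real schema lives in fetch_data_from_balancing_authorities.py, not shown here), and zstd is just one sensible compression choice; the output path matches the artifact path in the workflow above.

import os
import pandas as pd

# Illustrative placeholder data; the real columns come from the fetch script.
os.makedirs("output", exist_ok=True)
readings = pd.DataFrame(
    {
        "balancing_authority": ["CISO"],
        "hour_utc": ["2023-01-01T00:00:00Z"],
        "co2_lbs": [123.4],
    }
)

# zstd-compressed parquet stays far smaller than the uncompressed sqlite file.
readings.to_parquet("output/hourly-co2-usa.ztd.parquet", compression="zstd")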
We now have a script that generates a sqlite file, which you can easily browse using datasette, and also a parquet file of all the readings.
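For anyone trying the outputs locally, a minimal sketch: the parquet path is the one the workflow uploads, while the sqlite filename below is a placeholder, since the script's actual output name isn't shown here.

import pandas as pd

# Load the generated parquet file of readings.
readings = pd.read_parquet("output/hourly-co2-usa.ztd.parquet")
print(readings.head())

# The sqlite file can be explored in the browser with datasette, e.g.:
#   pip install datasette
#   datasette hourly-co2-usa.db   # filename is a placeholder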
Running this daily would be very helpful.
You can upload data as an artifact attached to the repo as part of a github action run, until you have figured out where to put it on a more permanent basis.
https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts
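Until a permanent home exists, something along these lines can pull the most recent artifact back down via the REST API those docs describe. This is a sketch, not project tooling: OWNER/REPO and GITHUB_TOKEN are placeholders, and it assumes the requests library is available.

import io
import zipfile
import requests

# Placeholders: fill in the repo slug and a token that can read artifacts.
OWNER, REPO = "OWNER", "REPO"
headers = {
    "Authorization": "Bearer GITHUB_TOKEN",
    "Accept": "application/vnd.github+json",
}

# List the repo's workflow artifacts and pick the newest one.
resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/actions/artifacts",
    headers=headers,
)
resp.raise_for_status()
latest = max(resp.json()["artifacts"], key=lambda a: a["created_at"])

# Artifacts download as a zip archive wrapping the uploaded parquet file.
archive = requests.get(latest["archive_download_url"], headers=headers)
archive.raise_for_status()
zipfile.ZipFile(io.BytesIO(archive.content)).extractall("output")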