jdechalendar / gridemissions

Tools for power sector emissions tracking
MIT License

Update docs on different types of datasets #40

jdechalendar closed this issue 3 months ago

jdechalendar commented 3 months ago

Two types of datasets are made available through this project: 1) live and 2) historical (or “bulk”).

  1. The “live” datasets are used to make the maps here [ADD LINK]. They currently include one month’s worth of data and are updated hourly by a cron job. ADD details on the script used to do this.
  2. The “bulk” datasets contain the full history and are updated less frequently. They are used to create the reports. For now, the historical dataset corresponds to the six-month processed files, but these could be merged later. ADD details on how this is done (Makefile).
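For reference, an hourly cron schedule like the one described for the live datasets could look like the entry printed below. This is only a sketch: the issue does not name the script the cron job runs, so LIVE_UPDATE_SCRIPT, LOG_FILE and the hourly_cron_entry function are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: print an hourly crontab entry for refreshing the "live" datasets.
# LIVE_UPDATE_SCRIPT and LOG_FILE are hypothetical placeholders; the real
# script used by the project's cron job is not named in this issue.
set -eu

LIVE_UPDATE_SCRIPT="${LIVE_UPDATE_SCRIPT:-/path/to/update_live.sh}"  # hypothetical
LOG_FILE="${LOG_FILE:-/tmp/gridemissions_live.log}"                  # hypothetical

# Crontab fields: minute hour day-of-month month day-of-week command.
# "0 * * * *" fires at minute 0 of every hour; stdout and stderr go to LOG_FILE.
hourly_cron_entry() {
  printf '0 * * * * %s >> %s 2>&1\n' "$LIVE_UPDATE_SCRIPT" "$LOG_FILE"
}

hourly_cron_entry
```

To append the entry to the current user's crontab without discarding existing entries, one option is `(crontab -l 2>/dev/null; sh hourly_cron.sh) | crontab -`, where hourly_cron.sh is a hypothetical file holding the script above.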

Steps to create the bulk dataset

  1. Run make bulk
  2. Upload the bulk files to the EC2 instance (e.g., via scp processed.tar.gz ge-priv:/data/ec2-user/EIA_Grid_Monitor/.)
  3. Run make bulk_upload to upload the dataset to the S3 bucket
  4. Run make bulk_report to generate the automated reports and upload them to the S3 bucket
  5. Delete the uncompressed version of the archive on the remote to save space
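The steps above can be chained in a single POSIX shell script. This is a minimal sketch rather than the project's own tooling, and it rests on assumptions the issue does not confirm: that make bulk_upload and make bulk_report are run on the EC2 host from /data/ec2-user/EIA_Grid_Monitor, and that the uncompressed copy of the archive is a directory called processed. REMOTE_HOST, REMOTE_DIR, ARCHIVE, UNCOMPRESSED_DIR, DRY_RUN, run and bulk_pipeline are illustrative names. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the bulk-dataset workflow (steps 1-5 above).
# Assumptions (hypothetical, not taken from the repository):
#   - steps 3 and 4 run on the EC2 host, inside REMOTE_DIR;
#   - UNCOMPRESSED_DIR is the extracted copy of the archive on the remote.
set -eu

REMOTE_HOST="${REMOTE_HOST:-ge-priv}"
REMOTE_DIR="${REMOTE_DIR:-/data/ec2-user/EIA_Grid_Monitor}"
ARCHIVE="${ARCHIVE:-processed.tar.gz}"
UNCOMPRESSED_DIR="${UNCOMPRESSED_DIR:-processed}"  # hypothetical name
DRY_RUN="${DRY_RUN:-1}"                            # 1 = print only, 0 = execute

# Print each command; execute it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

bulk_pipeline() {
  # 1. Build the bulk files locally.
  run make bulk
  # 2. Copy the archive to the EC2 instance.
  run scp "$ARCHIVE" "$REMOTE_HOST:$REMOTE_DIR/."
  # 3. Upload the dataset to S3 (assumed to run on the remote).
  run ssh "$REMOTE_HOST" "cd $REMOTE_DIR && make bulk_upload"
  # 4. Build the automated reports and upload them to S3 (assumed remote).
  run ssh "$REMOTE_HOST" "cd $REMOTE_DIR && make bulk_report"
  # 5. Free space by deleting the uncompressed copy; ":?" aborts if the
  #    variable is empty, so REMOTE_DIR itself is never removed by mistake.
  run ssh "$REMOTE_HOST" "rm -rf -- $REMOTE_DIR/${UNCOMPRESSED_DIR:?}"
}

bulk_pipeline
```

Printing by default leaves room to check the remote paths, especially the rm -rf in step 5, before anything runs; once they are confirmed, something like DRY_RUN=0 sh bulk_pipeline.sh (hypothetical filename) would execute the steps.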