singularity-energy / open-grid-emissions

Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
MIT License

Access OGE outputs from Amazon S3 #338

Closed: rouille closed this pull request 5 months ago

rouille commented 5 months ago

Purpose

Allow users to read OGE output data from Amazon S3. This is particularly useful when we want to use OGE outputs in a separate project. Closes CAR-3681.

What the code is doing

Creates a function that sets the OGE data store. It looks for an OGE_DATA_STORE environment variable. If the variable is not set, the data store defaults to local, and data is written to and read from the open_grid_emissions_data folder in the user's $HOME (the current behavior). Setting it to s3 points the data folder at s3://open-grid-emissions/open_grid_emissions_data/ instead.
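A minimal sketch of the store selection described above. It is not the code from this PR: get_data_store and data_folder_sketch are hypothetical names, accepting '1' as an alias for 'local' is taken from the pipeline's error message further down, and raising ValueError on unrecognized values is an assumption. The S3 URL is copied from the usage example.

```python
import os
from pathlib import Path

# S3 location copied from the usage example in this PR.
S3_DATA_FOLDER = "s3://open-grid-emissions/open_grid_emissions_data/"


def get_data_store() -> str:
    """Hypothetical helper: return 'local' or 's3' from OGE_DATA_STORE.

    An unset variable defaults to 'local' (the pre-existing behavior).
    """
    value = os.environ.get("OGE_DATA_STORE", "local").strip().lower()
    if value in ("local", "1"):
        return "local"
    if value == "s3":
        return "s3"
    # Assumption: unknown values are rejected rather than silently ignored.
    raise ValueError(f"Unrecognized OGE_DATA_STORE value: {value!r}")


def data_folder_sketch(rel: str = "") -> str:
    """Hypothetical stand-in for oge.filepaths.data_folder."""
    if get_data_store() == "s3":
        return S3_DATA_FOLDER + rel
    # Local store: $HOME/open_grid_emissions_data/ with a trailing separator.
    return os.path.join(str(Path.home()), "open_grid_emissions_data", "") + rel
```

With OGE_DATA_STORE unset, data_folder_sketch() returns a path ending in open_grid_emissions_data/ under the home directory; with it set to s3 it returns the S3 URL, matching the two sessions in the usage example.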

Testing

The feature was tested by setting the OGE_DATA_STORE environment variable to s3 in a project that imports the oge package. A file stored on Amazon S3 was then successfully loaded with the pandas read_csv function.
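A sketch of the test described above, as it might look in a downstream project. To stay self-contained it resolves the path itself instead of importing oge.filepaths.data_folder; resolve_oge_path, read_oge_csv and any relative file path you pass in are hypothetical. Reading an s3:// URL with pandas.read_csv requires the s3fs package, which picks up AWS credentials from the standard botocore configuration; whether the OGE bucket allows anonymous reads is not stated in this PR.

```python
import os
from pathlib import Path
from typing import Any, Callable, Optional

# S3 location copied from the usage example in this PR.
S3_DATA_FOLDER = "s3://open-grid-emissions/open_grid_emissions_data/"


def resolve_oge_path(relative_path: str) -> str:
    """Hypothetical helper: build a full path from OGE_DATA_STORE (default: local)."""
    store = os.environ.get("OGE_DATA_STORE", "local").strip().lower()
    if store == "s3":
        return S3_DATA_FOLDER + relative_path
    # Simplification: any other value falls back to the local folder in $HOME.
    return str(Path.home() / "open_grid_emissions_data" / relative_path)


def read_oge_csv(relative_path: str, reader: Optional[Callable[[str], Any]] = None) -> Any:
    """Hypothetical helper: load an OGE output file, using pandas.read_csv by default.

    `reader` can be swapped out (e.g. in tests) to avoid network access.
    """
    if reader is None:
        import pandas as pd  # s3:// URLs additionally need s3fs installed

        reader = pd.read_csv
    return reader(resolve_oge_path(relative_path))
```

In a project with pandas and s3fs installed, setting os.environ["OGE_DATA_STORE"] = "s3" and calling read_oge_csv with a path relative to the OGE data folder would return a DataFrame. Injecting `reader` keeps the path logic testable without touching S3.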

Where to look

Usage Example/Visuals

Setting the OGE_DATA_STORE environment variable:

(open-grid-emissions) [~/Singularity/open-grid-emissions] (ben/store) brdo$ python
Python 3.11.2 (main, Nov  1 2023, 11:27:45) [Clang 15.0.0 (clang-1500.0.40.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ["OGE_DATA_STORE"] = "s3"
>>> from oge.filepaths import data_folder
>>> data_folder()
's3://open-grid-emissions/open_grid_emissions_data/'
>>> 

Not setting it:

(open-grid-emissions) [~/Singularity/open-grid-emissions] (ben/store) brdo$ python
Python 3.11.2 (main, Nov  1 2023, 11:27:45) [Clang 15.0.0 (clang-1500.0.40.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from oge.filepaths import data_folder
>>> data_folder()
'/Users/brdo/open_grid_emissions_data/'
>>> 

Trying to run the pipeline with OGE_DATA_STORE set to anything other than 'local' or '1' (such as s3, or 2 as in the session below) raises an OSError:

(open-grid-emissions) [~/Singularity/open-grid-emissions] (ben/store) brdo$ export OGE_DATA_STORE=2
(open-grid-emissions) [~/Singularity/open-grid-emissions] (ben/store) brdo$ echo $OGE_DATA_STORE 
2
(open-grid-emissions) [~/Singularity/open-grid-emissions] (ben/store) brdo$ python src/oge/data_pipeline.py --year 2020
Traceback (most recent call last):
  File "/Users/brdo/Singularity/open-grid-emissions/src/oge/data_pipeline.py", line 642, in <module>
    main(sys.argv[1:])
  File "/Users/brdo/Singularity/open-grid-emissions/src/oge/data_pipeline.py", line 73, in main
    raise OSError("Invalid OGE_DATA_STORE environment variable. Should be 'local' or '1'")
OSError: Invalid OGE_DATA_STORE environment variable. Should be 'local' or '1'
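A sketch of the kind of guard that produces the error above. The message is copied from the traceback; the function name require_local_data_store is hypothetical, and treating an unset variable as 'local' is an assumption based on the default described under What the code is doing. The real check lives in main() in src/oge/data_pipeline.py and may differ.

```python
import os

# Message copied verbatim from the traceback in this PR.
INVALID_STORE_MSG = "Invalid OGE_DATA_STORE environment variable. Should be 'local' or '1'"


def require_local_data_store() -> None:
    """Hypothetical guard: abort unless the pipeline will use the local data folder.

    Assumption: an unset OGE_DATA_STORE counts as 'local'.
    """
    store = os.environ.get("OGE_DATA_STORE", "local")
    if store not in ("local", "1"):
        raise OSError(INVALID_STORE_MSG)
```

Calling such a check at the top of main() means a misconfigured environment fails before any data is processed, rather than partway through a run.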

Review estimate

15 min

Future work

N/A

Checklist

grgmiller commented 5 months ago

Closes CAR-3681