singularity-energy / open-grid-emissions

Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
MIT License

Expand historical coverage pre-2019 #295

Closed grgmiller closed 4 months ago

grgmiller commented 1 year ago

Summary

This PR updates the data pipeline to allow for the creation of historical data from 2013-2018. Because EIA-930 data is not available for a complete year prior to 2019, the data outputs prior to that year will be limited to the following:

Where to look

Most of the updates are in data_pipeline.py, with minor changes in other files to update the allowed year ranges and to let certain functions accept an argument that selects different behavior depending on whether hourly data is available.
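
As a rough illustration of that pattern (not the actual code in this PR), a function can take the year and branch on whether hourly data exists. The names `EARLIEST_HOURLY_YEAR`, `hourly_data_available`, and `describe_run` are hypothetical; the 2019 cutoff is taken from the EIA-930 limitation noted in the summary.

```python
# Hypothetical sketch; none of these names come from the OGE codebase.

# First year with a complete year of EIA-930 data, per the PR summary.
EARLIEST_HOURLY_YEAR = 2019


def hourly_data_available(year: int) -> bool:
    """Return True if a complete year of hourly EIA-930 data exists for `year`."""
    return year >= EARLIEST_HOURLY_YEAR


def describe_run(year: int) -> dict:
    """Build an illustrative run configuration for the given year."""
    use_hourly = hourly_data_available(year)
    return {
        "year": year,
        # Downstream steps would read this flag to skip hourly-only processing.
        "use_hourly_data": use_hourly,
    }
```

Passing a single flag like this keeps the year check in one place, so individual functions do not each need their own cutoff.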

Update details

Document in more depth the changes being made

Screenshots

A couple screenshots of the changes/data if relevant.

Testing / Validation

After running the 2018 pipeline, I noticed the following warnings that are not tripped in the more recent data:

Linear ticket

Closes CAR-2968, CAR-1823, CAR-4206

Concerns

Anything you'd like to point out that the reviewers should pay special attention to

Next steps / Not addressed here

The availability of certain input data prior to 2013 may be different, so that will be addressed in a future PR.

Checklist

grgmiller commented 10 months ago

Picking this PR back up on 11/22/23 after months of inactivity. At this point, I have just merged in the most recent development branch and done a test run of the pipeline with a single year (2018) of data to see if it is working. I wanted to sync this branch with the most recent changes before we start all of our other updates so that it doesn't get too far out of sync, but I will probably pick work back up on this after the 2022 data update is complete.

rouille commented 4 months ago

This PR is now part of a larger group of PRs that aim to update the data pipeline to allow for the creation of historical data from 2005-2018. All the PRs created for the expansion of the historical coverage will be merged into the historical_coverage_feature feature branch.

This PR allows the pipeline to run without error from 2008 to 2018. Note that the warnings have not been investigated yet and the outputs have not been validated. This PR simply fixes the errors encountered when running 2008-2018.

Next steps:

grgmiller commented 4 months ago

In addition to the next steps you listed above, it looks like we will need to figure out how to deal with the download_eia923() function since it will not work with some of the early data, and some of the functions that use those raw files may need alternative file handling in those earlier years.
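
One possible shape for this, sketched under assumptions: wrap the download call so that an unavailable year returns `None`, and let callers switch to alternative file handling. `download_raw_eia923_stub` is a stand-in written for this example, not the real function in download_data.py, and the 2008 cutoff inside it is purely illustrative.

```python
from typing import Callable, Optional


def download_raw_eia923_stub(year: int) -> str:
    """Stand-in downloader that fails for early years (cutoff is assumed)."""
    if year < 2008:  # illustrative cutoff, not confirmed by the source
        raise NotImplementedError(f"EIA-923 data is unavailable for '{year}'.")
    return f"eia923_{year}"


def try_download_eia923(
    year: int, downloader: Callable[[int], str] = download_raw_eia923_stub
) -> Optional[str]:
    """Return the downloader's result, or None if the year is unsupported."""
    try:
        return downloader(year)
    except NotImplementedError:
        # Callers can check for None and fall back to other inputs.
        return None
```

With this shape, functions that read the raw files would check for `None` and skip or substitute those inputs in the earlier years.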

grgmiller commented 4 months ago

Also, can you please test this to make sure that the results for, say, 2022 match the existing outputs? This change should theoretically not affect any of the results.
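
A minimal way to run that kind of check, assuming the existing and new 2022 outputs are written to two separate directories, is to compare the files by hash. The function names and the directory layout here are hypothetical, not part of the OGE codebase.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_output_dirs(baseline: Path, candidate: Path) -> list:
    """Return relative paths of files that are missing on one side or differ."""
    base_files = {p.relative_to(baseline) for p in baseline.rglob("*") if p.is_file()}
    cand_files = {p.relative_to(candidate) for p in candidate.rglob("*") if p.is_file()}
    mismatches = []
    for rel in sorted(base_files | cand_files):
        if rel not in base_files or rel not in cand_files:
            # File exists in only one of the two directories.
            mismatches.append(rel.as_posix())
        elif file_digest(baseline / rel) != file_digest(candidate / rel):
            # File exists in both but the contents are not byte-identical.
            mismatches.append(rel.as_posix())
    return mismatches
```

An empty list means the two runs produced byte-identical files. This comparison is strict, so harmless differences such as floating-point formatting would also be flagged and would need a tolerance-based comparison instead.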

rouille commented 4 months ago

> In addition to the next steps you listed above, it looks like we will need to figure out how to deal with the download_eia923() function since it will not work with some of the early data, and some of the functions that use those raw files may need alternative file handling in those earlier years.

Indeed:

open-grid-emissions[~/Singularity/open-grid-emissions/src/oge] (historical_coverage) brdo$ python data_pipeline.py --year 2005
2024-05-14 15:26:48 [INFO] oge.data_pipeline:71 

Running with the following options:
  * year = 2005
  * shape_individual_plants = True
  * small = False
  * flat = False
  * skip_outputs = False

2024-05-14 15:26:48 [INFO] oge.data_pipeline:121 Running data pipeline for year 2005
2024-05-14 15:26:48 [WARNING] oge.oge.validation:32 
        ################################################################################
        The data pipeline has only been validated to work for years 2019-2022.
        Running the pipeline for 2005 may cause it to fail or may lead to poor-quality
        or anomalous results. To check on the progress of validating additional years of
        data, see: https://github.com/singularity-energy/open-grid-emissions/issues/117
        ################################################################################

2024-05-14 15:26:48 [INFO] oge.data_pipeline:126 1. Downloading data
2024-05-14 15:26:48 [INFO] oge.oge.download_data:126 Using nightly build version of PUDL sqlite database downloaded 2024-04-03
2024-05-14 15:26:48 [INFO] oge.oge.download_data:147 Using nightly build version of PUDL epacems parquet file downloaded 2024-04-03
2024-05-14 15:26:48 [INFO] oge.oge.download_data:45 egrid2018_data.xlsx already downloaded, skipping.
2024-05-14 15:26:48 [INFO] oge.oge.download_data:45 egrid2019_data.xlsx already downloaded, skipping.
2024-05-14 15:26:48 [INFO] oge.oge.download_data:45 egrid2020_data.xlsx already downloaded, skipping.
2024-05-14 15:26:48 [INFO] oge.oge.download_data:45 egrid2021_data.xlsx already downloaded, skipping.
2024-05-14 15:26:48 [INFO] oge.oge.download_data:45 egrid2022_data.xlsx already downloaded, skipping.
2024-05-14 15:26:48 [INFO] oge.oge.download_data:45 epa_eia_crosswalk.csv already downloaded, skipping.
2024-05-14 15:26:48 [INFO] oge.oge.download_data:45 eia8602005 already downloaded, skipping.
2024-05-14 15:26:48 [INFO] oge.oge.download_data:45 eia8602022 already downloaded, skipping.
Traceback (most recent call last):
  File "/Users/brdo/Singularity/open-grid-emissions/src/oge/data_pipeline.py", line 656, in <module>
    main(sys.argv[1:])
  File "/Users/brdo/Singularity/open-grid-emissions/src/oge/data_pipeline.py", line 147, in main
    download_data.download_raw_eia923(year)
  File "/Users/brdo/Singularity/open-grid-emissions/src/oge/download_data.py", line 302, in download_raw_eia923
    raise NotImplementedError(f"EIA-923 data is unavailable for '{year}'.")
NotImplementedError: EIA-923 data is unavailable for '2005'.

grgmiller commented 4 months ago

I added one comment with a suggested name change; otherwise this looks good to merge once we confirm that this is not modifying the 2022 outputs.

grgmiller commented 4 months ago

@rouille Looks good to me - ready to merge!