This PR improves and automates the validation of OGE output data compared to eGRID data (fixes CAR-1893, CAR-1894, CAR-1892, CAR-1887).
Updates:
Updates download_data.download_egrid_file() to download only the single eGRID file for the year for which the pipeline is being run. Previously, it downloaded all eGRID files for 2018-2021. Note: until we start expanding historical coverage, I am not planning to add URLs for eGRID versions prior to 2018.
Moves all of the eGRID validation functions from src.validation.py to a new file, src.validate_egrid.py (we might want to name this something different?)
Creates a new constants.py file to hold constants referenced across the repo (prevents circular imports)
Starts work on new automated logic to identify why discrepancies exist between eGRID data and OGE data
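For reviewers, the single-year download behavior can be pictured roughly as below. This is only a sketch, not the code in this PR: the EGRID_URLS mapping, the example.com URLs, and the select_egrid_url() helper are all hypothetical stand-ins, and the real URLs live in the repo.

```python
from typing import Dict

# Hypothetical placeholder URLs (NOT the real EPA links used in the repo).
EGRID_URLS: Dict[int, str] = {
    2018: "https://example.com/egrid2018_data.xlsx",
    2019: "https://example.com/egrid2019_data.xlsx",
    2020: "https://example.com/egrid2020_data.xlsx",
    2021: "https://example.com/egrid2021_data.xlsx",
}


def select_egrid_url(year: int, urls: Dict[int, str] = EGRID_URLS) -> str:
    """Return the one eGRID URL for `year` instead of looping over every year.

    Hypothetical helper: raises ValueError for years with no configured URL
    (for example, anything before 2018).
    """
    if year not in urls:
        supported = ", ".join(str(y) for y in sorted(urls))
        raise ValueError(
            f"No eGRID URL configured for {year}; supported years: {supported}"
        )
    return urls[year]
```

Failing loudly on an unconfigured year seems preferable to silently skipping validation, but that is a design choice open to discussion.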
To do:
[ ] Check that all eGRID column mapping / loading is correct
[ ] Calculate and/or load _for_electricity and _for_electricity_adjusted values
[ ] Make dataframe filters more legible when filtering plant data errors
[ ] When the two datasets contain mismatched sets of plants, examine how much of the total difference results from the mismatch, then check whether any difference remains for the matched set of plants
[ ] When flagging issues, tag how many plants have that issue so that we make sure we are capturing all issues and there are no problem plants that are uncategorized.
[ ] Consider changing plant comparison status to a numeric percent instead of text category (or have both).
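One possible shape for the mismatch and percent-difference items above, as a rough sketch rather than the planned implementation. The column names plant_id and co2_mass_lb and the compare_plants() helper are assumptions made for illustration. It relies on pandas' outer merge with indicator=True to separate plants found in only one dataset from plants found in both.

```python
import pandas as pd


def compare_plants(egrid: pd.DataFrame, oge: pd.DataFrame, col: str = "co2_mass_lb"):
    """Split the total OGE-minus-eGRID difference into mismatch and matched parts.

    Hypothetical helper; assumes both frames have one row per `plant_id`.
    """
    merged = egrid.merge(
        oge, on="plant_id", how="outer", suffixes=("_egrid", "_oge"), indicator=True
    )
    e_col, o_col = f"{col}_egrid", f"{col}_oge"

    # Plants present in both datasets get a numeric percent difference.
    # Note: a zero eGRID value yields inf or NaN here and would need handling.
    matched = merged[merged["_merge"] == "both"].copy()
    matched["percent_diff"] = (matched[o_col] - matched[e_col]) / matched[e_col] * 100

    egrid_only = merged.loc[merged["_merge"] == "left_only", e_col].sum()
    oge_only = merged.loc[merged["_merge"] == "right_only", o_col].sum()

    summary = {
        # sum() skips the NaNs created by the outer merge
        "total_diff": merged[o_col].sum() - merged[e_col].sum(),
        "diff_from_mismatch": oge_only - egrid_only,
        "diff_from_matched": (matched[o_col] - matched[e_col]).sum(),
        "plants_only_in_egrid": int((merged["_merge"] == "left_only").sum()),
        "plants_only_in_oge": int((merged["_merge"] == "right_only").sum()),
    }
    return matched, summary
```

By construction, diff_from_mismatch plus diff_from_matched equals total_diff, so any residual after accounting for missing plants is attributable to the matched plants. The per-category plant counts would also support the tagging item above.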
Where to pick up work next time (3/11/23): started transferring code to a fresh notebook to put it all in order. Need to re-build the plant comparison df