Purpose
This PR updates OGE to use data from the pudl v2024.5.0 release. This primarily involves updating table names.
Note that we are still using an older version of the pudl code dependency, but since all of the column names appear to be the same, this is not causing any errors for now. In a future PR (see the Future work section), we should aim to remove the pudl code dependency from OGE entirely and rely only on the output data. I started removing dependencies where it was easy, and added some notes for the future:
In emissions.py: we imported a specific function, pudl.analysis.allocate_gen_fuel.distribute_annually_reported_data_to_months_if_annual(), which takes data with a single annual report date and expands it to each month in that year. After examining what this function actually does, I found that the new pudl table already includes this expansion, so we do not need to run it again.
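For reviewers unfamiliar with that step, here is a minimal pandas sketch of what "expanding an annual record to each month" means. It is not pudl's implementation: the helper name expand_annual_to_monthly and the capacity_mw column are hypothetical, and the sketch assumes values are copied unchanged to every month rather than divided.

```python
import pandas as pd


def expand_annual_to_monthly(df: pd.DataFrame, date_col: str = "report_date") -> pd.DataFrame:
    """Repeat each annually reported row once per month of its report year.

    Illustrative only; values are copied as-is to every month (an assumption).
    """
    months = pd.DataFrame({"month": range(1, 13)})
    # A cross join pairs every annual row with each of the 12 months.
    expanded = df.merge(months, how="cross")
    # Rebuild the report date as the first day of each month in the same year.
    expanded[date_col] = pd.to_datetime(
        pd.DataFrame(
            {
                "year": expanded[date_col].dt.year,
                "month": expanded["month"],
                "day": 1,
            }
        )
    )
    return expanded.drop(columns="month")


# Hypothetical annual data: one row per plant, dated January 1 of the report year.
annual = pd.DataFrame(
    {
        "plant_id_eia": [1, 2],
        "report_date": pd.to_datetime(["2022-01-01", "2022-01-01"]),
        "capacity_mw": [10.0, 20.0],
    }
)
monthly = expand_annual_to_monthly(annual)
```

Since the new pudl table already arrives in this monthly shape, running a step like this again would be redundant.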
In load_data.py: we imported apply_pudl_dtypes (from pudl.metadata.fields import apply_pudl_dtypes) and used it to apply the pudl-defined dtypes to each column of data loaded from the pudl sqlite database. To remove this, I've updated column_checks to include a dictionary of pudl dtypes that we apply instead.
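As a rough illustration of the dictionary-based approach, the sketch below applies a column-to-dtype mapping to whichever of those columns are present in a dataframe. The names PUDL_DTYPES and apply_dtypes, and the specific dtype choices, are assumptions for illustration and may not match what column_checks actually contains.

```python
import pandas as pd

# Hypothetical subset of a column -> dtype mapping; the real dictionary
# in column_checks may use different names and dtypes.
PUDL_DTYPES = {
    "plant_id_eia": "Int64",  # nullable integer, so missing IDs stay missing
    "generator_id": "string",
    "net_generation_mwh": "float64",
}


def apply_dtypes(df: pd.DataFrame, dtypes: dict = PUDL_DTYPES) -> pd.DataFrame:
    """Cast only the columns that appear in both the dataframe and the mapping."""
    present = {col: dtype for col, dtype in dtypes.items() if col in df.columns}
    return df.astype(present)


# Data as it might look straight out of a sqlite query (illustrative values).
raw = pd.DataFrame(
    {
        "plant_id_eia": [3, None],
        "generator_id": ["1A", "2"],
        "net_generation_mwh": ["12.5", "0"],
        "extra_column": [1, 2],  # not in the mapping, left untouched
    }
)
typed = apply_dtypes(raw)
```

Keeping the mapping in our own code means dtype handling no longer depends on the installed pudl version, at the cost of updating the dictionary by hand when pudl changes a column's type.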
Closes CAR-4361.
What the code is doing
Testing
Ran pipeline successfully for 2018.
Ran pipeline for 2019-2022, including download of new data source.
Where to look
N/A
Review estimate
10 min
Future work
In a future PR, we should see if we can completely remove the PUDL dependency from OGE. We may be able to do this using the new tables that are available. The current dependencies are:
In subplant_identification.py:
We import make_subplant_ids (from pudl.etl.glue_assets import make_subplant_ids) and use it to run our own pipeline for creating subplant_ids. Replacing this would likely require us to integrate our subplant pipeline into pudl.
We import pudl.analysis.epacamd_eia as epacamd_eia and use its filter_crosswalk() function, again as part of creating our own subplant_ids.
In data_cleaning.py:
We import pudl.analysis.allocate_gen_fuel as allocate_gen_fuel to run our own allocation. We could easily replace this, but we have some code changes in our version of pudl that we'd need to merge. See changes in https://github.com/catalyst-cooperative/pudl/compare/main...singularity-energy:pudl:oge_release, primarily from https://github.com/singularity-energy/pudl/pull/1 and https://github.com/singularity-energy/pudl/pull/3
We should also switch to using the pudl emissions control data instead of load_data.load_emissions_controls_eia923() once https://github.com/catalyst-cooperative/pudl/issues/3689 is completed.
Checklist
ruff