Closed grgmiller closed 6 months ago
Updates since last review (in general, these are aimed at reducing the number of missing values in the output data):
data_cleaning.py
inventory_input_data_sources: When we identify missing months of data, we want to know whether we introduced that gap or whether the data was already missing from the input data sources. For each plant-month, this function identifies whether there was any data in CEMS or the EIA-923 generation and fuel table (the two definitive sources of generation and fuel consumption data). It also indicates whether there was any non-zero input data for that plant-month. The resulting table is written to the outputs folder.
remove_cems_with_zero_monthly_data()
output_data.py
validation.py
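The plant-month inventory described for inventory_input_data_sources can be sketched roughly as follows. This is only an illustration in plain Python, not the pipeline's implementation: the function names, the `(plant_id, month, value)` tuple layout, and the flag names are all hypothetical.

```python
from collections import defaultdict


def inventory_sources(cems_rows, eia923_rows):
    """Flag, for each (plant_id, month), which input source reported data
    and whether any reported value was non-zero.

    Each row is a hypothetical (plant_id, month, value) tuple.
    """
    inventory = defaultdict(
        lambda: {"in_cems": False, "in_eia923": False, "any_nonzero": False}
    )
    for source_flag, rows in (("in_cems", cems_rows), ("in_eia923", eia923_rows)):
        for plant_id, month, value in rows:
            entry = inventory[(plant_id, month)]
            entry[source_flag] = True
            if value:  # None and 0 both count as "no non-zero data"
                entry["any_nonzero"] = True
    return dict(inventory)


def classify_missing(inventory, plant_id, month):
    """Say whether a gap in the outputs was already present in the inputs."""
    entry = inventory.get((plant_id, month))
    if entry is None:
        return "missing from inputs"
    if not entry["any_nonzero"]:
        return "only zeros in inputs"
    return "present in inputs"


cems = [(1, 1, 120.0), (1, 2, 0.0)]
eia923 = [(1, 2, 0.0), (2, 1, 50.0)]
inv = inventory_sources(cems, eia923)
print(classify_missing(inv, 1, 3))  # missing from inputs
print(classify_missing(inv, 1, 2))  # only zeros in inputs
```

If an output month is missing but the inventory reports "present in inputs", the gap was introduced somewhere in the pipeline rather than inherited from the source data.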
Purpose
This PR updates the OGE pipeline to work with 2022 data, and also updates the manual tables.
What the code is doing
Updates default years to 2022 (Fixes CAR-3399)
Updates source of eGRID2020 data to the v2 file (not used in the pipeline, only for comparison)
Updates reference tables (see https://github.com/singularity-energy/open-grid-emissions/issues/260) (Fixes CAR-3349)
ba_reference: no update to the FERC table, new retirements of GLHB and GRIF according to EIA
default_gross_to_net_ratios.csv: No updates
eGRID2020_crosswalk_of_EIA_ID_to_EPA_ID.csv: updated based on eGRID2021. One new plant added to list
emission_factors_for_co2_ch4_n2o: No updates to AP-42 or IPCC
emission_factors_for_nox: No updates to AP-42 or IPCC. Added several new factors for boiler configurations that were not previously added, but are in the 2022 data.
emission_factors_for_so2: No updates to AP-42 or IPCC. Added several new factors for boiler configurations that were not previously added, but are in the 2022 data.
energy_source_groups: no changes based on pudl metadata
epa_eia_crosswalk_manual: used notebook to identify new additions to table
geothermal_emission_factors: no changes to source data
ipcc_gwp: most current report is still AR6
physical_ba
plants_not_connected_to_grid: no changes based on eGRID2021
steam_units_to_remove: Not updated since steam units no longer being removed.
updated_oth_energy_source_codes: ran notebook, no new matches needed
utility_name_ba_code_map: ran notebook, added several new maps, sorted alphabetically.
load_data.py
epa_eia_crosswalk_manual would not be reflected. Now, whenever CEMS data is being loaded, we run update_epa_to_eia_map() to update the plant_id_eia codes
data_cleaning.py
steam_units_to_remove manual table, which didn't seem worth it if we are not going to be dropping these units in the future.
pudl.analysis.allocate_gen_fuel_by_generator_energy_source() instead of loading the table from pudl. This allows us to see data quality warnings generated while running that part of the pipeline, which will help with data quality checks.
eia930.py
emissions.py
annual_avg_fuel_sulfur_content. Now, if no sulfur content data is available for a fuel in a given year, the pipeline checks the reported sulfur contents from the previous year to see whether it can fill in the annual average value. For the 2022 pipeline, this means that JF generators without a specified sulfur content will use the average JF sulfur content from 2021 (across multiple years, this sulfur content does not seem to change from year to year, so this seems like a reasonable backstop). In implementing this fix, I split an existing function into two components.
validation.py
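The previous-year fallback described for annual_avg_fuel_sulfur_content in emissions.py can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline's code: the function name `annual_avg_sulfur`, the `(year, fuel_code, sulfur_pct)` tuple layout, and the simple unweighted mean are all hypothetical, and the sample values are made up.

```python
from statistics import mean


def annual_avg_sulfur(reports, year):
    """Return {fuel_code: average sulfur content} for `year`.

    `reports` is a list of hypothetical (year, fuel_code, sulfur_pct) tuples,
    where sulfur_pct may be None. If a fuel has no reported value in `year`,
    fall back to its average from `year - 1`; if that is also missing, the
    result for that fuel is None.
    """

    def averages_for(target_year):
        values_by_fuel = {}
        for report_year, fuel, sulfur in reports:
            if report_year == target_year and sulfur is not None:
                values_by_fuel.setdefault(fuel, []).append(sulfur)
        return {fuel: mean(values) for fuel, values in values_by_fuel.items()}

    current = averages_for(year)
    previous = averages_for(year - 1)
    fuels_in_year = {fuel for report_year, fuel, _ in reports if report_year == year}
    return {fuel: current.get(fuel, previous.get(fuel)) for fuel in fuels_in_year}


reports = [
    (2021, "JF", 0.5),
    (2021, "JF", 1.5),
    (2022, "JF", None),   # no sulfur content reported for JF in 2022
    (2022, "DFO", 0.25),
]
result = annual_avg_sulfur(reports, 2022)
print(result["JF"])   # 1.0, filled from the 2021 average
print(result["DFO"])  # 0.25, from 2022 data
```

Computing the per-year averages in one helper and applying the fallback in another mirrors the idea of splitting the calculation into two components, though how the actual function was split is not shown here.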
test_for_negative_values check does not pass
check_for_complete_timeseries check to check_for_complete_hourly_timeseries, and adds a check_for_complete_monthly_timeseries check. This new check ensures that monthly-resolution data contains all 12 months of data.
Testing
Running the pipeline for 2022 (not yet complete)
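For reference, the kind of test that check_for_complete_monthly_timeseries performs (ensuring monthly-resolution data contains all 12 months) can be sketched as follows. The function name, the `(series_id, month)` input layout, and the return format here are hypothetical; the actual check's signature is not shown in this PR description.

```python
def find_incomplete_monthly_series(rows):
    """Return {series_id: sorted list of missing months} for every series
    that does not cover all 12 months.

    `rows` is an iterable of hypothetical (series_id, month) pairs,
    with month given as an integer from 1 to 12.
    """
    months_seen = {}
    for series_id, month in rows:
        months_seen.setdefault(series_id, set()).add(month)

    all_months = set(range(1, 13))
    return {
        series_id: sorted(all_months - months)
        for series_id, months in months_seen.items()
        if all_months - months
    }


rows = [("plant_1", m) for m in range(1, 13)]
rows += [("plant_2", m) for m in range(1, 12) if m != 2]
print(find_incomplete_monthly_series(rows))  # {'plant_2': [2, 12]}
```

An empty result means every series is complete; a real check would presumably log a warning or raise when the result is non-empty.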
Usage Example/Visuals
How the code can be used and/or images of any graphs, tables or other visuals (not always applicable).
Review estimate
How long will it take for reviewers and observers to understand this code change?
Future work
The following warnings were raised when running the pipeline:
Checklist
ruff