Purpose
This PR fixes anomalous net generation values calculated for certain generators in 2005. Specifically, the calculated net generation was several orders of magnitude greater than the reported gross or net generation. This was the result of high backstop values and a lack of validation of those values. Closes CAR-4342
To fix this specific issue, I updated how the annual_fuel_ratio (now renamed annual_fleet_ratio) is calculated. Previously, we took the mean GTN value across all plants that had a specific energy_source_code as their primary fuel, which made the result sensitive to outliers. We now filter anomalous subplants out of the data first, and then calculate a weighted average across all remaining subplants. Instead of grouping by the specific energy_source_code, we now calculate fleet ratios for each combination of fuel_category and prime_mover_code. Including the prime_mover_code is more consistent with the EIA backstop/default GTN values, which are organized by prime mover.
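As a rough illustration of the filter-then-weight approach described above, here is a minimal sketch in plain Python. It is not the code in this PR: the function name, the field names, the ratio bounds used to flag anomalous subplants, and the choice to weight by gross generation are all assumptions made for the example.

```python
from collections import defaultdict


def annual_fleet_ratio(subplants, min_ratio=0.5, max_ratio=1.1):
    """Return {(fuel_category, prime_mover_code): weighted GTN ratio}.

    min_ratio and max_ratio are hypothetical bounds for a plausible
    subplant-level GTN ratio; they are not taken from the PR.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # key -> [net_sum, gross_sum]
    for sp in subplants:
        gross = sp["gross_generation_mwh"]
        net = sp["net_generation_mwh"]
        if gross <= 0:
            continue  # a ratio is undefined without positive gross generation
        ratio = net / gross
        if not (min_ratio <= ratio <= max_ratio):
            continue  # drop anomalous subplants before averaging
        key = (sp["fuel_category"], sp["prime_mover_code"])
        totals[key][0] += net
        totals[key][1] += gross
    # sum(net) / sum(gross) equals the mean ratio weighted by gross generation
    # (an assumed weighting; the PR text does not name the weights)
    return {key: net / gross for key, (net, gross) in totals.items()}


# Made-up example data; the third row mimics an anomalous subplant.
subplants = [
    {"fuel_category": "coal", "prime_mover_code": "ST",
     "gross_generation_mwh": 1000.0, "net_generation_mwh": 920.0},
    {"fuel_category": "coal", "prime_mover_code": "ST",
     "gross_generation_mwh": 3000.0, "net_generation_mwh": 2820.0},
    {"fuel_category": "coal", "prime_mover_code": "ST",
     "gross_generation_mwh": 10.0, "net_generation_mwh": 5000.0},
    {"fuel_category": "natural_gas", "prime_mover_code": "GT",
     "gross_generation_mwh": 500.0, "net_generation_mwh": 490.0},
]
fleet_ratios = annual_fleet_ratio(subplants)
```

With an unfiltered simple mean, the single anomalous coal subplant (ratio of 500) would dominate the fleet value; after filtering and weighting, the coal/ST ratio stays between the two plausible subplant ratios.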
While digging into this, I also took the opportunity to address other issues with the GTN methodology.
This PR also removes the "shift_factor" methods from the method hierarchy. These approaches were only used by a small portion of the data, and their methodology is less robust than the ratio method.
I also addressed an issue where prime_mover_codes were not being added to all generators, meaning that default factors could not be added for some generators.
Another issue was that data was being summed for subplant-months that had only EIA data or only CEMS data, but not both, which resulted in abnormally high or low factors. I added extra filtering to ensure that only rows containing both CEMS and EIA data are included when calculating GTN ratios.
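The effect of this filter can be sketched as follows. This is an illustrative example rather than the PR's implementation: the function names and field names are hypothetical, and it assumes that gross generation comes from CEMS and net generation from EIA, with None marking a missing source.

```python
def rows_with_both_sources(rows):
    """Keep only subplant-months that have both CEMS and EIA values."""
    return [
        r for r in rows
        if r.get("gross_generation_mwh") is not None  # assumed CEMS field
        and r.get("net_generation_mwh") is not None   # assumed EIA field
    ]


def gtn_ratio(rows):
    """Sum net and gross generation over rows and return net / gross."""
    net = sum(r["net_generation_mwh"] or 0.0 for r in rows)
    gross = sum(r["gross_generation_mwh"] or 0.0 for r in rows)
    return net / gross


# Made-up monthly rows for one subplant.
rows = [
    {"month": 1, "gross_generation_mwh": 100.0, "net_generation_mwh": 90.0},
    {"month": 2, "gross_generation_mwh": None, "net_generation_mwh": 95.0},   # EIA only
    {"month": 3, "gross_generation_mwh": 110.0, "net_generation_mwh": None},  # CEMS only
]

unfiltered_ratio = gtn_ratio(rows)  # mixes months that lack a counterpart
filtered_ratio = gtn_ratio(rows_with_both_sources(rows))
```

Here the unfiltered sum pairs net generation from month 2 with gross generation from month 3, so the resulting ratio no longer describes any period in which both values were observed. The filtered ratio uses month 1 only.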
I also updated the documentation to better describe the current GTN methodology.
Testing
Ran the GTN pipeline for 2022, and ran the entire pipeline for 2005 data.
Comparing the old 2005-2006 data for MISO:
Previous carbon intensity: 1,530 lb/MWh (2005), 1,650 lb/MWh (2006)
After GTN fix: 1,717 lb/MWh (2005), 1,691 lb/MWh (2006)
Where to look
It's helpful to clarify where your new code lives if you moved files around or there could be confusion.
What files are most important?
Usage Example/Visuals
How the code can be used and/or images of any graphs, tables or other visuals (not always applicable).
Review estimate
How long will it take for reviewers and observers to understand this code change?
Future work
What issues were identified that are not being addressed in this PR but should be addressed in future work?
Checklist
[x] Update the documentation to reflect changes made in this PR
[x] Format all updated python files using black
[x] Clear outputs from all notebooks modified
[x] Add docstrings and type hints to any new functions created
Note: This PR also fixes an issue not addressed in https://github.com/singularity-energy/open-grid-emissions/pull/369 about setting the datetime dtype for columns that were already tzaware datetimes.