singularity-energy / open-grid-emissions

Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
MIT License

Missing data due to fuel code mismatch between EIA-860 and EIA-923 #271

Closed grgmiller closed 1 year ago

grgmiller commented 1 year ago

We've noticed large year-to-year fluctuations in the fleet emission rate for petroleum in MISO. This issue seems to have existed prior to v0.2.0, but we never noticed it before.

2019: 14,726,036,229.73 lbCO2 / 566,287 MWh = 25,928.32 lb/MWh

2020: 18,206,127,107.72 lbCO2 / 7,427,800 MWh = 2,451.08 lb/MWh

2021: 22,809,325,967 lbCO2 / 28,720,749 MWh = 794 lb/MWh
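The rates above are total fleet CO2 divided by total fleet net generation. A minimal sketch of that arithmetic, using only the totals quoted in this comment (the function and variable names are illustrative, not from the project code). Recomputing from the quoted totals may not reproduce every quoted rate to the last digit (the 2019 quotient in particular), so treat the printed values as a cross-check rather than a correction.

```python
# Fleet totals quoted in the comment above: (lb CO2, net MWh) per year.
totals = {
    2019: (14_726_036_229.73, 566_287),
    2020: (18_206_127_107.72, 7_427_800),
    2021: (22_809_325_967, 28_720_749),
}


def emission_rate(lb_co2, net_mwh):
    """Return the fleet emission rate in lb CO2 per MWh."""
    return lb_co2 / net_mwh


rates = {year: emission_rate(*values) for year, values in totals.items()}
for year, rate in rates.items():
    print(f"{year}: {rate:,.2f} lb/MWh")

# Ratio of the highest to the lowest annual rate. A fleet whose fuel mix
# is stable from year to year would be expected to stay close to 1.
spread = max(rates.values()) / min(rates.values())
print(f"max/min rate ratio: {spread:.1f}")
```

A spread of more than an order of magnitude between years is what flags the fluctuation as a likely data problem rather than a real change in the fleet.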

Steps:

My guess would be that the source of this issue is most likely to be a result of:

grgmiller commented 1 year ago

So after digging into this in more detail, this specific issue appears to stem from a single plant, and may be a symptom of several broader issues.

As we showed above, in 2019, total petroleum fleet emissions were calculated as ~14.7 billion lb. Of this total, approximately 14 billion lb comes from a single plant ("Columbia", id 8023). There are a couple of issues with this:

This suggests that there are two big issues:

  1. There is something wrong with our plant primary fuel identification
  2. Our gross to net generation conversion is "accurate" in that it is converting 5.8 million MWh to 3,000 MWh, but we should not allow this when the discrepancy between gross and net is that large. This is a situation where the algorithm should fall back to a more reasonable default gross to net ratio.
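The fallback described in the second point could look roughly like the sketch below. This is not the project's conversion code: the function name, the plausibility bounds, and the default ratio of 0.95 are all placeholder assumptions chosen only to illustrate the idea.

```python
def net_from_gross(
    gross_mwh,
    reported_net_mwh,
    default_ratio=0.95,  # hypothetical default gross to net ratio
    min_ratio=0.5,       # hypothetical lower plausibility bound
    max_ratio=1.1,       # hypothetical upper plausibility bound
):
    """Return net generation, using a default ratio if the implied one is implausible."""
    if gross_mwh <= 0:
        # No usable gross value, so there is nothing to sanity-check against.
        return reported_net_mwh
    implied_ratio = reported_net_mwh / gross_mwh
    if min_ratio <= implied_ratio <= max_ratio:
        return reported_net_mwh
    # Implied ratio is outside the plausible range: fall back to the default.
    return gross_mwh * default_ratio


# Roughly the situation described above: 5.8 million MWh gross vs. 3,000 MWh net.
suspicious = net_from_gross(5_800_000, 3_000)
# An ordinary case, where the reported net value is kept.
ordinary = net_from_gross(100.0, 94.0)
print(suspicious, ordinary)
```

Whether to replace the value, flag it, or both is a design choice; the point is only that a ratio this far from 1 should not pass through silently.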

What else do we know about what's going on?

I'm going to poke around with this plant and see what I can find out.

grgmiller commented 1 year ago

So it looks like the issue may be with the pudl.analysis.allocate_net_gen.allocate_gen_fuel_by_generator_energy_source() function. It appears to be dropping some data.

Here's what the raw EIA-923 data says is the generation and fuel consumption of plant 8023 in 2019: image

And here's what the output of the generation fuel allocation is: image

It looks like maybe the issue is with the RC fuel code getting dropped or not merging correctly.

To be continued...

grgmiller commented 1 year ago

After further investigation, it looks like the issue is partially in the raw EIA-860 data, and partially with our pudl allocate_net_gen function.

Although EIA-923 reports most of plant 8023's annual fuel consumption in 2019 under the RC energy source code, none of the energy source codes associated with this plant in EIA-860 are RC (in fact, in EIA-860, both energy_source_code_1 and energy_source_code_2 are SUB, which seems strange...). This means that when our allocate_net_gen function uses energy_source_code as a merge key, it does not find RC in the gens table and thus drops all of this data.

I'm not sure how widespread this issue might be yet, but there may be a relatively simple patch we could implement in the allocate_net_gen function: for each plant-pm, we could check whether all of the energy source codes reported in the gf table also exist in the gens table. If not, we could add any missing energy source codes to the gens table so that the merge doesn't drop this data (although if we are doing an "outer" merge, should this matter?)
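A rough sketch of that check in plain Python, independent of the actual pudl implementation. The function names are hypothetical, the tables are reduced to (plant, prime mover, energy source code) keys, and the prime mover code "ST" for plant 8023 is an assumption made only so the example has a complete key.

```python
from collections import defaultdict


def find_missing_codes(gf_keys, gens_codes):
    """Return gf energy source codes that are absent from gens, per plant-pm."""
    missing = defaultdict(set)
    for plant_id, prime_mover, code in gf_keys:
        if code not in gens_codes.get((plant_id, prime_mover), set()):
            missing[(plant_id, prime_mover)].add(code)
    return dict(missing)


def add_missing_codes(gens_codes, missing):
    """Return a copy of gens_codes with the missing codes added per plant-pm."""
    patched = {key: set(codes) for key, codes in gens_codes.items()}
    for key, codes in missing.items():
        patched.setdefault(key, set()).update(codes)
    return patched


# Illustrative data only: gf (EIA-923) reports RC and SUB,
# while gens (EIA-860) lists only SUB for the same plant-pm.
gf_keys = [(8023, "ST", "RC"), (8023, "ST", "SUB")]
gens_codes = {(8023, "ST"): {"SUB"}}

missing = find_missing_codes(gf_keys, gens_codes)
patched = add_missing_codes(gens_codes, missing)
print(missing)
print(patched)
```

With the missing code added to the gens side before the merge, the RC rows have a matching key and are no longer at risk of being dropped, regardless of the merge type.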

grgmiller commented 1 year ago

After investigating further: in 2019, this bug affected 49 individual plants and led to up to ~52,000,000 MWh of generation and 528,000,000 mmbtu of fuel (primarily coal and petroleum) being dropped from the dataset. The cause is that a fuel code that exists in the EIA-923 generation and fuel table is not listed as one of the fuel codes in the EIA-860 generator table, and we did not previously catch this in the pudl allocate_net_gen code.
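Totals like these can be produced by summing the generation and fuel on every gf row whose energy source code has no match in the gens table. A minimal sketch under stated assumptions: the function name is hypothetical, the column names follow PUDL-style naming but are not verified against the actual tables, and the rows are made-up numbers, not the real 2019 data.

```python
def summarize_dropped(gf_rows, gens_codes):
    """Count plants and sum generation and fuel for gf rows with no gens match."""
    plants = set()
    net_generation_mwh = 0.0
    fuel_consumed_mmbtu = 0.0
    for row in gf_rows:
        key = (row["plant_id_eia"], row["prime_mover_code"])
        if row["energy_source_code"] not in gens_codes.get(key, set()):
            plants.add(row["plant_id_eia"])
            net_generation_mwh += row["net_generation_mwh"]
            fuel_consumed_mmbtu += row["fuel_consumed_mmbtu"]
    return {
        "plants": len(plants),
        "net_generation_mwh": net_generation_mwh,
        "fuel_consumed_mmbtu": fuel_consumed_mmbtu,
    }


# Made-up rows for illustration; plant 2 has a code missing from gens.
gf_rows = [
    {"plant_id_eia": 1, "prime_mover_code": "ST", "energy_source_code": "SUB",
     "net_generation_mwh": 100.0, "fuel_consumed_mmbtu": 1000.0},
    {"plant_id_eia": 2, "prime_mover_code": "ST", "energy_source_code": "RC",
     "net_generation_mwh": 50.0, "fuel_consumed_mmbtu": 500.0},
    {"plant_id_eia": 2, "prime_mover_code": "ST", "energy_source_code": "SUB",
     "net_generation_mwh": 5.0, "fuel_consumed_mmbtu": 60.0},
]
gens_codes = {(1, "ST"): {"SUB"}, (2, "ST"): {"SUB"}}

summary = summarize_dropped(gf_rows, gens_codes)
print(summary)
```

Running the same summary before and after any fix gives a simple regression check: the dropped totals should fall to zero once every gf code has a match.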