singularity-energy / open-grid-emissions

Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
MIT License

Ensure complete `subplant_id` mapping #49

Open grgmiller opened 2 years ago

grgmiller commented 2 years ago

Currently, subplant IDs are only created for units that exist in both CEMS and EIA-923, so certain generators/units end up with a subplant ID of NaN.

grgmiller commented 2 years ago

I should check whether subplant ids are used at all in the clean_eia923() function, but adding subplant ids for EIA-only data might become irrelevant (at least for the initial public release) if we are grouping this data by BA-fuel anyway.

grgmiller commented 1 year ago

It appears that certain plants/generators that exist in both CEMS and EIA-860 are missing from the crosswalk.

One reason for this might be that we currently inner join the CEMS IDs with EIA IDs from EIA-923 rather than EIA-860, and EIA-860 may be more complete.
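To illustrate why the join choice matters, here is a minimal pandas sketch. The tables, column names (`plant_id_eia`, `generator_id`, `unitid`) and IDs are hypothetical toy values, not the real crosswalk schema or the pipeline's code. It shows that an inner join against a less complete generator list silently drops rows, and that a left join with `indicator=True` is one way to surface them.

```python
import pandas as pd

# Hypothetical toy tables: IDs and column names are illustrative only.
keys = ["plant_id_eia", "generator_id"]

crosswalk = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2],
        "generator_id": ["G1", "G2", "GT1"],
        "unitid": ["1", "1", "A"],
    }
)
# EIA-860 lists every generator at both plants...
eia860_gens = pd.DataFrame({"plant_id_eia": [1, 1, 2], "generator_id": ["G1", "G2", "GT1"]})
# ...but only one of them appears in this (hypothetical) EIA-923 extract.
eia923_gens = pd.DataFrame({"plant_id_eia": [2], "generator_id": ["GT1"]})

# Inner join against EIA-923: plant 1's generators disappear, so they
# never receive a subplant_id.
via_923 = crosswalk.merge(eia923_gens, on=keys, how="inner")
# Inner join against EIA-860 keeps all three crosswalk rows.
via_860 = crosswalk.merge(eia860_gens, on=keys, how="inner")

# A left join with indicator=True flags crosswalk rows with no EIA-923 match.
check = crosswalk.merge(eia923_gens, on=keys, how="left", indicator=True)
print(check.loc[check["_merge"] == "left_only", keys])
```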

gailin-p commented 1 year ago

One example of a missing plant is plant_id_eia=2379, which has two generators according to EIA-860 (CA1 and CA2).

grgmiller commented 1 year ago

So at least part of the issue was that when we were filtering the CEMS data using the EPA crosswalk, certain units were being dropped because of a mismatch in unitid: in the CEMS data we had stripped leading zeros from the ID, but in the crosswalk we had not, so those plants were being dropped. I've now fixed that issue.
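A minimal sketch of the kind of normalization involved, assuming a pandas workflow. `strip_leading_zeros` and the toy tables are hypothetical, not the pipeline's actual code. Applying the same helper to both sides before joining keeps the join keys consistent.

```python
import pandas as pd

# Hypothetical toy data: CEMS unitids were already stripped, the crosswalk's were not.
cems = pd.DataFrame({"plant_id_eia": [3, 3], "unitid": ["1", "2"]})
crosswalk = pd.DataFrame(
    {"plant_id_eia": [3, 3], "unitid": ["01", "2"], "generator_id": ["A", "B"]}
)


def strip_leading_zeros(ids: pd.Series) -> pd.Series:
    """Hypothetical helper: "01" -> "1"; an all-zero id like "0" stays "0"."""
    stripped = ids.astype(str).str.lstrip("0")
    return stripped.where(stripped != "", "0")


keys = ["plant_id_eia", "unitid"]
before = cems.merge(crosswalk, on=keys, how="inner")  # only unit "2" matches

crosswalk["unitid"] = strip_leading_zeros(crosswalk["unitid"])
after = cems.merge(crosswalk, on=keys, how="inner")  # both units now match
print(len(before), len(after))  # 1 2
```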

grgmiller commented 1 year ago

Maybe we can get this fixed in PUDL: https://github.com/catalyst-cooperative/pudl/issues/1769. It also looks like EPA is getting ready to release a new version of the crosswalk, which may improve the coverage of the subplant mapping: https://github.com/USEPA/camd-eia-crosswalk/pull/25#issuecomment-1190243145

gailin-p commented 1 year ago

Fuel category differences within subplants with subplant_id=NaN

In some cases, generators in a single plant missing from subplant_crosswalk have a mix of renewable and fossil fuel types. This occurs in 74 subplant-months in plants 141, 621, 1943, 2240, 10025, 10823, and 58236. In these cases, all generators in the plant which are not in subplant_crosswalk are assigned the same subplant, subplant_id=NaN.
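A minimal pandas sketch of how such plants could be flagged. The toy table, the column names and the `RENEWABLE` set of fuel categories are assumptions for illustration, not the pipeline's actual definitions. It flags plants whose unmapped generators mix renewable and non-renewable fuels.

```python
import pandas as pd

# Hypothetical set of fuel categories treated as renewable for this check.
RENEWABLE = {"solar", "wind", "hydro", "geothermal", "biomass"}

gens = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 2, 2],
        "generator_id": ["A", "B", "C", "D", "E"],
        "subplant_id": [1.0, float("nan"), float("nan"), float("nan"), float("nan")],
        "fuel_category": ["natural_gas", "solar", "natural_gas", "wind", "wind"],
    }
)

# Select unmapped generators explicitly: groupby("subplant_id") would drop
# them, because pandas excludes NaN group keys by default (dropna=True).
unmapped = gens[gens["subplant_id"].isna()].assign(
    is_renewable=lambda df: df["fuel_category"].isin(RENEWABLE)
)
# A plant is "mixed" if its unmapped generators include both kinds.
n_kinds = unmapped.groupby("plant_id_eia")["is_renewable"].nunique()
mixed_plants = n_kinds[n_kinds > 1].index.tolist()
print(mixed_plants)  # [1]
```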

In #230, we proposed that subplants within a plant should not share the same CEMS profile (hourly shaping method partial_cems_plant) when they have different primary fuel types, since sharing a profile resulted in one case where all nuclear generation from a large nuclear power plant (plant_id_eia=2410) was being assigned to the 3 hours when a backup diesel generator was on and reporting to CEMS. However, because renewable and fossil generators are combined in each of the subplants listed above, they cannot be assigned different profiles.
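To see why a shared profile can go so badly wrong, consider a simplified proportional allocation in which a monthly total is spread over hours in proportion to a CEMS profile. This is a sketch under that assumption, not the actual partial_cems_plant implementation, and every number below is hypothetical.

```python
def shape_to_profile(monthly_mwh, hourly_profile):
    """Spread a monthly total across hours in proportion to a profile.

    Simplified proportional allocation for illustration only; not the
    project's partial_cems_plant implementation.
    """
    total = sum(hourly_profile)
    if total == 0:
        raise ValueError("profile has no nonzero hours to shape to")
    return [monthly_mwh * value / total for value in hourly_profile]


# Hypothetical 30-day month: the only CEMS-reported hours at the plant come
# from a backup diesel unit that ran for 3 hours at 2 MWh each.
hours = 30 * 24
cems_profile = [0.0] * hours
for h in (100, 101, 102):
    cems_profile[h] = 2.0

# Hypothetical monthly nuclear net generation at the same plant.
shaped = shape_to_profile(720_000, cems_profile)
print(sum(1 for v in shaped if v > 0), shaped[100])  # 3 240000.0
```

All of the nuclear output lands in the 3 diesel hours, which is the failure mode described above.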

If the renewable and fossil generators were assigned to different subplants, we could safely use partial_cems_plant to shape the subplant with the fossil generators and a residual profile method to shape the subplant with the renewable generators. This would be conceptually more correct than choosing a single method for a subplant with mixed fossil and renewable generation.

To fix this, we would need to update the subplant crosswalk to assign different subplant IDs to generators within a plant whose fuel types differ (see @grgmiller's comments above; we could potentially do this in PUDL).
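One way the proposed fix could look, as a hypothetical pandas helper (`fill_missing_subplants` is not an existing function, and the column names are assumed). Unmapped generators get new subplant IDs numbered after the plant's existing ones, one ID per fuel category.

```python
import pandas as pd


def fill_missing_subplants(gens: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: give unmapped generators (subplant_id is NaN) a new
    subplant_id per (plant, fuel_category), numbered after existing ids."""
    gens = gens.copy()
    missing = gens["subplant_id"].isna()
    # Largest existing subplant_id per plant (0 if nothing is mapped yet).
    offset = gens.groupby("plant_id_eia")["subplant_id"].transform("max").fillna(0)
    # 1, 2, ... for each distinct fuel category among a plant's unmapped generators.
    codes = (
        gens.loc[missing]
        .groupby("plant_id_eia")["fuel_category"]
        .transform(lambda s: pd.factorize(s)[0] + 1)
    )
    gens.loc[missing, "subplant_id"] = offset[missing] + codes
    return gens


gens = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 2, 2],
        "generator_id": ["A", "B", "C", "D", "E"],
        "subplant_id": [1.0, float("nan"), float("nan"), float("nan"), float("nan")],
        "fuel_category": ["natural_gas", "solar", "natural_gas", "wind", "wind"],
    }
)
result = fill_missing_subplants(gens)
print(result[["plant_id_eia", "generator_id", "subplant_id"]])
```

In this sketch generator C is deliberately not merged into A's subplant even though both burn gas, since without the crosswalk there is no evidence that they share a CEMS unit.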

gailin-p commented 1 year ago

> adding subplant ids for EIA-only data might become irrelevant (at least for the initial public release) if we are grouping this data by BA-fuel anyway.

Since hourly data is shaped at the subplant level, I think this does end up affecting currently released data.

grgmiller commented 1 year ago

I think that one way to fix this issue would be to take advantage of the existing unit_id_pudl identifiers created by the PUDL data pipeline (see the "Unit mapping through network analysis" section of this blog post for more information). These unit_id_pudl values are created using the same network analysis that is used for the subplant_id mapping, but are based only on EIA data. However, to use unit_id_pudl alongside subplant_id, the two would likely need to be harmonized (or potentially just used as two separate keys). See https://github.com/catalyst-cooperative/pudl/issues/1769 for more background on this harmonization issue.

grgmiller commented 1 year ago

As noted in https://github.com/catalyst-cooperative/pudl/issues/1769#issuecomment-1261554433, I've actually noticed that the current subplant ID mapping is not behaving as expected: it is supposed to map units to both generators and boilers, but it ignores all of the boiler-generator associations.
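A minimal sketch of why the boiler-generator associations matter, assuming the network analysis amounts to finding connected components in a graph of CEMS units, boilers and generators. The node naming, the toy combined-cycle plant and both helper functions are hypothetical, and a small union-find stands in for whatever graph library the real mapping uses.

```python
from collections import defaultdict


def connected_components(edges):
    """Group the nodes of an undirected graph using a small union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    components = defaultdict(set)
    for node in list(parent):
        components[find(node)].add(node)
    return list(components.values())


def generator_groups(edges):
    """Generators that end up in the same component (a candidate subplant)."""
    groups = [
        frozenset(node_id for kind, node_id in comp if kind == "generator")
        for comp in connected_components(edges)
    ]
    return sorted((g for g in groups if g), key=sorted)


# Hypothetical combined-cycle plant. Nodes are (kind, id) tuples so that a
# CEMS unit "1" and a boiler "1" can never collide.
cems_unit_edges = [
    (("cems_unit", "1"), ("generator", "GT1")),
    (("cems_unit", "2"), ("generator", "GT2")),
]
boiler_generator_edges = [
    (("boiler", "B1"), ("generator", "GT1")),
    (("boiler", "B1"), ("generator", "ST1")),
]

print(generator_groups(cems_unit_edges))
print(generator_groups(cems_unit_edges + boiler_generator_edges))
```

With only the CEMS unit edges, the steam turbine ST1 never appears in any group (so it would be left with a NaN subplant); adding the boiler-generator edges places it in the same group as GT1. The same result could be obtained with `networkx.connected_components`; the union-find just keeps the sketch dependency-free.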