Validate Partial CEMS methodology for subplants

During peer review of the hourly shaping methodology, a question was raised about whether the assumption behind the partial CEMS shaping methodology is robust, namely that all units within a subplant have similar operational profiles and that all subplants within a plant have similar operational profiles.

For example:

What if individual units are fired up sequentially instead of in parallel (i.e. one unit is ramped up until it reaches max capacity, then the next unit is fired up and varies its output while the first unit remains at max capacity)? [In this case, could we use information about each unit's capacity factor (CF) to help inform the approach? If one unit has a high CF and the other has a low CF, the assumption is probably not a good one, while if the CFs are similar, the units may be operated in parallel. Likewise, if one unit is significantly older or less efficient than the other, it may operate differently.]
Maybe this assumption is good for certain types of units (like combined cycle subplants) but not others.
When scheduled maintenance happens, it may be the case that the individual units are rotated out of service one at a time so that there is always at least one unit in operation.
Different subplants may have very different purposes - e.g. if one is a nuclear generator and one is a backup diesel generator - and would not be expected to have similar operational profiles. [we started to address this in https://github.com/singularity-energy/open-grid-emissions/pull/238]
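The capacity-factor idea from the first example could be sketched roughly as follows. This is only an illustration: the function names, the toy data, and the 0.2 cutoff are all assumptions made up for this sketch, not anything that exists in the codebase.

```python
def capacity_factor(hourly_mwh, capacity_mw):
    """Energy actually produced divided by energy at full output in every hour."""
    return sum(hourly_mwh) / (capacity_mw * len(hourly_mwh))


def cf_spread(units):
    """units maps unit_id -> (hourly generation in MWh, capacity in MW).

    Returns the gap between the highest and lowest capacity factor.
    """
    cfs = [capacity_factor(gen, cap) for gen, cap in units.values()]
    return max(cfs) - min(cfs)


# Hypothetical 4-hour toy data, not real CEMS values.
# Sequential dispatch: unit 1 sits at full load while unit 2 follows demand.
sequential = {"1": ([100, 100, 100, 100], 100), "2": ([0, 20, 40, 20], 100)}
# Parallel dispatch: both units share the load equally.
parallel = {"1": ([50, 60, 70, 60], 100), "2": ([50, 60, 70, 60], 100)}

CF_SPREAD_THRESHOLD = 0.2  # assumed cutoff, would need to be tuned against real data

for name, units in [("sequential", sequential), ("parallel", parallel)]:
    flagged = cf_spread(units) > CF_SPREAD_THRESHOLD
    print(name, "-> similar-profile assumption questionable:", flagged)
```

A large CF spread would only be a hint that the units are not dispatched in parallel; similar CFs do not by themselves prove that the hourly profiles match.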

On the greg/research branch I have started to explore this issue by examining the correlation coefficients of hourly fuel consumption profiles between units within each subplant, and between subplants within each plant. This preliminary examination found that, for 2020, the mean correlation coefficient of hourly fuel consumption is 0.67 between units within a subplant and 0.39 between subplants within a plant. This suggests that the partial_cems_plant shaping method may be less robust than shaping at the subplant level.
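For reference, a minimal sketch of this kind of calculation is shown here. It is not the code on the research branch: it assumes Pearson correlation averaged over all pairs in a group, and the function names and toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series; None if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a flat profile
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(profiles):
    """profiles maps an id (unit or subplant) -> hourly fuel consumption list.

    Returns the mean correlation over all pairs, skipping undefined pairs,
    or None if no pair has a defined correlation.
    """
    rs = [pearson(profiles[a], profiles[b]) for a, b in combinations(sorted(profiles), 2)]
    rs = [r for r in rs if r is not None]
    return sum(rs) / len(rs) if rs else None


# Hypothetical toy subplant: units a and b move together, unit c moves opposite.
subplant = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [4, 3, 2, 1],
}
print(mean_pairwise_correlation(subplant))
```

The same function could be applied to subplant-level profiles within a plant. How flat (always-on or always-off) profiles are handled affects the mean, so skipping them as done here is a choice worth revisiting.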

However, this raises the question of whether the fleet-average residual profile reflects the profile of the missing generation any more accurately than the plant-specific average generation profile does. In other words, even if this assumption isn't great, are the alternatives any better? I'm not sure that we could validate that. One way to try to cross-validate might be to ask how well the generation profile of each specific subplant correlates with the fleet-average generation profile of all subplants that report to CEMS, i.e. is the average representative of each individual subplant?
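One possible shape for that cross-validation check is sketched here, assuming Pearson correlation and a simple unweighted hourly mean as the fleet average. The function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series; None if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def correlation_with_fleet_average(profiles):
    """profiles maps subplant id -> hourly generation list (all the same length).

    Returns each subplant's correlation with the unweighted fleet-average profile.
    """
    n_hours = len(next(iter(profiles.values())))
    fleet_avg = [
        sum(p[h] for p in profiles.values()) / len(profiles) for h in range(n_hours)
    ]
    return {sid: pearson(p, fleet_avg) for sid, p in profiles.items()}


# Hypothetical toy fleet: A and B follow a midday peak, C runs opposite to it.
fleet = {
    "A": [10, 20, 30, 20],
    "B": [12, 22, 28, 18],
    "C": [30, 20, 10, 20],
}
for subplant_id, r in correlation_with_fleet_average(fleet).items():
    print(subplant_id, r)
```

Because each subplant is included in the average it is compared against, the correlations are biased upward for small fleets; leaving the subplant out of its own average would be a stricter variant.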
