singularity-energy / open-grid-emissions

Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
MIT License
67 stars 4 forks

Identify outlier values in reported CEMS data #50

Open grgmiller opened 2 years ago

grgmiller commented 2 years ago

We should implement some sort of outlier detection and screening for the hourly values reported in CEMS. This outlier detection could use a combination of statistical methods and physics-based methods (e.g. gross generation should not exceed nameplate capacity).

This should probably be implemented after loading the CEMS data but before any missing data imputation steps.
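As a rough illustration of the idea above, here is a minimal sketch that combines a physics-based screen (gross generation should not exceed nameplate capacity) with a simple statistical screen (a modified z-score based on the median absolute deviation). The function names, the 3.5 threshold, and the sample values are assumptions made for this example and are not part of this repository. A single global median is also a simplification: real hourly generation ramps up and down, so a rolling window or per-unit grouping would likely be needed in practice.

```python
from statistics import median


def flag_outliers(gross_generation_mwh, nameplate_capacity_mw, z_threshold=3.5):
    """Return one boolean per hour, True where the value looks like an outlier.

    Hypothetical helper for illustration only. Missing hours (None) are never
    flagged, so they pass through untouched to a later imputation step.
    """
    valid = [v for v in gross_generation_mwh if v is not None]
    if not valid:
        return [False] * len(gross_generation_mwh)

    # Statistical screen: modified z-score using the median absolute deviation.
    med = median(valid)
    mad = median(abs(v - med) for v in valid)

    flags = []
    for v in gross_generation_mwh:
        if v is None:
            flags.append(False)
            continue
        # Physics-based screen: in one hour a unit cannot generate more MWh
        # than its nameplate capacity in MW, and cannot generate less than zero.
        physics_flag = v < 0 or v > nameplate_capacity_mw
        # Skip the statistical screen when MAD is zero to avoid dividing by zero.
        stat_flag = mad > 0 and abs(0.6745 * (v - med) / mad) > z_threshold
        flags.append(physics_flag or stat_flag)
    return flags


def screen_outliers(gross_generation_mwh, flags):
    """Replace flagged values with None so imputation treats them as missing."""
    return [None if f else v for v, f in zip(gross_generation_mwh, flags)]


# Made-up hourly values for a single 500 MW unit (not real CEMS data).
hourly = [100.0, 102.0, 98.0, 101.0, 950.0, 99.0, 103.0, 400.0, None, 100.0]
flags = flag_outliers(hourly, nameplate_capacity_mw=500.0)
screened = screen_outliers(hourly, flags)
```

In this made-up series, 950.0 fails the capacity check and 400.0 fails only the statistical check. Both are set to None, which matches the proposed ordering: screen after loading, then let the missing-data imputation fill the gaps.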

gailin-p commented 2 years ago

This may be the source of the spikiness in the output data (below), so it may be a priority for V1.

Blue is our result, red is the raw 930 profile (after timestamp adjustment). Both show total generation in PJM. The spikes in the blue profile are due to spikes in CEMS data.

[Image: pjm_930_comparison]

grgmiller commented 2 years ago

Can you provide a little more context on what the above graph is showing?

gailin-p commented 2 years ago

Added above.

Two possible approaches:

grgmiller commented 1 year ago

This paper includes some possible approaches to identifying outlier data: https://pubs.acs.org/doi/10.1021/acs.est.9b04522

https://github.com/NREL/NaTGenPD