Closed: rouille closed this 8 months ago.
I have implemented the warning in the `clean_cems` function of the `data_cleaning` module. This is what we get for 2022:
2024-03-28 16:38:12 [WARNING] oge.oge.data_cleaning:756 Global extreme detected in CENS time series
2024-03-28 16:38:12 [WARNING] oge.oge.data_cleaning:757

| plant_id_eia | emissions_unit_id_epa | gross_generation_mwh / GLOBAL_EXTREME | gross_generation_mwh / MEAN_DEVIATION | fuel_consumed_mmbtu / GLOBAL_EXTREME | fuel_consumed_mmbtu / MEAN_DEVIATION | co2_mass_lb / GLOBAL_EXTREME | co2_mass_lb / MEAN_DEVIATION | ba_code |
|---|---|---|---|---|---|---|---|---|
| 315 | 3 | 350.0 | 11.993077 | NaN | NaN | NaN | NaN | CISO |
| 315 | 4 | 297.0 | 12.050640 | NaN | NaN | NaN | NaN | CISO |
| 356 | 5 | 55.0 | 11.488485 | NaN | NaN | NaN | NaN | CISO |
| 356 | 6 | 122.0 | 11.484192 | NaN | NaN | NaN | NaN | CISO |
| 377 | 5 | NaN | NaN | 113.0 | 10.634884 | 157.0 | 10.653220 | LDWP |
| 563 | 14B | NaN | NaN | 4.0 | 15.812850 | 4.0 | 16.375000 | ISNE |
| 673 | S-3 | NaN | NaN | 6.0 | 62.849096 | 6.0 | 66.123334 | FMPP |
| 874 | 5 | NaN | NaN | 44.0 | 37.872499 | 44.0 | 37.716477 | PJM |
| 1554 | 1 | NaN | NaN | 14.0 | 12.123729 | 28.0 | 13.579799 | PJM |
| 1554 | 4 | NaN | NaN | 12.0 | 14.239448 | 12.0 | 19.221903 | PJM |
| 1571 | SMECO | NaN | NaN | 5.0 | 19.709896 | 5.0 | 14.359615 | PJM |
| 1702 | A | NaN | NaN | 156.0 | 30.646180 | 154.0 | 30.158810 | MISO |
| 1702 | B | NaN | NaN | 343.0 | 31.068923 | 342.0 | 31.068300 | MISO |
| 2049 | CTA | NaN | NaN | 3.0 | 10.609770 | NaN | NaN | SOCO |
| 2049 | CTB | NaN | NaN | 3.0 | 10.620860 | NaN | NaN | SOCO |
| 2081 | 11 | 17.0 | 13.450980 | 8.0 | 13.671393 | NaN | NaN | SWPP |
| 2081 | 12 | 7.0 | 12.000000 | 1.0 | 14.995356 | NaN | NaN | SWPP |
| 2081 | 13 | 6.0 | 14.722222 | NaN | NaN | NaN | NaN | SWPP |
| 2081 | 14 | 18.0 | 12.185185 | 6.0 | 12.332316 | NaN | NaN | SWPP |
| 2081 | 15 | 15.0 | 13.177778 | 4.0 | 11.454545 | NaN | NaN | SWPP |
| 2081 | 16 | 11.0 | 13.787879 | 5.0 | 12.125126 | NaN | NaN | SWPP |
| 2081 | 17 | 25.0 | 14.773333 | 10.0 | 11.752003 | NaN | NaN | SWPP |
| 2081 | 18 | 25.0 | 14.333333 | 10.0 | 12.957722 | NaN | NaN | SWPP |
| 2499 | CT01-1 | NaN | NaN | 2.0 | 17.588460 | 2.0 | 17.595678 | NYIS |
| 2499 | CT01-2 | NaN | NaN | 1.0 | 10.157670 | 1.0 | 10.151316 | NYIS |
| 2499 | CT01-3 | NaN | NaN | 1.0 | 10.157670 | 1.0 | 10.151316 | NYIS |
| 2499 | CT01-8 | NaN | NaN | 3.0 | 16.067152 | 3.0 | 15.833333 | NYIS |
| 2499 | CT02-1 | NaN | NaN | 1.0 | 10.579541 | 1.0 | 10.586419 | NYIS |
| 2499 | CT02-2 | NaN | NaN | 1.0 | 12.369623 | 1.0 | 12.340909 | NYIS |
| 2499 | CT02-3 | NaN | NaN | 1.0 | 10.721941 | 1.0 | 10.717105 | NYIS |
| 2499 | CT02-4 | NaN | NaN | 1.0 | 10.457506 | 1.0 | 10.465116 | NYIS |
| 2499 | CT02-5 | NaN | NaN | 3.0 | 15.354239 | 3.0 | 15.368216 | NYIS |
| 2499 | CT02-6 | NaN | NaN | 1.0 | 11.991748 | 1.0 | 11.942623 | NYIS |
| 2499 | CT02-7 | NaN | NaN | 1.0 | 10.579541 | 1.0 | 10.586419 | NYIS |
| 2499 | CT02-8 | NaN | NaN | 3.0 | 16.818250 | 3.0 | 16.777778 | NYIS |
| 2632 | 1 | NaN | NaN | 5.0 | 10.900407 | 5.0 | 10.881579 | NYIS |
| 3161 | 3 | NaN | NaN | NaN | NaN | 2.0 | 10.308270 | PJM |
| 3403 | GCT2 | 2.0 | 10.500000 | 1.0 | 10.998761 | 1.0 | 10.916667 | TVA |
| 3576 | BW2 | NaN | NaN | 139.0 | 14.145552 | 139.0 | 14.047807 | ERCO |
| 3576 | BW3 | NaN | NaN | 88.0 | 16.065703 | 90.0 | 16.132975 | ERCO |
| 3809 | 3 | NaN | NaN | 94.0 | 22.985355 | 87.0 | 14.391267 | PJM |
| 3992 | 9 | NaN | NaN | 5.0 | 10.434699 | 5.0 | 10.437838 | MISO |
| 4266 | 4 | 507.0 | 24.520710 | 41.0 | 10.580426 | 43.0 | 10.614618 | ERCO |
| 4266 | 5 | 582.0 | 25.776632 | 140.0 | 12.498052 | 140.0 | 12.451361 | ERCO |
| 6042 | PMT1 | NaN | NaN | 3.0 | 11.472898 | 3.0 | 11.566540 | FPL |
| 6074 | 1 | 8.0 | 10.100000 | 2.0 | 10.297569 | NaN | NaN | SWPP |
| 6074 | 2 | 1.0 | 10.500000 | 1.0 | 12.668601 | NaN | NaN | SWPP |
| 6074 | 3 | 1.0 | 10.000000 | 1.0 | 10.001303 | NaN | NaN | SWPP |
| 6074 | 4 | 1.0 | 10.833333 | 1.0 | 10.099935 | NaN | NaN | SWPP |
| 6085 | 16A | 2.0 | 17.166667 | NaN | NaN | NaN | NaN | MISO |
| 6651 | CT01 | NaN | NaN | 6.0 | 26.919475 | NaN | NaN | MISO |
| 6824 | 1 | 11.0 | 17.363636 | 17.0 | 25.641679 | NaN | NaN | MISO |
| 6824 | 2 | 12.0 | 16.583333 | 17.0 | 25.231016 | NaN | NaN | MISO |
| 7425 | 1 | NaN | NaN | 9.0 | 17.058380 | NaN | NaN | MISO |
| 10350 | CTGB | NaN | NaN | 7.0 | 16.728111 | 7.0 | 16.894603 | CISO |
| 50732 | ETBLR1 | NaN | NaN | 33.0 | 25.215167 | NaN | NaN | PJM |
| 50732 | ETBLR2 | NaN | NaN | 24.0 | 33.637866 | NaN | NaN | PJM |
| 50732 | ETBLR3 | NaN | NaN | 25.0 | 33.581149 | NaN | NaN | PJM |
| 55096 | BLR2 | NaN | NaN | 4.0 | 14.014076 | NaN | NaN | MISO |
| 55381 | CT-005 | NaN | NaN | 119.0 | 13.069302 | 119.0 | 13.042396 | PJM |
| 55419 | 600 | NaN | NaN | NaN | NaN | 22.0 | 28.393516 | MISO |
| 55419 | 700 | NaN | NaN | NaN | NaN | 29.0 | 33.363435 | MISO |
| 55419 | 800 | NaN | NaN | NaN | NaN | 28.0 | 33.987326 | MISO |
Purpose
Screen time series for anomalous values, following the algorithm steps described in Tyler H. Ruggles et al., "Developing reliable hourly electricity demand data through screening and imputation" (2020). Closes CAR-1882.
Note that the screening algorithms were developed for demand time series, so some of them may not be well suited to generation or emission time series.
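A small illustration of that caveat, using assumed numbers: for a hypothetical peaking unit that runs only three hours a day, zero generation is normal, so the median of its hourly series is zero. A demand-style rule such as "flag values above 10× the median" (chosen here purely for illustration, not taken from the paper) then flags every hour the unit runs.

```python
import pandas as pd


def flags_above_median_multiple(values: pd.Series, factor: float = 10.0) -> pd.Series:
    """Illustrative demand-style rule: flag values above `factor` x the median."""
    return values > factor * values.median()


# Hypothetical week of hourly gross generation (MWh): the unit runs only
# during hours 17, 18 and 19 each day and produces nothing otherwise.
hour_of_day = pd.Series(range(24 * 7)) % 24
generation = pd.Series(0.0, index=hour_of_day.index)
generation[hour_of_day.isin([17, 18, 19])] = 80.0

flags = flags_above_median_multiple(generation)
print(generation.median(), int(flags.sum()))  # 0.0 21: every running hour is flagged
```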
The screening runs in two steps. Step 1 removes the most egregious anomalies, which need few or no calculations to detect. Step 2 then runs with the most extreme values already removed, which makes calculations of local characteristics of the data more reasonable. Through this process, each hourly value can be re-categorized from okay to another classification, depending on which algorithm flags it.
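The re-categorization across the two steps can be illustrated with a small sketch. Everything in it is an assumption for illustration rather than the PR's implementation: the category labels, the 10× median rule in step 1, and the rolling-median deviation rule (24-hour window, 50% threshold) in step 2. What it does show is that step 2 computes its local statistics only from hours still labeled okay, so the values flagged in step 1 do not feed into them.

```python
import numpy as np
import pandas as pd

OKAY = "OKAY"


def screen_step1(values: pd.Series, extreme_factor: float = 10.0) -> pd.Series:
    """Step 1 (sketch): cheap checks needing few or no calculations."""
    status = pd.Series(OKAY, index=values.index, dtype=object)
    status[values.isna()] = "MISSING"
    median = values[status == OKAY].median()
    status[(status == OKAY) & (values > extreme_factor * median)] = "GLOBAL_EXTREME"
    return status


def screen_step2(
    values: pd.Series, status: pd.Series, window: int = 24, threshold: float = 0.5
) -> pd.Series:
    """Step 2 (sketch): local checks computed only on hours still OKAY."""
    status = status.copy()
    # Hours already flagged become NaN, which the rolling median skips.
    clean = values.where(status == OKAY)
    local_median = clean.rolling(window, center=True, min_periods=1).median()
    rel_dev = (clean - local_median).abs() / local_median.replace(0, np.nan)
    # Only hours that are still OKAY can be re-categorized here.
    status[(status == OKAY) & (rel_dev > threshold)] = "LOCAL_DEVIATION"
    return status
```

For example, with a flat series of 100 containing one value of 5000 and one of 180, step 1 labels the 5000 as `GLOBAL_EXTREME` and step 2 then labels the 180 as `LOCAL_DEVIATION`.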
What the code is doing
Implements the screening algorithms, based on a notebook provided by the authors here. The first-step algorithms are enclosed in the `AnomalyScreeningFirstStep` class. A second class, `AnomalyScreeningSecondStep`, inherits from `AnomalyScreeningFirstStep` and performs two of the four second-step algorithms on top of the first step.
Testing
Manually. See example below.
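As a manual check of the class design described above, the following hypothetical sketch uses toy stand-ins for the two classes. `FirstStepSketch` and `SecondStepSketch` are illustrative names, not the PR's `AnomalyScreeningFirstStep` and `AnomalyScreeningSecondStep`, and their filters are simplified placeholders. The subclass runs the inherited first step through `super().run()` and then applies its own filter, which only re-categorizes hours that are still okay, so first-step flags are never overwritten.

```python
import pandas as pd


class FirstStepSketch:
    """Toy first-step screener (hypothetical, simplified filters)."""

    def __init__(self, values: pd.Series):
        self.values = values
        self.status = pd.Series("OKAY", index=values.index, dtype=object)

    def flag(self, mask: pd.Series, category: str) -> None:
        # Re-categorize only hours still OKAY, so earlier filters take priority.
        self.status[(self.status == "OKAY") & mask] = category

    def run(self) -> pd.Series:
        self.flag(self.values.isna(), "MISSING")
        self.flag(self.values < 0, "NEGATIVE")
        return self.status


class SecondStepSketch(FirstStepSketch):
    """Toy second step: inherits the first step and adds a filter on top."""

    def __init__(self, values: pd.Series, z_threshold: float = 3.0):
        super().__init__(values)
        self.z_threshold = z_threshold

    def run(self) -> pd.Series:
        super().run()  # first-step filters always run first
        # Placeholder filter: z-score relative to the hours still OKAY.
        okay = self.values[self.status == "OKAY"]
        z = (self.values - okay.mean()) / okay.std()
        self.flag(z.abs() > self.z_threshold, "FAR_FROM_MEAN")
        return self.status
```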
Where to look
Everything is in the `oge.data_cleaning` module.
Usage Example/Visuals
Looking at a specific unit:
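Pulling out a single unit for inspection could look like this minimal sketch. It assumes a long-format CEMS frame with one row per unit-hour and a timestamp column named `datetime_utc`; that column name, the string dtype of `emissions_unit_id_epa`, and the helper `unit_timeseries` are assumptions for illustration.

```python
import pandas as pd


def unit_timeseries(
    cems: pd.DataFrame,
    plant_id: int,
    unit_id: str,
    column: str,
    time_col: str = "datetime_utc",
) -> pd.Series:
    """Return one unit's hourly values for `column`, indexed and sorted by time."""
    is_unit = (cems["plant_id_eia"] == plant_id) & (
        cems["emissions_unit_id_epa"] == unit_id
    )
    return cems.loc[is_unit].set_index(time_col)[column].sort_index()
```

For example, `unit_timeseries(cems, 4266, "5", "gross_generation_mwh")` would select one of the units listed in the warning above; with matplotlib installed, calling `.plot()` on the result gives a quick visual check.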
Review estimate
30min
Future work
Implement the single-sided delta and anomalous region filters (filters 3 and 4 of the second step).
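As a starting point for that future work, a delta-style check could be prototyped as below. This is a simplified, hypothetical stand-in, not the single-sided delta filter as defined by Ruggles et al.: it flags an hour whose change from the previous hour exceeds `k` times the interquartile range of all hourly changes. Both `k = 5` and the IQR scale are assumptions.

```python
import numpy as np
import pandas as pd


def large_jump_flags(values: pd.Series, k: float = 5.0) -> pd.Series:
    """Flag hours whose change from the previous hour is unusually large.

    Hypothetical stand-in for a delta filter: "unusual" means more than
    k times the interquartile range of all hour-to-hour changes.
    """
    delta = values.diff()  # first hour has no predecessor -> NaN, never flagged
    q1, q3 = delta.quantile([0.25, 0.75])
    return delta.abs() > k * (q3 - q1)
```

With a smooth daily cycle that steps up permanently by 300 at one hour, only that hour is flagged, because the change is one-sided: the following hours continue smoothly from the new level.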
Checklist
black