singularity-energy / open-grid-emissions

Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
MIT License
67 stars 4 forks source link

Implement data anomaly screening described in Ruggles et al. #349

Closed rouille closed 3 months ago

rouille commented 3 months ago

Purpose

Screen timeseries for anomalous value following the algorithm steps described in Tyler H. Ruggles et al. Developing reliable hourly electricity demand data through screening and imputation (2020). Closes CAR-1882.

Note that the screening algorithms have been developed for demand time series and some of these algorithms might not be tailored for generation/emission time series.

The screening is conducted in 2 steps. Step 1 removes the most egregious anomalies where few or no calculations are needed. Afterward, in Step 2, the most extreme values have been removed making calculations of local characteristics of the data more reasonable. Through this screening process hourly values can be re-categorized from okay to other classifications based on the algorithms.

What the code is doing

Implement the screening algorithms using a notebook provided by the authors here. Algorithms from the first step are enclosed in the AnomalyScreeningFirstStep class. A second class, AnomalyScreeningSecondStep, inherits from AnomalyScreeningFirstStep and perform 2/4 algorithms of the second step on top of the first one.

Testing

Manually. See example below.

Where to look

Everything is in the oge.data_cleaning module.

Usage Example/Visuals

>>> import pandas as pd
>>> 
>>> from oge.load_data import load_cems_data
>>> from oge.data_cleaning import AnomalyScreeningFirstStep, AnomalyScreeningSecondStep
>>> 
>>> 
>>> cems = load_cems_data(2022)
>>> 
>>> unit_screening = []
>>> for i in cems.groupby(["plant_id_eia", "emissions_unit_id_epa"]).groups.keys():
...     AS = AnomalyScreeningFirstStep(cems.query("plant_id_eia == @i[0] and emissions_unit_id_epa == @i[1]")[["plant_id_eia", "emissions_unit_id_epa", "co2_mass_lb"]], "co2_mass_lb")
...     AS.run_all_first_step()
...     unit_screening.append(pd.DataFrame(AS.get_filtered_df().groupby("category").size().to_dict(), index=[i]))
... 
>>> summary = pd.concat(unit_screening).fillna(0).astype("int")
>>> 
>>> summary.head(20)
            IDENTICAL_RUN  MISSING  OKAY  ZERO  GLOBAL_EXTREME  GLOBAL_EXTREME_+/-_1H
(3, 1)                360     7226  1174     0               0                      0
(3, 2)                294     7126  1339     1               0                      0
(3, 4)                 23     7327  1408     2               0                      0
(3, 5)                  5     2128  6618     9               0                      0
(3, 6A)               205      547  7997    11               0                      0
(3, 6B)                44      489  8227     0               0                      0
(3, 7A)               126      677  7952     5               0                      0
(3, 7B)                79      800  7881     0               0                      0
(7, 1)                  1     8659   100     0               0                      0
(7, 2)                  0     8760     0     0               0                      0
(9, CTG-1)              0     8760     0     0               0                      0
(10, 1)               919     2346  5493     2               0                      0
(10, 2)              1175     1598  5987     0               0                      0
(10, CT10)            183     8253   324     0               0                      0
(10, CT2)              14     8712    34     0               0                      0
(10, CT3)              56     8632    72     0               0                      0
(10, CT4)             103     8527   130     0               0                      0
(10, CT5)              36     8652    72     0               0                      0
(10, CT6)             100     8467   193     0               0                      0
(10, CT7)             140     8422   198     0               0                      0
>>> summary.query("GLOBAL_EXTREME != 0")
                 IDENTICAL_RUN  MISSING  OKAY  ZERO  GLOBAL_EXTREME  GLOBAL_EXTREME_+/-_1H
(563, 14B)                   0     8741     8     4               4                      3
(673, S-3)                   7     8725    11    13               4                      0
(874, 5)                   338     7869   508     0              43                      2
(1507, 3)                  303     8302   143     0              12                      0
(1507, 4)                  711     7774   257     0              17                      1
(1554, 1)                  321     7459   947     0              27                      6
(1571, SMECO)                1     8736    14     4               4                      1
(1702, A)                   86     7572   855    46             154                     47
(1702, B)                  100     7240   934    64             333                     89
(2499, CT01-1)              81     8553   121     2               2                      1
(2499, CT01-2)              72     8576   106     4               1                      1
(2499, CT01-3)              64     8593   100     1               1                      1
(2499, CT01-4)              64     8592    99     3               1                      1
(2499, CT01-8)              57     8604    93     1               3                      2
(2499, CT02-1)              92     8518   147     1               1                      1
(2499, CT02-2)              40     8620    97     1               1                      1
(2499, CT02-3)              66     8574   116     2               1                      1
(2499, CT02-4)             107     8495   154     2               1                      1
(2499, CT02-5)             129     8453   173     1               3                      1
(2499, CT02-6)              53     8614    87     4               1                      1
(2499, CT02-7)              78     8556   121     3               1                      1
(2499, CT02-8)              52     8613    86     4               3                      2
(2632, 1)                    0     8711    34     0               5                     10
(3403, GCT2)                 4     8740    14     0               1                      1
(3576, BW2)                 78     6255  2201     0             139                     87
(3576, BW3)                144     6768  1727     0              69                     52
(3809, 3)                  108     7971   359   230              84                      8
(3992, 9)                    6     8486   257     0               5                      6
(4195, 2)                  104     7601  1052     0               1                      2
(4266, 4)                   48     7512  1123     0              43                     34
(4266, 5)                   54     7110  1372     0             140                     84
(6042, PMT1)                 3     8626   126     0               3                      2
(10350, CTGB)               18     8647    70    20               3                      2
(55381, CT-005)             24     8418   180     0             107                     31
(55381, CT-006)             34     8445   161     0              85                     35
(55381, CT-007)             30     8462   147     0              84                     37
(55419, 600)                46      797  7878     0              22                     17
(55419, 700)                86      463  8166     0              29                     16
(55419, 800)               108       41  8569     0              28                     14
>>> AS = AnomalyScreeningSecondStep(cems.query("plant_id_eia == 55419 and emissions_unit_id_epa == '600'")[["plant_id_eia", "emissions_unit_id_epa", "co2_mass_lb"]], "co2_mass_lb")
>>> AS.summary()
2024-03-07 01:16:12 [INFO] oge.oge.data_cleaning:2265                        count  fraction
category                              
GLOBAL_EXTREME            22  0.002511
GLOBAL_EXTREME_+/-_1H     17  0.001941
IDENTICAL_RUN             46  0.005251
MISSING                  797  0.090982
OKAY                    7878  0.899315
>>> AS.flag_local_deviation_from_expected_value()
>>> AS.flag_deltas()
>>> 
>>> AS.summary()
2024-03-07 01:16:38 [INFO] oge.oge.data_cleaning:2265                        count  fraction
category                              
DOUBLE_DELTA             232  0.026484
GLOBAL_EXTREME            22  0.002511
GLOBAL_EXTREME_+/-_1H     17  0.001941
IDENTICAL_RUN             46  0.005251
LOCAL_DOWN               535  0.061073
LOCAL_UP                 147  0.016781
MISSING                  797  0.090982
OKAY                    6964  0.794977

Looking at a specific unit:

>>> check = cems.query("plant_id_eia == 55419 and emissions_unit_id_epa == '600'").set_index("datetime_utc")["co2_mass_lb"]
>>> check.min()
4736.000061035156
>>> check.max()
33493601.5625
>>> ax = check.plot(ylim=(1000, 500000))
>>> ax.set_xlabel("")
Text(0.5, 14.644444444444442, '')
>>> ax.set_ylabel("CO2 Emission (lb)")
Text(22.944444444444432, 0.5, 'CO2 Emission (lb)')
>>> plt.show()

example

Review estimate

30min

Future work

Implement the single sided delta and anomalous region filters (see filter 3 and 4 of second step)

Checklist

rouille commented 3 months ago

I have implemented the warning in the clean_cems function of the data_cleaning module. This is what we get for 2022:

2024-03-28 16:38:12 [WARNING] oge.oge.data_cleaning:756 Global extreme detected in CENS time series
2024-03-28 16:38:12 [WARNING] oge.oge.data_cleaning:757 
                                   gross_generation_mwh                fuel_consumed_mmbtu                   co2_mass_lb                ba_code
                                         GLOBAL_EXTREME MEAN_DEVIATION      GLOBAL_EXTREME MEAN_DEVIATION GLOBAL_EXTREME MEAN_DEVIATION        
plant_id_eia emissions_unit_id_epa                                                                                                             
315          3                                    350.0      11.993077                 NaN            NaN            NaN            NaN    CISO
             4                                    297.0      12.050640                 NaN            NaN            NaN            NaN    CISO
356          5                                     55.0      11.488485                 NaN            NaN            NaN            NaN    CISO
             6                                    122.0      11.484192                 NaN            NaN            NaN            NaN    CISO
377          5                                      NaN            NaN               113.0      10.634884          157.0      10.653220    LDWP
563          14B                                    NaN            NaN                 4.0      15.812850            4.0      16.375000    ISNE
673          S-3                                    NaN            NaN                 6.0      62.849096            6.0      66.123334    FMPP
874          5                                      NaN            NaN                44.0      37.872499           44.0      37.716477     PJM
1554         1                                      NaN            NaN                14.0      12.123729           28.0      13.579799     PJM
             4                                      NaN            NaN                12.0      14.239448           12.0      19.221903     PJM
1571         SMECO                                  NaN            NaN                 5.0      19.709896            5.0      14.359615     PJM
1702         A                                      NaN            NaN               156.0      30.646180          154.0      30.158810    MISO
             B                                      NaN            NaN               343.0      31.068923          342.0      31.068300    MISO
2049         CTA                                    NaN            NaN                 3.0      10.609770            NaN            NaN    SOCO
             CTB                                    NaN            NaN                 3.0      10.620860            NaN            NaN    SOCO
2081         11                                    17.0      13.450980                 8.0      13.671393            NaN            NaN    SWPP
             12                                     7.0      12.000000                 1.0      14.995356            NaN            NaN    SWPP
             13                                     6.0      14.722222                 NaN            NaN            NaN            NaN    SWPP
             14                                    18.0      12.185185                 6.0      12.332316            NaN            NaN    SWPP
             15                                    15.0      13.177778                 4.0      11.454545            NaN            NaN    SWPP
             16                                    11.0      13.787879                 5.0      12.125126            NaN            NaN    SWPP
             17                                    25.0      14.773333                10.0      11.752003            NaN            NaN    SWPP
             18                                    25.0      14.333333                10.0      12.957722            NaN            NaN    SWPP
2499         CT01-1                                 NaN            NaN                 2.0      17.588460            2.0      17.595678    NYIS
             CT01-2                                 NaN            NaN                 1.0      10.157670            1.0      10.151316    NYIS
             CT01-3                                 NaN            NaN                 1.0      10.157670            1.0      10.151316    NYIS
             CT01-8                                 NaN            NaN                 3.0      16.067152            3.0      15.833333    NYIS
             CT02-1                                 NaN            NaN                 1.0      10.579541            1.0      10.586419    NYIS
             CT02-2                                 NaN            NaN                 1.0      12.369623            1.0      12.340909    NYIS
             CT02-3                                 NaN            NaN                 1.0      10.721941            1.0      10.717105    NYIS
             CT02-4                                 NaN            NaN                 1.0      10.457506            1.0      10.465116    NYIS
             CT02-5                                 NaN            NaN                 3.0      15.354239            3.0      15.368216    NYIS
             CT02-6                                 NaN            NaN                 1.0      11.991748            1.0      11.942623    NYIS
             CT02-7                                 NaN            NaN                 1.0      10.579541            1.0      10.586419    NYIS
             CT02-8                                 NaN            NaN                 3.0      16.818250            3.0      16.777778    NYIS
2632         1                                      NaN            NaN                 5.0      10.900407            5.0      10.881579    NYIS
3161         3                                      NaN            NaN                 NaN            NaN            2.0      10.308270     PJM
3403         GCT2                                   2.0      10.500000                 1.0      10.998761            1.0      10.916667     TVA
3576         BW2                                    NaN            NaN               139.0      14.145552          139.0      14.047807    ERCO
             BW3                                    NaN            NaN                88.0      16.065703           90.0      16.132975    ERCO
3809         3                                      NaN            NaN                94.0      22.985355           87.0      14.391267     PJM
3992         9                                      NaN            NaN                 5.0      10.434699            5.0      10.437838    MISO
4266         4                                    507.0      24.520710                41.0      10.580426           43.0      10.614618    ERCO
             5                                    582.0      25.776632               140.0      12.498052          140.0      12.451361    ERCO
6042         PMT1                                   NaN            NaN                 3.0      11.472898            3.0      11.566540     FPL
6074         1                                      8.0      10.100000                 2.0      10.297569            NaN            NaN    SWPP
             2                                      1.0      10.500000                 1.0      12.668601            NaN            NaN    SWPP
             3                                      1.0      10.000000                 1.0      10.001303            NaN            NaN    SWPP
             4                                      1.0      10.833333                 1.0      10.099935            NaN            NaN    SWPP
6085         16A                                    2.0      17.166667                 NaN            NaN            NaN            NaN    MISO
6651         CT01                                   NaN            NaN                 6.0      26.919475            NaN            NaN    MISO
6824         1                                     11.0      17.363636                17.0      25.641679            NaN            NaN    MISO
             2                                     12.0      16.583333                17.0      25.231016            NaN            NaN    MISO
7425         1                                      NaN            NaN                 9.0      17.058380            NaN            NaN    MISO
10350        CTGB                                   NaN            NaN                 7.0      16.728111            7.0      16.894603    CISO
50732        ETBLR1                                 NaN            NaN                33.0      25.215167            NaN            NaN     PJM
             ETBLR2                                 NaN            NaN                24.0      33.637866            NaN            NaN     PJM
             ETBLR3                                 NaN            NaN                25.0      33.581149            NaN            NaN     PJM
55096        BLR2                                   NaN            NaN                 4.0      14.014076            NaN            NaN    MISO
55381        CT-005                                 NaN            NaN               119.0      13.069302          119.0      13.042396     PJM
55419        600                                    NaN            NaN                 NaN            NaN           22.0      28.393516    MISO
             700                                    NaN            NaN                 NaN            NaN           29.0      33.363435    MISO
             800                                    NaN            NaN                 NaN            NaN           28.0      33.987326    MISO