singularity-energy / open-grid-emissions

Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
MIT License
67 stars 4 forks

Add BA code to warnings #348

rouille closed 4 months ago

rouille commented 4 months ago

Purpose

Add BA code to printouts.

What the code is doing

Whenever a plant-level data frame is printed while running the pipeline, the BA code associated with each plant is retrieved and inserted into the printed data frame. Existing functions are used to build a helper function that returns a dictionary mapping the EIA plant ID to the BA code.
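As a rough illustration of the idea (not the code in this PR), the mapping can be sketched as below. The function names `make_plant_to_ba_map` and `with_ba_code` are hypothetical, and the small lookup table stands in for whatever the existing OGE functions return; the plant IDs and BA codes are taken from the printout further down.

```python
import pandas as pd

# Stand-in for the plant/BA lookup table (assumption: the real table has at
# least these two columns).
plant_ba_table = pd.DataFrame(
    {"plant_id_eia": [533, 607, 676], "ba_code": ["SOCO", "PSEI", "FMPP"]}
)


def make_plant_to_ba_map(plant_ba: pd.DataFrame) -> dict:
    """Return a {plant_id_eia: ba_code} dictionary (hypothetical helper)."""
    return dict(zip(plant_ba["plant_id_eia"], plant_ba["ba_code"]))


def with_ba_code(df: pd.DataFrame, plant_to_ba: dict) -> pd.DataFrame:
    """Return a copy of df with a ba_code column, for printing only."""
    out = df.copy()
    out["ba_code"] = out["plant_id_eia"].map(plant_to_ba)
    return out


# Two of the flagged plants from the printout below.
flagged = pd.DataFrame(
    {"plant_id_eia": [676, 533], "annual_plant_ratio": [1.537860, 1.452589]}
)
printable = with_ba_code(flagged, make_plant_to_ba_map(plant_ba_table))
print(printable)
```

Because the helper works on a copy, the frame that continues through the pipeline does not gain a `ba_code` column.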

Testing

Running the pipeline

Where to look

Usage Example/Visuals

    plant_id_eia  gross_generation_mwh  net_generation_mwh  annual_plant_ratio ba_code
39         54096               18039.0            162310.0            8.997727    SOCO
27         10567               17734.0            134118.2            7.562772    ISNE
10          2831                1913.0              7158.4            3.741976     PJM
67         55470             1200399.0           3786786.0            3.154606    ERCO
79         65372                7427.0             17138.7            2.307621    ERCO
68         55672              164399.0            285386.4            1.735938    SOCO
58         55211             1558631.0           2561209.0            1.643243    ISNE
9           2712             3346069.0           5460053.1            1.631781    CPLE
25          8059              177444.0            284855.0            1.605323    SWPP
37         52168               57475.0             89516.3            1.557482    NYIS
73         56565             2008929.0           3121630.0            1.553878    SWPP
3            689              878403.0           1354941.0            1.542505     TAL
48         55068              434128.0            669598.0            1.542398    ISNE
65         55406             1651462.0           2544639.0            1.540840    SOCO
14          7302             7956453.0          12240582.0            1.538447     FPC
2            676             1522950.0           2342083.8            1.537860    FMPP
78         62548               22691.0             34892.2            1.537711    ERCO
64         55372             1740686.0           2674207.0            1.536295    HGMA
52         55089             3710150.0           5698863.0            1.536020    MISO
66         55457             1497373.0           2298880.0            1.535275    SWPP
54         55149             3728636.0           5722994.0            1.534876    ISNE
13          4937             2051426.0           3143980.0            1.532583    ERCO
71         55976             3869629.0           5914124.0            1.528344     PJM
80         65978             2546859.0           3880981.9            1.523831     FPL
43         54537              644497.0            981634.0            1.523101    PSEI
55         55154             2066741.0           3141227.0            1.519894    ERCO
74         56806             1615115.0           2453095.0            1.518836    ERCO
17          7699              838072.0           1266028.0            1.510643     FPC
22          7873             5102750.0           7698823.8            1.508760     TEC
5           1393             2180774.0           3274613.0            1.501583    MISO
51         55088             2964286.0           4447878.7            1.500489    MISO
24          7985              984832.0           1475295.0            1.498017    MISO
57         55207                9007.0             13467.4            1.495215    PSCO
49         55079              771685.0           1152563.0            1.493567    ISNE
61         55239             2107041.0           3137651.0            1.489127     PJM
44         54592                 368.0               547.0            1.486413    NYIS
63         55337             3456386.0           5136582.0            1.486114     PJM
30         10776                2722.0              4027.0            1.479427    CISO
16          7380             2542785.0           3757745.0            1.477807     SEC
56         55170             2324179.0           3433772.0            1.477413    ISNE
8           2539             2887468.0           4262510.0            1.476210    NYIS
70         55710             2988397.0           4405161.0            1.474088     PJM
23          7946             1591560.0           2333887.0            1.466415    SOCO
20          7846             3324659.0           4871811.0            1.465357     JEA
77         57978              575491.0            840306.0            1.460155    CISO
59         55212             1781983.0           2598161.0            1.458017    ISNE
45         54640              260353.0            379559.0            1.457863     PJM
15          7314              459746.0            668301.0            1.453631    NYIS
0            533             2681877.0           3895666.0            1.452589    SOCO
47         55043              289727.0            418180.6            1.443361     DUK
36         51030              530813.0            765028.0            1.441238    ISNE
60         55231             2477537.0           3556261.0            1.435402     PJM
21          7870              297757.0            426767.0            1.433273    PSEI
62         55295             1103518.0           1579922.0            1.431714    CISO
38         54041              135935.0            194505.3            1.430870    NYIS
11          3236              693275.0            991141.0            1.429651    ISNE
42         54476              321713.0            458958.0            1.426607    PSEI
41         54324              496544.0            708046.0            1.425948    ISNE
12          3295             1412631.0           2007514.1            1.421117    SCEG
6           2398             1464621.0           2065411.0            1.410202     PJM
4           1007             1105621.0           1551750.0            1.403510    MISO
75         57564              127570.0            176967.0            1.387215    CISO
18          7826             3235073.0           4474654.1            1.383169     DUK
69         55706             2495062.0           3425619.0            1.372959     TVA
1            607               62435.0             85612.2            1.371221    PSEI
53         55096               93827.0            128152.4            1.365837    MISO
29         10726              171698.0            233904.0            1.362299    ISNE
50         55086              516135.0            699287.0            1.354853    ERCO
40         54131               41076.0             55348.0            1.347454    NYIS
31         50002               42188.0             56796.0            1.346260    ISNE
26         10307              159406.0            211088.0            1.324216    ISNE
72         56151              164123.0            216594.0            1.319705    SWPP
7           2493             2300590.0           3011081.0            1.308830    NYIS
33         50744                2926.0              3778.0            1.291183    NYIS
19          7834             4413919.0           5688275.2            1.288713      SC
32         50555                1281.0              1643.0            1.282592     PJM
34         50949               10974.0             14069.6            1.282085     TEC
46         54785              909303.0           1156415.1            1.271760     PJM
35         50978              120494.0            152666.0            1.267001    NYIS
28         10633              536945.0            677731.0            1.262198     PJM
76         57703              225755.0            282814.0            1.252747    WACM

Review estimate

20min

Future work

Flag anomalous values in CEMS input timeseries.

Checklist

rouille commented 4 months ago

At a high level, I'm wondering if the proposed solution here is a bit over-engineered and adds unnecessary complexity. Maybe I don't fully understand what problem the memory caching is solving, but it seems like adding a BA code to these warning messages could have been handled with the simpler solution we discussed: df = df.merge(create_plant_ba_table(year)[["plant_id_eia", "ba_code"]], how="left", on="plant_id_eia", validate="m:1"). If we want to avoid re-loading this dataframe multiple times, another option would be to load it once at the top of validation.py and access it as a global variable in each validation function that uses it.

This is all to say - if there is a compelling reason for caching in memory, let's do that, but that reason is not currently clear to me, and I'm hesitant to add more complexity to OGE if not needed.
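For reference, here is a minimal sketch of the merge-based alternative combined with the load-once idea. Everything in it is an assumption made for illustration: `create_plant_ba_table_stub` is a stand-in for `create_plant_ba_table` (whose real output is not shown in this thread), `YEAR` is a placeholder value, and plant 99999 is a made-up ID used only to show what happens to a plant with no match.

```python
import pandas as pd


def create_plant_ba_table_stub(year: int) -> pd.DataFrame:
    """Hypothetical stand-in for create_plant_ba_table(year)."""
    return pd.DataFrame(
        {
            "plant_id_eia": [2831, 54096, 55470],
            "ba_code": ["PJM", "SOCO", "ERCO"],
        }
    )


YEAR = 2022  # placeholder year, not taken from the PR

# Loaded once at module level, then reused by every validation function.
PLANT_BA = create_plant_ba_table_stub(YEAR)[["plant_id_eia", "ba_code"]]


def add_ba_code(df: pd.DataFrame) -> pd.DataFrame:
    """Left-merge ba_code onto df; m:1 raises if a plant has two BA rows."""
    return df.merge(PLANT_BA, how="left", on="plant_id_eia", validate="m:1")


warn_df = pd.DataFrame(
    {
        "plant_id_eia": [54096, 2831, 99999],  # 99999 is a made-up plant ID
        "annual_plant_ratio": [8.997727, 3.741976, 1.0],
    }
)
merged = add_ba_code(warn_df)
print(merged)
```

The `validate="m:1"` argument makes pandas raise a `MergeError` if the lookup table ever contains duplicate plant IDs, so the merge cannot silently duplicate rows in the printed frame. Plants missing from the lookup table keep their row and get a null `ba_code`.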

I appreciate all the work to address the warning messages we've been seeing in pandas, as well as splitting out data_cleaning into smaller chunks by adding helpers (although see my comment about ensuring we are consistent about what functions end up where).

A couple other high-level comments:

* If we end up restructuring the code and/or adding new modules, we'll need to make sure that the readme is updated

* Whenever we are adding ba_codes to a dataframe, we should make sure that we are only ever adding this to "throwaway" dfs that are not passed further in the data pipeline; otherwise, we should be sure to drop the ba_code column from the df after the warning message.
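One way to satisfy the "throwaway" point above is sketched below, assuming a hypothetical `warn_with_ba_code` helper and logger name; this is not the OGE implementation. `DataFrame.assign` returns a new frame, so the frame that continues through the pipeline never carries the extra column and nothing needs to be dropped afterwards.

```python
import logging

import pandas as pd

logger = logging.getLogger("validation_sketch")  # hypothetical logger name


def warn_with_ba_code(df: pd.DataFrame, plant_to_ba: dict) -> None:
    """Log flagged rows with a ba_code column without mutating df."""
    if df.empty:
        return
    # assign() builds a new frame; the caller's df is left untouched.
    printable = df.assign(ba_code=df["plant_id_eia"].map(plant_to_ba))
    logger.warning("Flagged plants:\n%s", printable.to_string())


# Values below come from the printout in the PR description.
pipeline_df = pd.DataFrame(
    {"plant_id_eia": [7302, 7873], "annual_plant_ratio": [1.538447, 1.508760]}
)
warn_with_ba_code(pipeline_df, {7302: "FPC", 7873: "TEC"})
```

If the column is instead added in place, the equivalent cleanup would be `df = df.drop(columns="ba_code")` right after the warning is emitted.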

I will implement your solution, remove the oge.utility module and update the README to list the new helper module.

rouille commented 4 months ago

Here is the logfile. It has been compared against the one from the release and it looks good. data_pipeline.log