singularity-energy / open-grid-emissions

Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
MIT License

Handling missing data in EIA-930 #113

Open grgmiller opened 2 years ago

grgmiller commented 2 years ago

How is missing data handled in the EIA-930 data cleaning/reconciliation process?

Whenever there is missing data in EIA-930, it appears that a value of 1.0 is assigned to those hours. We may want to preserve NA values instead, but I'm not sure whether this would break the physics reconciliation process. Could 0 be used instead of 1.0?

grgmiller commented 2 years ago

Examining the EIA-930 data after it has gone through the reconciliation process: of the ~5.2 million rows of data in eia930_data, approximately 3.2 million rows are equal to 1.0 +/- 0.001. There is also no data equal to zero in the entire dataframe. It seems like the EIA-930 data cleaning process is treating reported zeros, negative values, and missing data the same, and assigning them all a value of 1.
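For anyone who wants to reproduce this kind of check, here is a minimal sketch using pandas and NumPy. The helper name `summarize_placeholder_values` and the toy column names are assumptions for illustration only, not the actual eia930_data schema, and the helper counts individual cells rather than rows.

```python
import numpy as np
import pandas as pd


def summarize_placeholder_values(
    df: pd.DataFrame, target: float = 1.0, tol: float = 0.001
) -> dict:
    """Count cells near `target`, exact zeros, negatives, and NaNs.

    Only numeric columns are inspected. This is a hypothetical helper,
    not part of open-grid-emissions or gridemissions.
    """
    values = df.select_dtypes(include="number").to_numpy(dtype=float)
    return {
        # Absolute tolerance only, matching the "1.0 +/- 0.001" check
        "near_target": int(np.isclose(values, target, rtol=0.0, atol=tol).sum()),
        "zero": int((values == 0).sum()),
        "negative": int((values < 0).sum()),
        "missing": int(np.isnan(values).sum()),
        "total": int(values.size),
    }


# Toy frame standing in for eia930_data (column names are illustrative only)
toy = pd.DataFrame(
    {
        "col_a": [1.0, 250.0, 1.0005, np.nan],
        "col_b": [300.0, 0.0, -5.0, 1.0],
    }
)
summary = summarize_placeholder_values(toy)
print(summary)
```

Running the same helper on the data before and after each cleaning step would show which step introduces the 1.0 values and which removes the zeros, negatives, and NaNs.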

I'm not sure whether this data cleaning step is a requirement of the physics-based optimization (can the optimization not handle negative or zero values?), or whether this could be changed.

Ideally, we'd like to preserve negative, missing, and zero values in their original form. Where there are missing values, or anomalous values, it might make sense to implement some sort of imputation step, but it seems like there should be some better method than just assigning it a value of 1.
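As one possible shape for such an imputation step, here is a rough sketch that leaves reported zeros and negative values untouched and fills only missing hours by linear interpolation, flagging every filled value. The function name `impute_missing_hours` and the `limit` default are assumptions for illustration; this is not how gridemissions or this repo currently cleans the data.

```python
import numpy as np
import pandas as pd


def impute_missing_hours(series: pd.Series, limit: int = 3) -> pd.DataFrame:
    """Linearly interpolate NaNs while leaving zeros and negatives as reported.

    At most `limit` consecutive NaNs are filled in each gap, and only NaNs
    that have valid values on both sides (limit_area="inside") are filled.
    Hypothetical helper for illustration only.
    """
    was_missing = series.isna()
    filled = series.interpolate(method="linear", limit=limit, limit_area="inside")
    return pd.DataFrame(
        {
            "value": filled,
            # True only where a NaN was actually replaced by an estimate
            "imputed": was_missing & filled.notna(),
        }
    )


# Toy hourly series: one interior gap, a reported zero, a negative, a trailing gap
hourly = pd.Series([100.0, np.nan, 0.0, -20.0, np.nan])
result = impute_missing_hours(hourly)
print(result)
```

Keeping an explicit `imputed` flag means downstream users can still tell estimated values apart from reported ones, which a constant placeholder of 1.0 does not allow.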

gailin-p commented 2 years ago

Most of the zeros (~5,000,000) are added in the physics-based cleaning step, mostly in fuel-specific generation columns not present in the original EIA-930 data (e.g., NUC generation for a small BA with no nuclear power plant).

About 2,000 values of 1.0 are added in each of the basic and rolling cleaning steps.

Addressing this issue should include:

grgmiller commented 2 years ago

As a next step, should we post this as an issue/question on the gridemissions repo?

gailin-p commented 2 years ago

https://github.com/jdechalendar/gridemissions/issues/8

grgmiller commented 1 year ago

Based on this comment, it seems that this method was implemented in gridemissions out of convenience rather than necessity, so it should be possible to update that repo as part of v2.