Open grgmiller opened 1 year ago
Based on the 2020 EIA-923 data, net generation and fuel consumption reported at the `A` (annual) frequency account for about 10% of the total reported generation and fuel.
After exploring this in more detail, here are some summary statistics about the annually reported data:
For 2020:

| category | fuel consumption | net generation | CO2 mass |
|---|---|---|---|
| annual as % of total reported EIA-923 data | 9.7% | 9.7% | 3.1% |
| annual used in final results as % of total reported EIA-923 | 8.7% | 8.9% | 1.9% |
| annual used for partial year as % of total reported EIA-923 | 0.7% | 0.5% | 0.9% |

So (from the perspective of fuel consumption) annually-reported data makes up about 10% of all EIA-923 data. However, because for some months we have CEMS data available, annually-reported EIA data makes up about 9% of our final results (whereas monthly EIA data makes up about 39% and CEMS data makes up about 52%).
This means about 9% of the final data may have lower-quality temporal allocation, because EIA imputes the month to which the annual data belongs. At the annual level, though, there should be no issue with this data.
Any potential double-counting or under-counting would come from cases where we use CEMS data for part of the year and annually-reported EIA data for the rest of the year, if the annual data wasn't correctly allocated to each month. Such data makes up only 0.7% of the final data.
Given that this last category makes up such a small percentage of the data, we will want to add this to the list of data caveats, but I'm not sure if we need to prioritize a fix before the initial public release. However, this should be a data quality metric that we track as part of the pipeline.
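
To make the proposed data quality metric concrete, here is a minimal sketch of how the two shares could be computed. The record layout, the source labels (`cems`, `eia_monthly`, `eia_annual`), and the function name are hypothetical and not taken from the actual pipeline; the numbers are made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: (subplant_id, month, data_source, fuel_consumed_mmbtu).
records = [
    ("P1", 1, "cems", 50.0), ("P1", 2, "cems", 50.0),
    ("P1", 3, "eia_annual", 25.0), ("P1", 4, "eia_annual", 25.0),
    ("P2", 1, "eia_annual", 75.0), ("P2", 2, "eia_annual", 75.0),
    ("P3", 1, "eia_monthly", 350.0), ("P3", 2, "eia_monthly", 350.0),
]


def annual_eia_shares(records):
    """Share of total fuel that comes from annually-reported EIA data, and the
    share that comes from annual EIA data at subplants that also use CEMS data
    in other months (the potential double/under-counting case)."""
    total = sum(fuel for _, _, _, fuel in records)
    if total <= 0:
        raise ValueError("total fuel consumption must be positive")

    # Collect which data sources each subplant uses over the year.
    sources_by_subplant = defaultdict(set)
    for subplant, _, source, _ in records:
        sources_by_subplant[subplant].add(source)

    annual = 0.0
    annual_partial_year = 0.0
    for subplant, _, source, fuel in records:
        if source != "eia_annual":
            continue
        annual += fuel
        if "cems" in sources_by_subplant[subplant]:
            annual_partial_year += fuel

    return {
        "annual_share": annual / total,
        "annual_partial_year_share": annual_partial_year / total,
    }


shares = annual_eia_shares(records)
```

The same calculation could be repeated for net generation and CO2 mass to fill out the other columns of a summary table.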
For the initial public release, I'm just adding output metrics that describe the data quality in this regard. In the future, we should determine how we want to handle this more thoroughly.
We may also want to consider re-imputing the monthly shape of annual data instead of trusting the monthly imputation done by EIA.
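
One possible way to re-impute the monthly shape, sketched under loud assumptions: take the annual total as given and spread it across months in proportion to some reference monthly profile (for example, an aggregate of comparable plants that report monthly). The choice of reference profile, the flat fallback, and the function name are all hypothetical; nothing here is decided in this issue.

```python
def reimpute_monthly(annual_total, reference_profile):
    """Spread annual_total across months in proportion to reference_profile,
    a dict of {month: non-negative weight}. Falls back to an even split if
    the profile sums to zero (an assumption made for this sketch)."""
    profile_sum = sum(reference_profile.values())
    if profile_sum <= 0:
        n_months = len(reference_profile)
        return {month: annual_total / n_months for month in reference_profile}
    return {
        month: annual_total * weight / profile_sum
        for month, weight in reference_profile.items()
    }


# Made-up profile: flat all year except July, which is weighted double.
profile = {month: 1.0 for month in range(1, 13)}
profile[7] = 2.0
monthly = reimpute_monthly(1300.0, profile)
```

By construction the monthly values sum back to the annual total, so the annual-level numbers are unchanged and only the temporal allocation differs.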
Each plant can report data to EIA at one of three frequencies: `A`, `M`, or `AM`.
This means that data for plants that respond at the `A` frequency may not reflect the actual monthly value. This has implications for several parts of our data pipeline:
… `M` or `AM` frequency … `A` frequency, since the monthly values for the months we are filling might not reflect the actual monthly data.
Issue 3 is tricky, since it could lead to potential double-counting or under-counting of data for subplants that only report part of the year to CEMS. One way to approach this would be to calculate what percent of the annual total reported in EIA-923 the partial CEMS data accounts for, and then evenly allocate the difference between the two to each month that is not reported.
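
The approach described above can be sketched as follows. The function name and inputs are hypothetical, and clipping a negative residual to zero (when partial-year CEMS data exceeds the EIA annual total) is an assumption added for this sketch, not something settled in this issue.

```python
def fill_unreported_months(annual_eia_total, cems_by_month, months=range(1, 13)):
    """Keep CEMS values for reported months and spread the remaining EIA-923
    annual total evenly across the months with no CEMS data.

    Returns (filled, cems_share), where cems_share is the fraction of the EIA
    annual total accounted for by the partial-year CEMS data."""
    cems_total = sum(cems_by_month.values())
    cems_share = cems_total / annual_eia_total if annual_eia_total else None

    missing = [month for month in months if month not in cems_by_month]

    # Assumption: if CEMS already exceeds the annual total, allocate nothing.
    residual = max(annual_eia_total - cems_total, 0.0)
    per_month = residual / len(missing) if missing else 0.0

    filled = dict(cems_by_month)
    for month in missing:
        filled[month] = per_month
    return filled, cems_share


# Made-up example: CEMS covers January through June only.
cems = {month: 150.0 for month in range(1, 7)}
filled, cems_share = fill_unreported_months(1200.0, cems)
```

In this example the CEMS data accounts for 75% of the annual total, so the remaining 300 is split evenly across July through December. The full-year sum then matches the EIA-923 annual total, which avoids double-counting at the annual level.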