singularity-energy / open-grid-emissions

Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
MIT License

Update OGE to work with PUDL v2022.11.30 and integrate 2021 data #259

Closed grgmiller closed 1 year ago

grgmiller commented 1 year ago

This PR closes https://github.com/singularity-energy/open-grid-emissions/issues/258 by updating OGE to work with PUDL v2022.11.30, and integrating 2021 data.

NOTE: We should review and merge https://github.com/singularity-energy/open-grid-emissions/pull/246 into this branch before merging this branch into development

grgmiller commented 1 year ago

Remaining steps to do:

grgmiller commented 1 year ago

When running the pipeline with the newly downloaded EIA-930 data, it seems that there may have been some retroactive revisions to the 2020 data: when exporting monthly consumed emission factors for SEC, we now get an error message about negative emission factors. The root cause of this issue is described by https://github.com/singularity-energy/open-grid-emissions/issues/214, and was patched by https://github.com/singularity-energy/open-grid-emissions/pull/221 (in which we created a list of BAs with this issue and told the pipeline to use the reported demand values from EIA-930 instead of calculating net demand from generation and interchange). SEC was not previously included in this list, but now seems to be exhibiting the same symptoms.

The larger fix for this is captured by https://github.com/singularity-energy/open-grid-emissions/issues/220, but in the meantime, there are a couple of ways we can go about patching this.

  1. Investigate whether there is a new data quality issue in the EIA-930 timeseries for SEC that we need to correct in eia930.manual_930_adjust()
  2. Add SEC to the BA_930_INCONSISTENCY list in src.consumed(). However, since this issue seems to be confined to 2020 data for SEC, I'd like to see us update BA_930_INCONSISTENCY so that we are only applying this patch for years where it is necessary, rather than applying it to these BAs in all years. One way to do this would be to make BA_930_INCONSISTENCY a dictionary, where the key is the year and the value is a list of the BAs that need this patch in that year.
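
To make option 2 concrete, here is a minimal sketch of the year-keyed dictionary. The only entry shown is the one discussed above (SEC in 2020); the BAs on the existing list are not reproduced here and would need to be added under whichever years they apply to. The helper name `bas_needing_patch` is hypothetical and not something that exists in the codebase.

```python
# Hypothetical structure: year -> BAs for which the pipeline should use the
# demand reported in EIA-930 instead of calculating net demand.
# Only the SEC/2020 entry comes from this discussion; others are omitted.
BA_930_INCONSISTENCY = {
    2020: ["SEC"],
}


def bas_needing_patch(year, inconsistency=BA_930_INCONSISTENCY):
    """Return the BAs that need the patch in `year` (empty list if none)."""
    return inconsistency.get(year, [])


patched_2020 = bas_needing_patch(2020)
patched_2021 = bas_needing_patch(2021)
```

Using `dict.get` with an empty-list default means a year without an entry simply gets no patch, so new data years would not silently inherit the patch from earlier ones.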
grgmiller commented 1 year ago

So we are currently running into an issue where the consumed emission calculation in step 18 is returning missing values for all hours in all regions through April of 2021 (for the 2021 year run). I think I've traced the source of this issue to consumed.HourlyConsumed.run(). Specifically, when we get to the following section of code:

# Run
try:
    consumed_emissions, _ = consumption_emissions(E, G, ID)
except np.linalg.LinAlgError:
    # These issues happen at boundary hours (beginning and end of year)
    # where we don't have full data for all BAs
    # print(f"WARNING: singular matrix on {date}")
    consumed_emissions = np.full(len(self.regions), np.nan)

for the first 2,900 datetimes in the for loop (corresponding to the first four months), the consumption_emissions() function raises the "Singular matrix" LinAlgError that triggers the except clause. Here is the full traceback:

File a:\GitHub\open-grid-emissions\notebooks\work_in_progress\../../../open-grid-emissions/src\consumed.py:179, in consumption_emissions(F, P, ID)
    176             # force this to be zero so the linear system makes sense
    177             b[i] = 0.0
--> 179 X = np.linalg.solve(A, b)
    181 for j in perturbed:
    182     if X[j] != 0.0:

File <__array_function__ internals>:180, in solve(*args, **kwargs)

File a:\miniconda3\envs\open_grid_emissions\lib\site-packages\numpy\linalg\linalg.py:400, in solve(a, b)
    398 signature = 'DD->D' if isComplexType(t) else 'dd->d'
    399 extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 400 r = gufunc(a, b, signature=signature, extobj=extobj)
    402 return wrap(r.astype(result_t, copy=False))

File a:\miniconda3\envs\open_grid_emissions\lib\site-packages\numpy\linalg\linalg.py:89, in _raise_linalgerror_singular(err, flag)
     88 def _raise_linalgerror_singular(err, flag):
---> 89     raise LinAlgError("Singular matrix")

LinAlgError: Singular matrix

The comment on this code suggests that this error would only be raised when we don't have full data for all of the BAs. However, I'm not sure why there would not be full data now compared to the current release of OGE. It seems like the source of this new issue either has to result from 1) a change to the raw downloaded data from EIA or 2) a change to the way we are processing the data. However, I'm not noticing any changes in the code that would have changed this, so I'm kind of stumped about the cause.

To further trace the source, I actually tried running the consumed emission calculation using the main branch, but I am still getting the same singular matrix error in the same places. This suggests that the issue is either with the source EIA-930 data or with our cleaning of it.

Another thing we may want to investigate: Is this maybe a result of our manual timestamp cleaning? Gailin, I know that you looked into this already, but I'm wondering if something was corrected in the 930 balance files that we aren't catching, and that's leading to issues with several months of the data?

There are a few things related to the cleaning of the 930 data that I'm noticing that I had questions about (may or may not be related to the above issue):

@gailin-p

grgmiller commented 1 year ago

One other thing I'm noticing about this issue: in 2021, it is affecting all data from 1/1/2021 - 4/30/2021. In 2020, it is only affecting data for the month of March. It's strange that the impact is so neatly cut off by month, which makes me wonder if there is a clue there: is there some step that we are doing that affects data on a month-by-month basis?
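
One quick way to confirm the month-aligned pattern is to tally the missing hours per calendar month. The sketch below uses synthetic data in place of the real step 18 output: the series and the helper name `missing_hours_by_month` are made up for illustration, and the real output would need to be loaded into an hourly-indexed pandas Series first.

```python
import numpy as np
import pandas as pd


def missing_hours_by_month(series):
    """Count NaN hours per calendar month in an hourly time series."""
    months = series.index.to_period("M")
    return series.isna().groupby(months).sum()


# Synthetic stand-in for one region's consumed emission factors in 2021:
# missing through the end of April, populated afterwards.
index = pd.date_range(
    "2021-01-01 00:00", "2021-06-30 23:00", freq=pd.Timedelta(hours=1)
)
values = pd.Series(1.0, index=index)
values[index < "2021-05-01"] = np.nan

counts = missing_hours_by_month(values)
```

If the real data shows whole months fully missing and the neighbouring months fully populated, that points toward an input that is processed or published month by month rather than toward scattered bad hours.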

gailin-p commented 1 year ago

The above issue is caused by inconsistent transmission interchange and generation in HGMA from January through April 2021. For BAs where there is no generation and no export, the consumption_emissions function (from gridemissions) perturbs the matrix to make it invertible; however, it doesn't do the same when a BA has zero generation but non-zero export (a situation that's physically impossible and therefore guaranteed not to happen in the gridemissions pipeline, where consumed emissions calculations come after physics-based data cleaning).
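
To illustrate why zero generation plus non-zero export breaks the solve, here is a toy two-region sketch. It assumes the linear system has the form A = diag(P + total imports) - import matrix, with the interchange matrix storing exports as positive values; that is my reading of the formulation and should be checked against the gridemissions source. The helper names `build_system` and `is_singular` are hypothetical.

```python
import numpy as np


def build_system(P, ID):
    """Build the matrix A of the consumed-emissions linear system.

    Assumed form (not copied from gridemissions):
    A = diag(P + total imports) - import matrix.
    """
    imports = (-ID).clip(min=0)  # ID holds exports as positive values
    return np.diag(P + imports.sum(axis=1)) - imports


def is_singular(A):
    """Return True if np.linalg.solve rejects A as singular."""
    try:
        np.linalg.solve(A, np.ones(len(A)))
        return False
    except np.linalg.LinAlgError:
        return True


# Region 1 exports 10 units to region 0 in both cases.
ID = np.array([[0.0, -10.0], [10.0, 0.0]])

healthy = build_system(np.array([100.0, 20.0]), ID)  # region 1 generates
broken = build_system(np.array([100.0, 0.0]), ID)    # region 1 generates nothing
```

In the `broken` case the row for region 1 is all zeros, but its column is not (region 0 imports from it), so a perturbation that only triggers when both the row and the column are zero would not apply, and the solve fails.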

OGE generation for HGMA is zero until May 2021:

[Image: Screen Shot 2022-12-23 at 11 05 50 AM]

But in both raw and post-gridemissions 930 data, there is non-zero import/export from HGMA in those months:

[Image: Screen Shot 2022-12-23 at 11 09 20 AM] [Image: Screen Shot 2022-12-23 at 11 09 35 AM]

Crucially, the raw data shows net import to HGMA, which is physically possible with zero generation, but after physics-based cleaning, the 930 data shows net export from HGMA, which is not possible with zero generation.

There are three easy fixes here, two of which are general (but might also let problem data sneak through in the future), one of which is specific:

gailin-p commented 1 year ago

As a side-note, because the interchange post physics-based cleaning is >1, it won't be set to zero in our filter for imputed ones. Also, the interchange is 1 in the balance files direct from EIA, so it's not noise introduced by gridemissions.

gailin-p commented 1 year ago

> I notice that the EIA-930 data cleaning for 2021 (step 12 of the data pipeline) is cleaning data all the way back to July 2020. I know that we need some data pre 1/1/2021, but I'm assuming we don't need to go back that far. Would one day (12/31/2020) be sufficient? Further filtering this should in theory at least speed up the data cleaning step by 33%.

The prior six months are needed for the rolling filter used by gridemissions.

> It looks like in step 18, we are loading the cleaned file (eia930_elec.csv), but not necessarily removing the imputed 1 values or filtering to only 2021 values (unless I am missing this).

No need to filter to 2021 values, since we only run the computation on dates in the OGE generation data, which is limited to that year. We remove imputed ones when we use the 930 data to calculate residual profiles (eia930.remove_imputed_ones), but not for the consumed emissions calculation; I think I just overlooked this, since the imputed ones were causing major issues with the residual profiles but weren't an issue with the consumed calc. I can add it for consistency.

gailin-p commented 1 year ago
> # In some cases, we have zero generation but non-zero transmission
> # usually due to imputed zeros during physics-based cleaning being set to 1.0
> # but sometimes due to ok values being set to 1.0
> to_fix = (ID.sum(axis=1) > 0) & (G == 0)
> ID[:, to_fix] = 0
> ID[to_fix, :] = 0

Actually, rewriting this check to make it more general will fix the HGMA bug and seems like the lowest impact option; I'll do that
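
For reference, one possible way to generalize the check is sketched below. This is an assumption about what "more general" could mean, not necessarily the change that was made: it flags any zero-generation region with any non-zero interchange, using absolute values so that offsetting imports and exports cannot cancel in the row sum. The function name `zero_inconsistent_interchange` is hypothetical, and ID[i, j] is assumed to be the flow from region i to region j with exports positive.

```python
import numpy as np


def zero_inconsistent_interchange(ID, G):
    """Zero all interchange for regions with no generation but some exchange.

    Returns a cleaned copy of ID and the boolean mask of fixed regions.
    """
    ID = np.array(ID, dtype=float)  # work on a copy
    G = np.asarray(G, dtype=float)
    # Absolute values: a region that imports and exports equal amounts
    # has a row sum of zero but still has non-zero interchange.
    to_fix = (np.abs(ID).sum(axis=1) > 0) & (G == 0)
    ID[:, to_fix] = 0
    ID[to_fix, :] = 0
    return ID, to_fix


# Region 2 has zero generation, exports 5 to region 0 and imports 5 from region 1.
ID = np.array([
    [0.0, 0.0, -5.0],
    [0.0, 0.0, 5.0],
    [5.0, -5.0, 0.0],
])
G = np.array([100.0, 50.0, 0.0])

original_mask = (ID.sum(axis=1) > 0) & (G == 0)  # the quoted check
cleaned, general_mask = zero_inconsistent_interchange(ID, G)
```

In this toy case the quoted check flags nothing because region 2's row sums to zero, while the absolute-value variant flags region 2. The trade-off is that it also removes interchange for pure importers with zero generation, which is physically valid data, so it may be broader than intended.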

grgmiller commented 1 year ago

This PR should close #163