singularity-energy / open-grid-emissions

Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
MIT License

Expand validation of outputs #287

Closed: grgmiller closed this pull request 1 year ago

grgmiller commented 1 year ago

This PR includes a series of improvements that address many of the small, easy tasks for expanding validation coverage. Fixes CAR-1878, CAR-1877, CAR-1876, CAR-1875, CAR-1872, CAR-1867, CAR-1866.

Specifically, this PR does the following:

  1. Expand the coverage of validate_unique_datetimes so that it runs whenever we load hourly input data, concatenate hourly dataframes, or output hourly data. (This does not yet check the datetimes in EIA-930 data.)
  2. Expand the coverage of validate_shaped_totals. Previously we ran this check separately from impute_hourly_profiles.shape_monthly_eia_data_as_hourly(). We now call the check inside that function, so it always runs after shaping.
  3. Confirm that ensure_non_overlapping_data_from_all_sources runs every time we concatenate CEMS data, partial CEMS data, and shaped hourly EIA data.
  4. Expand the coverage of test_for_missing_subplant_id so that it runs every time the subplant_crosswalk dataframe is merged into another dataframe, ensuring that every merge key is associated with a subplant_id.
  5. Expand the column coverage of test_for_missing_values. Previously we checked only a specified list of columns for missing values when outputting results. However, final results should not contain missing values in any column, so this test now checks every column in the dataframe.
  6. Expand the column coverage of test_for_negative_values. Previously, we specified a list of columns that should be non-negative. We now take the opposite approach: all numeric columns are assumed to contain only non-negative values unless a column is explicitly allowed to be negative (currently, the only such column is net_generation_mwh).
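As a rough illustration of item 1, here is a minimal pandas sketch of a datetime-uniqueness check that reruns after every concat of hourly data. The signature of validate_unique_datetimes shown here, the helper concat_hourly, and the column name datetime_utc are assumptions for illustration, not the project's actual code.

```python
import pandas as pd


def validate_unique_datetimes(df: pd.DataFrame, df_name: str, keys: list[str]) -> None:
    """Raise if any datetime appears more than once for the same key columns.

    Hypothetical signature; the project's real function may differ.
    Assumes the hourly timestamp column is named "datetime_utc".
    """
    # keep=False flags every row in a duplicated group, not just the later copies
    dupes = df[df.duplicated(subset=keys + ["datetime_utc"], keep=False)]
    if not dupes.empty:
        raise ValueError(
            f"{df_name} has {len(dupes)} rows with non-unique datetimes for keys {keys}"
        )


def concat_hourly(frames: list[pd.DataFrame], keys: list[str]) -> pd.DataFrame:
    """Hypothetical helper: concatenate hourly frames, then re-check uniqueness."""
    combined = pd.concat(frames, ignore_index=True)
    # the check runs on the combined result, so rows duplicated across inputs are caught
    validate_unique_datetimes(combined, "combined hourly data", keys)
    return combined
```

Running the check on the concatenated result, rather than only on each input, is what catches an hour that appears once in each of two otherwise valid inputs.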
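For item 2, the sketch below shows the general pattern of embedding a totals check inside a shaping function so that the check cannot be skipped. The proportional shaping, the helper shape_monthly_as_hourly, and the plant_id, report_month, and profile columns are hypothetical simplifications; the project's shape_monthly_eia_data_as_hourly() does not necessarily work this way.

```python
import pandas as pd

KEYS = ["plant_id", "report_month"]  # assumed grouping columns


def validate_shaped_totals(shaped: pd.DataFrame, monthly: pd.DataFrame, col: str) -> None:
    """Check that hourly values sum back to the monthly totals they were shaped from."""
    hourly_sums = shaped.groupby(KEYS)[col].sum()
    expected = monthly.set_index(KEYS)[col]
    # align on (plant_id, report_month); months with no hourly rows become NaN
    diff = (hourly_sums.reindex(expected.index) - expected).abs()
    # 1e-6 absolute tolerance is an assumption chosen for this sketch
    if (diff > 1e-6).any() or diff.isna().any():
        raise ValueError(f"shaped {col} does not match monthly totals")


def shape_monthly_as_hourly(monthly: pd.DataFrame, profile: pd.DataFrame, col: str) -> pd.DataFrame:
    """Hypothetical: distribute monthly totals across hours in proportion to a profile."""
    merged = profile.merge(monthly, on=KEYS, how="left")
    # normalise profile weights within each plant-month so they sum to 1
    weight_sum = merged.groupby(KEYS)["profile"].transform("sum")
    merged[col] = merged[col] * merged["profile"] / weight_sum
    shaped = merged[KEYS + ["datetime_utc", col]]
    # embedded check: it runs every time shaping runs, so callers cannot forget it
    validate_shaped_totals(shaped, monthly, col)
    return shaped
```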
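For item 4, a minimal sketch of running a missing-subplant check immediately after merging in a crosswalk. check_missing_subplant_id and merge_subplant_crosswalk are hypothetical stand-ins for test_for_missing_subplant_id and the project's merge sites, and the plant_id/generator_id merge keys are assumptions.

```python
import pandas as pd


def check_missing_subplant_id(df: pd.DataFrame, df_name: str) -> None:
    """Hypothetical stand-in for test_for_missing_subplant_id."""
    missing = df[df["subplant_id"].isna()]
    if not missing.empty:
        raise ValueError(f"{len(missing)} rows in {df_name} have no subplant_id")


def merge_subplant_crosswalk(
    df: pd.DataFrame, subplant_crosswalk: pd.DataFrame, keys: list[str]
) -> pd.DataFrame:
    """Left-merge the crosswalk, then verify that every row received a subplant_id."""
    # validate="many_to_one" assumes the crosswalk has one row per merge key
    merged = df.merge(subplant_crosswalk, on=keys, how="left", validate="many_to_one")
    # rows whose keys are absent from the crosswalk get NaN and fail here
    check_missing_subplant_id(merged, "merged data")
    return merged
```

Using a left merge is what makes the check meaningful: an inner merge would silently drop unmatched rows instead of surfacing them as missing subplant_id values.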
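For items 5 and 6, a sketch of the two column-coverage changes: checking every column for missing values, and checking every numeric column for negative values except an explicit allowlist. The function names are hypothetical stand-ins for test_for_missing_values and test_for_negative_values; only the allowlisted column name net_generation_mwh comes from this PR.

```python
import pandas as pd

# columns permitted to hold negative values; net_generation_mwh is the one named in this PR
ALLOWED_NEGATIVE = {"net_generation_mwh"}


def check_missing_values(df: pd.DataFrame, df_name: str) -> None:
    """Hypothetical stand-in: raise if ANY column contains a missing value."""
    missing_counts = df.isna().sum()
    bad = missing_counts[missing_counts > 0]
    if not bad.empty:
        raise ValueError(f"{df_name} has missing values in: {bad.to_dict()}")


def check_negative_values(
    df: pd.DataFrame, df_name: str, allowed: set[str] = ALLOWED_NEGATIVE
) -> None:
    """Hypothetical stand-in: every numeric column must be >= 0 unless allowlisted."""
    numeric_cols = [c for c in df.select_dtypes(include="number").columns if c not in allowed]
    # zero is allowed; only strictly negative values are flagged
    neg_counts = (df[numeric_cols] < 0).sum()
    bad = neg_counts[neg_counts > 0]
    if not bad.empty:
        raise ValueError(f"{df_name} has negative values in: {bad.to_dict()}")
```

The allowlist approach means a newly added numeric column is checked by default, rather than silently skipped until someone remembers to add it to a list.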

This PR also applies black formatting to all modified files, which reformatted a few lines that had not been formatted in previous PRs.