singularity-energy / open-grid-emissions

Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
MIT License

Specify earliest/latest supported years for Open Grid Emissions data #176

Closed miloknowles closed 1 year ago

miloknowles commented 1 year ago

Issue Summary

I've run into some data availability issues when trying to download and/or export EIA-923 and EIA-860 data from before the last 5 years or so. For example:

Suggested Fix

In general, it seems like improvements based on 2019-2020 data (e.g., using boiler information in SO2 calculations) have the potential to break the code for older years of data, where there might be missing files, worse data quality, etc. It's fine if older years aren't available, but it would be helpful to know that ahead of time.

Would it be possible to specify the earliest and latest years that Open Grid Emissions supports and has been tested on, for users like myself who are looking for cleaned historical data? I'm not sure if it makes more sense to do this at the repo level or for specific functions (e.g., clean_eia923).

miloknowles commented 1 year ago

@grgmiller on a related note, would it be possible to have a fallback behavior for SO2 calculations when boiler information isn't available?

grgmiller commented 1 year ago

In general, these issues stem from the fact that the data URLs, file formats, column names, etc. change over time. This is one of the reasons why we generally rely on pudl for these data: their ETL pipeline already accounts for these changes and standardizes all of the outputs. The use of the raw EIA files is intended to be temporary until the specific tables that we need from these forms are integrated into pudl (see https://github.com/singularity-energy/open-grid-emissions/issues/154).

Thus, I think I would rather spend our efforts on the pudl integration than on trying to handle these issues with the raw files on our own. Maybe this means that we should prioritize #154 a bit higher in our queue (for the v2 release).

The specific pre-2008 issue arises because Form 923 used to be called Form 906/920, so the download link for 2006, for example, is https://www.eia.gov/electricity/data/eia923/archive/xls/f906920_2006.zip
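To illustrate the naming change, here is a minimal sketch of a year-aware URL helper. The function name `eia923_zip_url` is hypothetical and not part of the repo. Only the pre-2008 `f906920_` pattern comes from the link above; the `f923_` pattern for later years is an assumption and should be checked against the EIA archive page before relying on it.

```python
EIA923_ARCHIVE_BASE = "https://www.eia.gov/electricity/data/eia923/archive/xls"


def eia923_zip_url(year: int) -> str:
    """Return a candidate download URL for one year of EIA-923 data.

    Hypothetical helper for illustration only.
    """
    if year < 2008:
        # Before 2008 the form was called 906/920, as in the 2006 link above.
        return f"{EIA923_ARCHIVE_BASE}/f906920_{year}.zip"
    # ASSUMPTION: later years use an analogous "f923_" file name.
    return f"{EIA923_ARCHIVE_BASE}/f923_{year}.zip"
```

A caller would still need to handle a failed download, since this only builds the URL and does not confirm that the file exists.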

Suggested fixes

In general, for cleaned historical EIA data, I would suggest getting the data directly from pudl, if the data you're looking for is integrated into their pipeline (the environmental control data is obviously not currently available).

I like your suggestion of adding information about the available years. I'm thinking we could add this in at least two places:

Currently, the pipeline is only designed to work with 2019 and 2020, and we haven't tested it for other years.
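One way to surface this ahead of time would be a small guard at the start of the pipeline. This is only a sketch: `SUPPORTED_YEARS` and `check_supported_year` are hypothetical names, and the tuple simply mirrors the 2019 and 2020 years mentioned above.

```python
# Years the pipeline is designed for, per the comment above (hypothetical constant).
SUPPORTED_YEARS = (2019, 2020)


def check_supported_year(year: int) -> None:
    """Raise a clear error before any download or cleaning step is attempted."""
    if year not in SUPPORTED_YEARS:
        supported = ", ".join(str(y) for y in SUPPORTED_YEARS)
        raise ValueError(
            f"Year {year} has not been tested; supported years are: {supported}."
        )
```

Whether an unsupported year should raise an error or only log a warning is a design choice; an error is shown here because the failures described in this thread happen deep inside the pipeline.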

grgmiller commented 1 year ago

> @grgmiller on a related note, would it be possible to have a fallback behavior for SO2 calculations when boiler information isn't available?

Is the issue that the pipeline is unable to map a boiler_id to a unit, or that an SO2 emission factor is not available for a specific fuel/firing type/prime mover combination? If it is the latter, the intent is that we use the warning message to identify and add emission factors manually as needed, with the hope that over time we cover all possible combinations. In that case, if you can share the message listing the boiler types that are missing factors, I can work on adding them.

miloknowles commented 1 year ago

Sorry, I should have been more specific!

Before 2013, the 6_2_EnviroEquip_Y[YEAR].xlsx file is missing, which causes the call chain calculate_so2_from_fuel_consumption -> calculate_generator_so2_ef_per_unit_from_boiler_type -> load_boiler_firing_type(year) -> load_boiler_design_parameters_eia860 to fail. As a result, SO2 emissions can't be calculated using firing type information, although the fuel type (and maybe prime mover?) should still be available.

I was thinking that we could use some kind of average emission factor for each fuel type + prime mover combination as the fallback for pre-2013 years? That might have been what you were doing before implementing a better method that applies firing type information.
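Roughly, the fallback I have in mind could look like the sketch below. It uses pandas, and every name in it (`assign_so2_ef`, the `fuel_type`, `prime_mover`, `firing_type`, and `so2_ef` columns) is hypothetical rather than taken from the repo. It assumes a table of firing-type-specific factors is available and takes a simple unweighted mean across firing types.

```python
import pandas as pd

KEYS = ["fuel_type", "prime_mover"]


def assign_so2_ef(units: pd.DataFrame, efs: pd.DataFrame) -> pd.DataFrame:
    """Attach an SO2 emission factor to each unit, with an averaged fallback.

    units: one row per unit, with fuel_type, prime_mover, firing_type
           (firing_type may be missing, e.g. for pre-2013 years).
    efs:   one row per fuel_type / prime_mover / firing_type with so2_ef.
    """
    # Use the firing-type-specific factor wherever all three keys match.
    out = units.merge(efs, on=KEYS + ["firing_type"], how="left")

    # Fallback: unweighted mean of the factors for each fuel + prime mover pair.
    fallback = (
        efs.groupby(KEYS, as_index=False)["so2_ef"]
        .mean()
        .rename(columns={"so2_ef": "so2_ef_fallback"})
    )
    out = out.merge(fallback, on=KEYS, how="left")

    # Fill only the rows where no specific factor was found.
    out["so2_ef"] = out["so2_ef"].fillna(out["so2_ef_fallback"])
    return out.drop(columns="so2_ef_fallback")
```

An unweighted mean treats every firing type as equally common; weighting by fuel consumption in a year where firing types are known might be a better estimate, but that is a separate choice.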

miloknowles commented 1 year ago

Also, I agree that it's not worth worrying about the raw data as much. If someone else besides me runs into the EIA-923 download issue (which breaks the data pipeline) then maybe we can host those zip files somewhere ourselves. Otherwise we can close this issue.

grgmiller commented 1 year ago

I'm closing this issue, but creating a new issue about the SO2 fallback.