singularity-energy / open-grid-emissions

Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
MIT License

Fix missing NOx and SO2 identifiers #256

Closed grgmiller closed 1 year ago

grgmiller commented 1 year ago

This PR is meant to address https://github.com/singularity-energy/open-grid-emissions/issues/255.

To get around the fact that some NOx and SO2 control data doesn't have an associated nox or so2 control id, we now load all four control ids (nox, so2, pm, hg) and use these to map the data to a boiler if the nox or so2 id is missing.

Previously, we dropped all rows from the air emissions control table that had a missing pollutant-specific control id. Now we only drop rows that don't have any pollutant-specific data (i.e., if we're calculating nox, we drop any rows that don't have any nox control data).
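As a rough illustration of the new filtering rule, here is a minimal sketch in plain Python. The column names (`nox_emission_rate`, `nox_control_equipment`) and the helper `keep_rows_with_pollutant_data` are hypothetical and only stand in for whatever pollutant-specific columns the real table has; the actual code in this PR may look different.

```python
def keep_rows_with_pollutant_data(rows, data_columns):
    """Keep a row if at least one pollutant-specific column has a value.

    A missing control id alone is no longer a reason to drop the row.
    """
    return [
        row for row in rows
        if any(row.get(col) is not None for col in data_columns)
    ]


# Hypothetical nox-specific data columns (assumed names, for illustration only).
NOX_DATA_COLUMNS = ["nox_emission_rate", "nox_control_equipment"]

example_rows = [
    # Missing nox control id, but has nox data: kept under the new rule.
    {"nox_control_id": None, "nox_emission_rate": 0.12, "nox_control_equipment": "SCR"},
    # No nox data at all: dropped.
    {"nox_control_id": None, "nox_emission_rate": None, "nox_control_equipment": None},
]

filtered_rows = keep_rows_with_pollutant_data(example_rows, NOX_DATA_COLUMNS)
```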

I've set this up as an iterative process: the user specifies which id they want to use as the primary id, and the code then cycles through each additional id, filling in missing boiler associations until all data is associated with a boiler (or there are no remaining ids to use). I did it this way instead of loading all control-boiler associations at once because many control ids are associated with many boilers, which led to duplicate boiler matches. This is all coordinated by emissions.associate_control_ids_with_boiler_id().
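The iterative fallback described above can be sketched roughly as follows. This is not the code from emissions.associate_control_ids_with_boiler_id(); it is a simplified, plain-Python illustration in which the function name `associate_boilers`, the `<pollutant>_control_id` column names, and the shape of the `associations` lookup are all assumptions.

```python
def associate_boilers(rows, associations, primary, fallbacks):
    """Match control rows to boilers, trying the primary id first.

    rows: list of dicts with '<pollutant>_control_id' keys (assumed names).
    associations: {pollutant: {control_id: [boiler_id, ...]}} (assumed shape).
    Rows matched by an earlier id are never re-matched by a later one,
    which avoids the duplicate matches that loading every association
    at once would produce.
    """
    id_order = [primary] + [p for p in fallbacks if p != primary]
    matched = []
    unmatched = list(rows)
    for pollutant in id_order:
        if not unmatched:
            break  # everything is already associated with a boiler
        lookup = associations.get(pollutant, {})
        still_unmatched = []
        for row in unmatched:
            control_id = row.get(f"{pollutant}_control_id")
            boilers = lookup.get(control_id, []) if control_id is not None else []
            if boilers:
                matched.extend({**row, "boiler_id": b} for b in boilers)
            else:
                still_unmatched.append(row)
        unmatched = still_unmatched
    # Rows that no id could place keep a missing boiler id.
    matched.extend({**row, "boiler_id": None} for row in unmatched)
    return matched


control_rows = [
    {"nox_control_id": "N1", "so2_control_id": "S1"},
    {"nox_control_id": None, "so2_control_id": "S2"},
    {"nox_control_id": None, "so2_control_id": None, "pm_control_id": "P9"},
]
control_associations = {
    "nox": {"N1": ["B1"]},
    "so2": {"S1": ["B1", "B2"], "S2": ["B2"]},
    "pm": {},
}
associated = associate_boilers(
    control_rows, control_associations, primary="nox", fallbacks=["so2", "pm", "hg"]
)
```

In this example the first row is matched through its nox id only, so the broader so2 association (S1 to both B1 and B2) never adds a second match for it. The second row falls back to its so2 id, and the third row stays unassociated because no lookup contains its pm id.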

Because we now need to load four different control-id-to-boiler association tables, I replaced the nox- and so2-specific loading functions with a more generic function, load_boiler_control_id_association_eia860(), in which the user specifies which pollutant table they want to load (e.g., nox, so2, hg, pm). This is possible because all four of these tables are in the same format.
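To show why a single parameterized loader is enough when the four tables share a format, here is a hedged sketch. It does not reproduce load_boiler_control_id_association_eia860(); the helper name `load_association_table`, the table naming pattern, the raw column names, and the in-memory `raw_tables` source are all hypothetical stand-ins for the real data access.

```python
VALID_POLLUTANTS = ("nox", "so2", "pm", "hg")


def load_association_table(pollutant, raw_tables):
    """Load one control-id-to-boiler association table by pollutant name.

    raw_tables: {table_name: [ {plant_id, boiler_id, control_id}, ... ]}
    (an assumed in-memory stand-in for the real data source).
    """
    if pollutant not in VALID_POLLUTANTS:
        raise ValueError(
            f"pollutant must be one of {VALID_POLLUTANTS}, got {pollutant!r}"
        )
    table_name = f"boiler_{pollutant}_association"  # hypothetical naming pattern
    id_column = f"{pollutant}_control_id"
    # Same layout for every pollutant, so only the id column name changes.
    return [
        {
            "plant_id": row["plant_id"],
            "boiler_id": row["boiler_id"],
            id_column: row["control_id"],
        }
        for row in raw_tables[table_name]
    ]


example_tables = {
    "boiler_nox_association": [
        {"plant_id": 1, "boiler_id": "B1", "control_id": "N1"},
    ],
    "boiler_hg_association": [
        {"plant_id": 1, "boiler_id": "B1", "control_id": "H7"},
    ],
}
nox_assoc = load_association_table("nox", example_tables)
hg_assoc = load_association_table("hg", example_tables)
```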