A repository of scripts used for converting emissions to concentrations and health impacts using the ISRM for California.
Libby Koolik, UC Berkeley
Last modified July 11, 2023
Note: this version of the code is archival. The model has been renamed to ECHO-AIR and moved to a new home. For more information, please visit https://echo-air-model.github.io.
The Intervention Model for Air Pollution (InMAP) is a powerful first step toward lowering key technical barriers: it makes simplifying assumptions that allow for streamlined predictions of PM2.5 concentrations resulting from emissions-related policies or interventions.[*] InMAP performance has been validated against observational data and WRF-Chem, and the model has been used to perform source attribution and exposure disparity analyses.[*, *, *] The InMAP Source-Receptor Matrix (ISRM) was developed by running the full InMAP model tens of thousands of times to capture how a unit perturbation of emissions from each grid cell affects concentrations across the grid. However, both InMAP and the ISRM require considerable computational resources and technical proficiency to run, as well as an understanding of various atmospheric science principles to interpret. Furthermore, estimating health impacts requires additional knowledge and calculations beyond InMAP. Thus, a need arises for a standalone, user-friendly process for comparing air quality health disparities across climate change policy scenarios.
The ultimate goal of this repository is to create a pipeline for estimating disparities in health impacts associated with incremental changes in emissions. Annual average PM2.5 concentrations are estimated using the InMAP Source Receptor Matrix for California.
The ISRM Health Calculation model works through two modules. First, the Concentration Module estimates the annual average change in PM2.5 concentrations. Second, the Health Module calculates the excess mortality resulting from that concentration change.
The InMAP Source Receptor Matrix (ISRM) links emissions sources to changes in receptor concentrations. There is a matrix layer for each of the five precursor species: primary PM2.5, ammonia (NH3), oxides of nitrogen (NOx), oxides of sulfur (SOx), and volatile organic compounds (VOC). By default, the tool uses the California ISRM. For each of these species in the California ISRM, the ISRM matrix dimensions are: 3 elevations by 21,705 sources by 21,705 receptors. The three elevations of release height within the ISRM are:
The tool is capable of reading in a different ISRM, if specified by the user.
The units of each cell within the ISRM are micrograms per cubic meter per microgram per second (μg/m³ per μg/s), or concentration per unit emissions.
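To make "concentration per unit emissions" concrete, multiplying an emissions vector (μg/s) through an ISRM layer yields receptor concentrations (μg/m³). The sketch below uses a made-up 3-cell grid and illustrative matrix values; the real California ISRM is 21,705 sources by 21,705 receptors per layer.

```python
import numpy as np

# Hypothetical 3-source x 3-receptor ISRM slice for one pollutant at one
# release height; units are (ug/m3) per (ug/s). Values are illustrative only.
isrm_layer = np.array([
    [2.0e-4, 5.0e-5, 1.0e-5],
    [4.0e-5, 1.8e-4, 3.0e-5],
    [1.0e-5, 4.0e-5, 2.2e-4],
])

# Emissions from each source cell, in ug/s.
emissions = np.array([100.0, 0.0, 50.0])

# Receptor concentrations (ug/m3) are the emissions-weighted column sums.
concentrations = emissions @ isrm_layer
print(concentrations)  # one value per receptor cell
```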
The concentration module has the following steps. Details about the code handling each step are described in the Code Details(*) section below.
For each layer triggered in the preprocessing step:
Once all layers are done:
The ISRM Tool's health module follows US EPA BenMAP CE methodology and CARB guidance.
Currently, the tool is only built out to use the Krewski et al. (2009) endpoint parameters and functions.(*) The Krewski function is as follows:
$$ \Delta M_{i,d,g} = \left( 1 - \frac{1}{\exp(\beta_{d} \times C_{i})} \right) \times I_{i,d,g} \times P_{i,g} $$
where $\beta$ is the endpoint parameter from Krewski et al. (2009), $d$ is the disease endpoint, $C$ is the concentration of PM2.5, $i$ is the grid cell, $I$ is the baseline incidence, $g$ is the group, and $P$ is the population estimate. The tool takes the following steps to estimate excess mortality.
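As a numerical sketch of the function above: the `beta` below is a placeholder derived from a hypothetical relative risk of 1.06 per 10 μg/m³, not the actual Krewski et al. (2009) parameter, and the concentration, incidence, and population numbers are likewise made up.

```python
import math

def krewski_sketch(conc, inc, pop, beta):
    """Excess mortality: (1 - 1/exp(beta * C)) * incidence * population.

    beta is the endpoint-specific parameter; the actual values from
    Krewski et al. (2009) are not reproduced here.
    """
    return (1.0 - 1.0 / math.exp(beta * conc)) * inc * pop

# Placeholder beta from a hypothetical relative risk of 1.06 per 10 ug/m3,
# a 5 ug/m3 concentration change, a baseline incidence of 0.008
# deaths/person-year, and 10,000 people. All numbers are illustrative.
beta = math.log(1.06) / 10.0
excess = krewski_sketch(conc=5.0, inc=0.008, pop=10_000, beta=beta)
print(round(excess, 2))
```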
Preprocessing: the tool will merge the population and incidence data based on geographic intersections using the `health_data.py` object type.
Estimation by Endpoint: the tool will then calculate excess mortality by endpoint:
Once all endpoints are done:
The ISRM Tool has a command called `check-setup` that allows the user to verify that all of the code and data files are properly saved and named so that the program will run.
Below is a brief table of contents for the Code Details section of the Readme.
The code is written in Python 3. The library requirements are included in this repository as `requirements.txt`. For completeness, they are reproduced here:
Python libraries can be installed by running `pip install -r requirements.txt` on a Linux/Mac command line.
isrm_calcs.py
The `isrm_calcs.py` script is the main script file that drives the tool. This script operates the command line functionality, defines the health impact calculation objects, calls each of the supporting functions, and outputs the desired files. The `isrm_calcs.py` script is not split into functions or objects; instead, it runs through two sections: (1) Initialization and (2) Run Program.
In the initialization section of `isrm_calcs.py`, the parser object is created in order to interface with the command line. The parser object is created using the `argparse` library.
Currently, the only arguments accepted by the parser object are `-i` for the input file, `-h` for help, and `--check-setup` to run a setup check.
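A minimal sketch of what such a parser might look like; the argument names mirror the description above, but the actual `isrm_calcs.py` implementation may differ (note that `argparse` provides `-h` automatically).

```python
import argparse

# Minimal CLI sketch: -i for the control file, --check-setup for setup checks.
# argparse adds -h/--help on its own.
parser = argparse.ArgumentParser(description='Runs the ISRM Health Calculation tool.')
parser.add_argument('-i', '--input', help='path to the control file for this run')
parser.add_argument('--check-setup', action='store_true',
                    help='check that code and data files are saved and named correctly')

# Parse an example command line (hypothetical control file name).
args = parser.parse_args(['-i', 'control_file.txt'])
print(args.input, args.check_setup)
```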
Once the parser is defined, the control file object is created using the `control_file.py` class object. A number of metadata variables are defined from the control file. Next, a number of internal data file paths are stored. Finally, the output_region is defined using the `get_output_region` function defined in `tool_utils.py`. The output region is then stored for use in later functions.
The run program section of the code is split into two modes. If the CHECK_INPUTS flag is given, the tool will run in check mode, where it will check that each of the inputs is valid and then quit. If the CHECK_INPUTS flag is not given, the tool will run the full program.
It will start by creating a log file using the `setup_logging` function. Once logging is set up, an output directory is created using the `create_output_dir` function from `tool_utils.py`. It will also create a shapefile subdirectory within the output directory using `create_shape_out`. The tool will also create an `output_region` geodataframe from user inputs for use in future steps.
Then, the tool will begin the concentration module. This starts by defining an emissions object and an ISRM object using the `emissions.py` and `isrm.py` supporting class objects. The concentrations will be estimated using the `concentration.py` object, which relies on the `concentration_layer.py` object. The concentrations will then be output as a map of total exposure concentration and a shapefile with detailed exposure information.
Next, the tool will run environmental justice exposure calculations using the `create_exposure_df`, `get_overall_disparity`, and `estimate_exposure_percentile` functions from the `environmental_justice_calcs.py` file. The exposure percentiles will then be plotted and exported using the `plot_percentile_exposure` function. If the control file has indicated that exposure data should be output (using the 'OUTPUT_EXPOSURE' flag), a shapefile of exposure concentrations by population group will be output in the output directory.
Finally, if indicated by the user, the tool will begin the health module. It will create the health input object using the `health_data.py` library and then estimate the three endpoints of excess mortality using `calculate_excess_mortality` from the `health_impact_calcs` file. Each endpoint will then be mapped and exported using `visualize_and_export_hia`.
The tool utilizes parallel computing to increase efficiency and reduce runtime. As such, many of these steps do not happen exactly in the order presented above.
The program has completed when a box stating "Success! Run complete." shows on the screen.
If enabled in the control file, the program will run in `check` mode, which will run a number of checks built into the `emissions`, `isrm`, and `population` objects. Once it runs all checking functions, it will quit and inform the user of the result.
To streamline calculations and increase functionality of the code, Python classes were created. These class definitions are saved in the `supporting` folder of the repository. The following sections outline how each of these classes works.
concentration_layer.py
The `concentration_layer` object runs ISRM-based calculations using a single vertical layer of the ISRM grid. The object inputs an emissions object (from `emissions.py`), the ISRM object (from `isrm.py`), and the layer number corresponding to the vertical layer of the ISRM grid. The object then estimates ground-level concentrations resulting from emissions released at that vertical layer's height range.
Inputs
- `emis_obj`: the emissions object, as defined by `emissions.py`
- `isrm_obj`: the ISRM object, as defined by `isrm.py`
- `layer`: the layer number (0, 1, or 2)

Attributes
- `isrm_id`: a Series of all ISRM grid cell IDs
- `receptor_id`: a Series of all receptor IDs
- `isrm_geom`: the geometry (geographic attributes) of the ISRM grid
- `crs`: the coordinate reference system associated with the ISRM grid
- `name`: a string representing the run name preferred by the user
- `check`: a Boolean indicating whether the program should run, or if it should just check the inputs (useful for debugging)
- `verbose`: a Boolean indicating whether the user wants to run in verbose mode

Calculated Attributes
- `PM25e`, `NH3e`, `VOCe`, `NOXe`, `SOXe`: geodataframes of the emissions (for each pollutant) from that layer re-allocated onto the ISRM grid
- `pPM25`, `pNH4`, `pVOC`, `pNO3`, `pSO4`: geodataframes of the concentrations from each primary pollutant resulting from the emissions of that pollutant in that layer
- `detailed_conc`: geodataframe containing columns for each primary pollutant's contribution to the total ground-level PM2.5 concentrations

Simple Functions
- `allocate_emissions`: inputs the emissions layer and the ISRM geography, and re-allocates the emissions to the ISRM geography using an area-based allocation procedure
- `cut_emissions`: inputs the pollutant geodataframe from the emissions object and slices it based on the minimum and maximum release heights (minimum inclusive, maximum exclusive) associated with the ISRM vertical layer
- `process_emissions`: for each of the five primary pollutants, runs `cut_emissions` and then `allocate_emissions` to return the geodataframes of emissions of each primary pollutant released in the layer, allocated to the ISRM grid
- `get_concentration`: for a pollutant's emission layer (`POLe`), the ISRM matrix for that pollutant, and the `layer` ID, estimates the ground-level concentration of the primary pollutant (`pPOL`)
- `combine_concentrations`: merges all five of the primary pollutant concentration geodataframes (`pPOL`) and adds them together to get total ground-level concentrations resulting from emissions released in that layer
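The `get_concentration` and `combine_concentrations` steps can be sketched as follows, using a toy grid and made-up per-pollutant matrices; this mirrors the flow conceptually and is not the tool's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # tiny stand-in for the 21,705-cell ISRM grid

# Hypothetical single-layer ISRM matrices (sources x receptors) and allocated
# emissions (ug/s) for each precursor; values are illustrative only.
pollutants = ['PM25', 'NH3', 'VOC', 'NOX', 'SOX']
isrm = {p: rng.random((n, n)) * 1e-4 for p in pollutants}
emis = {p: rng.random(n) * 10.0 for p in pollutants}

# get_concentration step: ground-level concentration from each precursor.
primary_conc = {p: emis[p] @ isrm[p] for p in pollutants}

# combine_concentrations step: total ground-level PM2.5 from this layer is
# the sum of the five primary-pollutant contributions.
layer_total = sum(primary_conc.values())
print(layer_total.shape)
```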
concentration.py
The `concentration` object runs ISRM-based calculations for each of the vertical layers of the ISRM grid by processing individual `concentration_layer` objects. The object inputs an emissions object (from `emissions.py`) and the ISRM object (from `isrm.py`). The object then estimates total ground-level concentrations resulting from emissions.
Inputs
- `emis_obj`: the emissions object, as defined by `emissions.py`
- `isrm_obj`: the ISRM object, as defined by `isrm.py`
- `detailed_conc_flag`: a Boolean indicating whether concentrations should be output at a detailed level or not

Attributes
- `isrm_id`: a Series of all ISRM grid cell IDs
- `isrm_geom`: the geometry (geographic attributes) of the ISRM grid
- `crs`: the coordinate reference system associated with the ISRM grid
- `name`: a string representing the run name preferred by the user
- `run_calcs`: a Boolean indicating whether the program should run, or if it should just check the inputs (useful for debugging)
- `verbose`: a Boolean indicating whether the user wants to run in verbose mode

Calculated Attributes
- `detailed_conc`: geodataframe of the detailed concentrations at ground-level combined from all three vertical layers
- `detailed_conc_clean`: simplified geodataframe of the detailed concentrations at ground-level combined from all three vertical layers
- `total_conc`: geodataframe with total ground-level PM2.5 concentrations across the ISRM grid

Internal Functions
- `run_layer`: estimates concentrations for a single layer by creating a `concentration_layer` object for that layer
- `combine_concentrations`: checks for each of the layer flags in the `emissions` object, and then calls the `run_layer` function for each layer that is flagged. Then, combines the concentrations from each flagged layer into the three concentration geodataframes described above

External Functions
- `visualize_concentrations`: draws a map of concentrations for a variable (`var`) and exports it as a PNG into an output directory (`output_dir`) of choice
- `export_concentrations`: exports concentrations as a shapefile into an output directory (`output_dir`) of choice

control_file.py
The `control_file` object is used to check and read the control file for a run:
Inputs
- `file_path`: the file path of the control file

Attributes
- `valid_file`: a Boolean indicating whether or not the control file path is valid
- `keywords`: a hardcoded list of the keywords that should be present in the control file
- `blanks_okay`: a hardcoded list of whether each keyword can be blank (based on the order of `keywords`)
- `valid_structure`, `no_incorrect_blanks`: Boolean keywords based on internal checks of the control file format
- `run_name`: a string representing the run name preferred by the user
- `emissions_path`: a string representing the path to the emissions input file
- `emissions_units`: a string representing the units of the emissions data
- `isrm_path`: a string representing the path of the folder storing ISRM numpy layers and geodata
- `population_path`: a string representing the path to the population input file
- `check`: a Boolean indicating whether the program should run, or if it should just check the inputs (useful for debugging)
- `verbose`: a Boolean indicating whether the user wants to run in verbose mode
- `output_exposure`: a Boolean indicating whether exposure should be output
- `detailed_conc`: a Boolean indicating whether concentrations should be output as totals or by pollutant

Internal Functions
- `check_path`: checks if a file exists at the given control file path
- `get_input_value`: gets the input for a given keyword
- `check_control_file`: runs all of the internal checks to confirm the control file is valid
- `get_all_inputs`: imports all values from the control file
- `get_region_dict`: loads all of the acceptable values for the various regions
- `region_check_helper`: a helper function for checking the region of interest and region category inputs
- `check_inputs`: checks that all inputs are valid once imported

External Functions
- `get_file_path`: returns the file path

emissions.py
The `emissions` object is primarily built off of `geopandas`. It has the following attributes:
Inputs
- `file_path`: the file path of the raw emissions data
- `output_dir`: a filepath string for the output directory
- `f_out`: a string containing the filename pattern to be used in output files
- `units`: units associated with the emissions (e.g., μg/s)
- `name`: a plain English name tied to the emissions data, either provided or automatically generated from the filepath
- `details_to_keep`: any additional details to be preserved throughout the processing (e.g., sector, fuel type) (not fully built out yet)
- `filter_dict`: filters the emissions inputs based on the inputted dictionary (not fully built out yet)
- `load_file`: a Boolean indicating whether or not the file should be loaded (for debugging)
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed

Attributes
- `valid_file`: a Boolean indicating whether or not the file provided is valid
- `valid_units`: a Boolean indicating whether or not emissions units are compatible with the program
- `valid_emissions`: a Boolean indicating whether or not emissions passed the required tests
- `file_type`: the type of file being used to provide raw emissions data (for now, only .shp is allowed)
- `geometry`: geospatial information associated with the emissions input
- `crs`: the inherent coordinate reference system associated with the emissions input
- `emissions_data`: complete, detailed emissions data from the source
- `emissions_data_clean`: simplified emissions data in each grid cell

Calculated Attributes
- `PM25`: primary PM2.5 emissions in each grid cell
- `NH3`: ammonia emissions in each grid cell
- `VOC`: VOC emissions in each grid cell
- `NOX`: NOx emissions in each grid cell
- `SOX`: SOx emissions in each grid cell
- `L0_flag`, `L1_flag`, `L2_flag`, `linear_interp_flag`: Booleans indicating whether each layer should be calculated based on emissions release heights

Internal Functions
- `get_file_path`: returns the file path
- `get_name`: returns the name associated with the emissions (`emissions_name`)
- `get_unit_conversions`: returns two dictionaries of built-in unit conversions
- `check_path`: uses the `path` library to check if the provided `file_path` exists and if the file is a file
- `check_units`: checks that the provided units are valid against the `get_unit_conversions` dictionaries
- `load_emissions`: detects the filetype of the emissions file and calls the appropriate load function
- `load_shp`: loads the emissions data from a shapefile
- `load_feather`: loads the emissions data from a feather file
- `load_csv`: loads the emissions data from a csv file
- `check_height`: checks that the height column is present in the emissions file; if not, assumes emissions are released at ground-level
- `check_emissions`: runs a number of checks on the emissions data to ensure data are valid before running anything
- `map_pollutant_names`: replaces pollutant names if they are not found in the emissions data based on near-misses (e.g., PM2.5 for PM25)
- `filter_emissions`: filters the emissions based on the `filter_dict` input
- `check_geo_types`: checks what geometries are present in the emissions shapefile (e.g., points, polygons, multipolygons); if points exist, uses `buffer_emis` to convert them to polygons
- `buffer_emis`: converts points to polygons by adding a buffer of `dist`
- `clean_up`: simplifies the emissions data by removing unnecessary dimensions, converting units as appropriate, and updating the column names
- `convert_units`: converts units from the provided units to μg/s using the built-in unit dictionaries
- `split_polutants`: converts the emissions layer into separate objects for each pollutant
- `which_layers`: determines the `L0_flag`, `L1_flag`, `L2_flag`, and `linear_interp_flag` variables based on the HEIGHT column of the emissions data

External Functions
- `visualize_emissions`: creates a simple map of emissions for a provided pollutant
- `get_pollutant_layer`: pulls a single pollutant layer based on `pol_name`
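To make the unit handling concrete, here is a sketch of a conversion to μg/s like the one `convert_units` performs. The dictionaries below are illustrative stand-ins, not the tool's built-in conversion tables; the `'ton'` factor assumes a US short ton and the `'yr'` factor a 365-day year.

```python
# Hypothetical conversion factors; the tool's actual dictionaries
# (from get_unit_conversions) are built in and not reproduced here.
UG_PER_UNIT = {'ug': 1.0, 'g': 1.0e6, 'kg': 1.0e9, 'ton': 9.0718474e11}  # short ton
SECONDS_PER_PERIOD = {'s': 1.0, 'hr': 3600.0, 'day': 86400.0, 'yr': 31_536_000.0}

def to_ug_per_s(value, mass_unit, time_unit):
    """Convert an emissions rate to ug/s using the lookup dictionaries."""
    return value * UG_PER_UNIT[mass_unit] / SECONDS_PER_PERIOD[time_unit]

rate = to_ug_per_s(1.0, 'ton', 'yr')  # 1 short ton/year expressed in ug/s
print(round(rate, 1))
```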
health_data.py
The `health_data` object stores and manipulates built-in health data (population and incidence rates) from BenMAP. It inputs a dictionary of filepaths and two Boolean run options (`verbose` and `race_stratified`) to return dataframes of population, incidence, and combined population-incidence information (`pop_inc`).
Inputs
- `pop_alloc`: a geodataframe of population allocated to the ISRM grid geometry
- `incidence_fp`: a string containing the file path to the background incidence dataset
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
- `race_stratified`: a Boolean indicating whether race-stratified incidence rates should be used

Calculated Attributes
- `population`: a geodataframe containing the population allocated to the ISRM grid geometry
- `incidence`: a geodataframe containing the raw incidence data from BenMAP
- `pop_inc`: a geodataframe containing the combined population and incidence data based on the requested geographies

Internal Functions
- `load_data`: reads in the population and incidence data from feather files
- `update_pop`: updates the population dataset by melting (unpivoting) and renaming columns
- `update_inc`: updates the incidence dataset by pivoting columns around endpoints and renaming columns
- `get_incidence_lookup`: creates a small incidence lookup table based on the name and age ranges
- `get_incidence_pop`: helper function that returns the incidence for a given name, race, age range, and endpoint
- `make_incidence_lookup`: creates a lookup dictionary using the `get_incidence_pop` function for each endpoint
- `incidence_by_age`: creates a smaller incidence table for merging by calling `get_incidence_lookup` for each endpoint
- `combine_pop_inc`: creates the `pop_inc` dataframe by doing a spatial merge on the population and incidence data and then using lookup tables to determine the appropriate values

isrm.py
The `isrm` object loads, stores, and manipulates the ISRM grid data.
Inputs
- `isrm_path`: a string representing the folder containing all ISRM data
- `output_region`: a geodataframe of the region for results to be output, as calculated by `get_output_region` in `tool_utils.py`
- `region_of_interest`: the name of the region contained in the `output_region`
- `load_file`: a Boolean indicating whether or not the file should be loaded (for debugging)
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed

Attributes
- `nh3_path`, `nox_path`, `pm25_path`, `sox_path`, `voc_path`: the filepath strings for each of the primary pollutant ISRM variables
- `valid_file`: a Boolean indicating whether or not the file provided is valid
- `valid_geo_file`: a Boolean indicating whether the ISRM geometry file provided is valid
- `geodata`: a geodataframe containing the ISRM feather file information
- `crs`: the inherent coordinate reference system associated with the ISRM geometry
- `geometry`: geospatial information associated with the ISRM geometry

Calculated Attributes
- `receptor_IDs`: the IDs associated with ISRM receptors within the `output_region`
- `receptor_geometry`: the geospatial information associated with the ISRM receptors within the `output_region`
- `PM25`, `NH3`, `NOx`, `SOX`, `VOC`: the ISRM matrices for each of the primary pollutants

Internal Functions
- `get_isrm_files`: appends the file names to the `isrm_path` input to generate full file paths
- `check_path`: checks if the files exist at the paths specified (both data and geo files)
- `load_and_cut`: loads the numpy layers for a pollutant and trims the columns of each vertical layer's matrix to only include the `receptor_IDs` within the `output_region`
- `load_isrm`: calls the `load_and_cut` function for each ISRM numeric layer and returns a list of pollutant matrices
- `load_geodata`: loads the feather file into a geopandas dataframe
- `clip_isrm`: clips the ISRM receptor geodata to only the relevant receptors based on the `output_region` (i.e., returns the `receptor_IDs` and `receptor_geometry` objects)

External Functions
- `get_pollutant_layer`: returns the ISRM matrix for a single pollutant
- `map_isrm`: simple function for mapping the ISRM grid cells

population.py
The `population` object stores detailed Census tract-level population data for the environmental justice exposure calculations and the health impact calculations from an input population dataset.
Inputs
- `file_path`: the file path of the raw population data
- `load_file`: a Boolean indicating whether or not the file should be loaded (for debugging)
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed

Attributes
- `valid_file`: a Boolean indicating whether or not the file provided is valid
- `geometry`: geospatial information associated with the population input
- `pop_all`: complete, detailed population data from the source
- `pop_geo`: a geodataframe with population IDs and spatial information
- `crs`: the inherent coordinate reference system associated with the population input
- `pop_exp`: a geodataframe containing the population information with associated spatial information, summarized across age bins
- `pop_hia`: a geodataframe containing the population information with associated spatial information, broken out by age bin

Internal Functions
- `check_path`: checks to see if the file exists at the path specified and returns whether the file is valid
- `load_population`: loads the population data based on the file extension
- `load_shp`: loads the population shapefile data using geopandas and post-processes it
- `load_feather`: loads the population feather data using geopandas and post-processes it
- `make_pop_exp`: makes the exposure population data frame by summing across age bins
- `make_pop_hia`: makes the health impact assessment population data frame by retaining key information

External Functions
- `project_pop`: projects the population data to a new coordinate reference system
- `allocate_population`: reallocates population into new geometry using a spatial intersect

To streamline calculations and increase functionality of the code, Python scripts were created for major calculations/operations. Scripts are saved in the `scripts` folder of the repository. The following sections outline the contents of each script file and how the functions inside them work.
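The spatial intersect that `allocate_population` (and `allocate_emissions`) relies on can be illustrated with a deliberately simplified example. The tool itself performs true polygon intersections with geopandas; here tracts and grid cells are reduced to 1-D intervals, and population is assumed to be uniformly distributed within each tract. All names and numbers are made up.

```python
def overlap(a, b):
    """Length of the overlap between two (min, max) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

# Hypothetical tracts: (extent, population). Hypothetical grid cells: extent.
tracts = {'tract_A': ((0.0, 10.0), 1000), 'tract_B': ((10.0, 18.0), 400)}
grid_cells = {'cell_1': (0.0, 8.0), 'cell_2': (8.0, 16.0)}

# Area-weighted allocation: each cell receives population in proportion to
# how much of the tract intersects it (uniform-density assumption).
allocated = {cell: 0.0 for cell in grid_cells}
for extent, pop in tracts.values():
    tract_len = extent[1] - extent[0]
    for cell, cell_extent in grid_cells.items():
        allocated[cell] += pop * overlap(extent, cell_extent) / tract_len
print(allocated)
```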
environmental_justice_calcs.py
The `environmental_justice_calcs` script file contains a number of functions that help calculate exposure metrics for environmental justice analyses.
`create_exposure_df`: creates a dataframe ready for exposure calculations

Inputs:
- `conc`: concentration object from `concentration.py`
- `isrm_pop_alloc`: population object (from `population.py`) re-allocated to the ISRM grid cell geometry
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed

Outputs:
- `exposure_gdf`: a geodataframe with the exposure concentrations and allocated population by racial group

`add_pwm_col`: adds an intermediate column that multiplies population by exposure concentration

Inputs:
- `exposure_gdf`: a geodataframe with the exposure concentrations and allocated population by racial group
- `group`: the racial/ethnic group name

Outputs:
- `exposure_gdf`: a geodataframe with the exposure concentrations and allocated population by racial group, now with a PWM column named `group`+'_PWM'

Methodology: multiplies the exposure concentration by the `group` population.

`get_pwm`: estimates the population-weighted mean exposure for a given group

Inputs:
- `exposure_gdf`: a geodataframe with the exposure concentrations and allocated population by racial group
- `group`: the racial/ethnic group name

Outputs:
- `PWM_group`: the group-level population-weighted mean exposure concentration (float)

Methodology: calls `add_pwm_col`, sums the `group`_PWM column, and divides by the total `group` population.

`get_overall_disparity`: returns a table of overall disparity metrics by racial/ethnic group

Inputs:
- `exposure_gdf`: a geodataframe with the exposure concentrations and allocated population by racial group

Outputs:
- `pwm_df`: a dataframe containing the PWM, absolute disparity, and relative disparity of each group

Methodology: uses the `get_pwm` function; the absolute disparity is calculated as `Group_PWM` - `Total_PWM`, and the relative disparity as `Absolute Disparity`/`Total_PWM`.
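A minimal numeric sketch of the PWM and disparity definitions above; the group names, concentrations, and populations are made up for illustration.

```python
# Three hypothetical grid cells: exposure concentrations (ug/m3) and
# populations for the total population and one illustrative group.
conc = [12.0, 8.0, 5.0]
pops = {'Total': [100, 200, 100], 'GroupA': [80, 20, 10]}

def pwm(conc, pop):
    """Population-weighted mean exposure: sum(pop * conc) / sum(pop)."""
    return sum(p * c for p, c in zip(pop, conc)) / sum(pop)

total_pwm = pwm(conc, pops['Total'])
group_pwm = pwm(conc, pops['GroupA'])

# Disparity metrics as defined above.
absolute_disparity = group_pwm - total_pwm
relative_disparity = absolute_disparity / total_pwm
print(round(total_pwm, 3), round(group_pwm, 3), round(relative_disparity, 3))
```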
`estimate_exposure_percentile`: creates a dataframe of exposure percentiles for plotting

Inputs:
- `exposure_gdf`: a geodataframe with the exposure concentrations and allocated population by racial group
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed

Outputs:
- `df_pctl`: a dataframe of exposure concentrations by percentile of population exposed by group

Methodology: copies the `exposure_gdf` dataframe to prevent writing over the original, then builds the cumulative population percentile of exposure for each `group`.

`run_exposure_calcs`: calls the other exposure justice functions in order

Inputs:
- `conc`: concentration object from `concentration.py`
- `isrm_pop_alloc`: population object (from `population.py`) re-allocated to the ISRM grid cell geometry
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed

Outputs:
- `exposure_gdf`: a dataframe containing the exposure concentrations and population estimates for each group
- `exposure_pctl`: a dataframe of exposure concentrations by percentile of population exposed by group
- `exposure_disparity`: a dataframe containing the PWM, absolute disparity, and relative disparity of each group

Methodology: calls the `create_exposure_df`, `get_overall_disparity`, and `estimate_exposure_percentile` functions in order.

`export_exposure_gdf`: exports the exposure concentrations and population estimates as a shapefile

Inputs:
- `exposure_gdf`: a dataframe containing the exposure concentrations and population estimates for each group
- `shape_out`: a filepath string of the location of the shapefile output directory
- `f_out`: the name of the file output category (will append additional information)

Outputs: a shapefile in the `shape_out` directory; returns `fname` as a surrogate for completion (otherwise irrelevant)

`export_exposure_csv`: exports the exposure concentrations and population estimates as a CSV file

Inputs:
- `exposure_gdf`: a dataframe containing the exposure concentrations and population estimates for each group
- `output_dir`: a filepath string of the location of the output directory
- `f_out`: the name of the file output category (will append additional information)

Outputs: a CSV file in the `output_dir`; returns `fname` as a surrogate for completion (otherwise irrelevant)

`export_exposure_disparity`: exports the group-level exposure disparity metrics

Inputs:
- `exposure_disparity`: a dataframe containing the population-weighted mean exposure concentrations for each group
- `output_dir`: a filepath string of the location of the output directory
- `f_out`: the name of the file output category (will append additional information)

Outputs: a file in the `output_dir`; returns `fname` as a surrogate for completion (otherwise irrelevant)

`plot_percentile_exposure`: creates a plot of exposure concentration by percentile of each group's population

Inputs:
- `output_dir`: a filepath string of the location of the output directory
- `f_out`: the name of the file output category (will append additional information)
- `exposure_pctl`: a dataframe of exposure concentrations by percentile of population exposed by group
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed

Outputs: a PNG image in the `output_dir`

Methodology: plots using the `seaborn` library's `lineplot` function and saves the figure as `f_out` + '_PM25_Exposure_Percentiles.png' into the `out_dir`.

`export_exposure`: calls each of the exposure output functions in parallel

Inputs:
- `exposure_gdf`: a dataframe containing the exposure concentrations and population estimates for each group
- `exposure_disparity`: a dataframe containing the population-weighted mean exposure concentrations for each group
- `exposure_pctl`: a dataframe of exposure concentrations by percentile of population exposed by group
- `shape_out`: a filepath string of the location of the shapefile output directory
- `output_dir`: a filepath string of the location of the output directory
- `f_out`: the name of the file output category (will append additional information)
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed

Outputs: the export files described above, saved into the `shape_out` and `output_dir` directories

`create_rename_dict`: makes a global rename code dictionary for easier updating

Outputs:
- `logging_code`: a dictionary that maps endpoint names to log statement codes

health_impact_calcs.py
The `health_impact_calcs` script file contains a number of functions that help calculate health impacts from exposure concentrations.
`create_hia_inputs`: creates the HIA inputs object

Inputs:
- `pop`: the population object input
- `load_file`: a Boolean telling the program whether or not to load the data
- `verbose`: a Boolean telling the program whether or not to return additional log statements
- `geodata`: the geographic data from the ISRM
- `incidence_fp`: a string containing the filepath where the incidence data is stored

`krewski`: defines a Python function around the Krewski et al. (2009) function and endpoints

Inputs:
- `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
- `conc`: a float with the exposure concentration for a given geography
- `inc`: a float with the background incidence for a given group in a given geography
- `pop`: a float with the population estimate for a given group in a given geography
- `endpoint`: a string containing either 'ALL CAUSE', 'ISCHEMIC HEART DISEASE', or 'LUNG CANCER'

Outputs: the excess mortality for the `endpoint` across the group in a given geography

Methodology: based on the `endpoint`, grabs a `beta` parameter from Krewski et al. (2009) and computes:

$$ \left( 1 - \frac{1}{\exp(\beta_{d} \times C_{i})} \right) \times I_{i,d,g} \times P_{i,g} $$
create_logging_code
: makes a global logging code for easier updating
logging_code
: a dictionary that maps endpoint names to log statement codescalculate_excess_mortality
#### `calculate_excess_mortality`
Estimates excess mortality for a given `endpoint` and `function`.

*Inputs*:
* `conc`: a float with the exposure concentration for a given geography
* `health_data_obj`: a `health_data` object as defined in the `health_data.py` supporting script
* `endpoint`: a string containing either 'ALL CAUSE', 'ISCHEMIC HEART DISEASE', or 'LUNG CANCER'
* `function`: the health impact function of choice (currently only `krewski` is built out)
* `verbose`: a Boolean indicating whether or not detailed logging statements should be printed

*Outputs*:
* `pop_inc_conc`: a dataframe containing excess mortality for the `endpoint` using the `function` provided

*Methodology*:
Combines data from the `detailed_conc` method of the `conc` object and the `pop_inc` method of the `health_data_obj`, then applies the `function`.
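A minimal sketch of this merge-then-apply pattern, assuming hypothetical column names (`cell_id`, `conc`, `incidence`, `population`) and any per-row health impact `function`:

```python
import pandas as pd

def calculate_excess_mortality_sketch(conc_df, pop_inc_df, endpoint, function):
    # Align concentrations with population/incidence data on a shared
    # grid-cell ID, then apply the health impact function row by row.
    merged = conc_df.merge(pop_inc_df, on='cell_id')
    merged['excess_mortality'] = merged.apply(
        lambda row: function(row['conc'], row['incidence'],
                             row['population'], endpoint),
        axis=1)
    return merged
```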
#### `plot_total_mortality`
Creates a map image (PNG) of the excess mortality associated with an `endpoint` for a given `group`.

*Inputs*:
* `hia_df`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
* `ca_shp_fp`: a filepath string of the California state boundary shapefile
* `group`: the racial/ethnic group name
* `endpoint`: a string containing either 'ALL CAUSE', 'ISCHEMIC HEART DISEASE', or 'LUNG CANCER'
* `output_dir`: a filepath string of the location of the output directory
* `f_out`: the name of the file output category (will append additional information)
* `verbose`: a Boolean indicating whether or not detailed logging statements should be printed

*Outputs*:
* `fname`: a string filename made by combining the `f_out` with the `group` and `endpoint`

*Methodology*:
Plots using `seaborn` and `matplotlib.pyplot`. Builds the filename from the `f_out`, `group`, and `endpoint`, projects the `hia_df` to match the coordinate reference system of the California dataset, and prepares the `hia_df` for more intuitive plotting.
#### `export_health_impacts`
Exports mortality as a shapefile.

*Inputs*:
* `hia_df`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
* `group`: the racial/ethnic group name
* `endpoint`: a string containing either 'ALL CAUSE', 'ISCHEMIC HEART DISEASE', or 'LUNG CANCER'
* `output_dir`: a filepath string of the location of the output directory
* `f_out`: the name of the file output category (will append additional information)
* `verbose`: a Boolean indicating whether or not detailed logging statements should be printed

*Outputs*:
* `fname`: a string filename made by combining the `f_out` with the `group` and `endpoint`

*Methodology*:
Saves the shapefile (`fname`) using the inputs.
#### `export_health_impacts_csv`
Exports mortality as a CSV.

*Inputs*:
* `hia_df`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
* `endpoint`: a string containing either 'ALL CAUSE', 'ISCHEMIC HEART DISEASE', or 'LUNG CANCER'
* `output_dir`: a filepath string of the location of the output directory
* `f_out`: the name of the file output category (will append additional information)
* `verbose`: a Boolean indicating whether or not detailed logging statements should be printed

*Outputs*:
* `fname`: a string filename made by combining the `f_out` with the `group` and `endpoint`

*Methodology*:
Saves the CSV file (`fname`) using the inputs.
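The `fname` construction is not documented beyond "combining" the pieces; a hypothetical pattern (the tool's actual convention may differ) might be:

```python
def build_fname(f_out, group, endpoint, ext='csv'):
    # Hypothetical filename pattern combining f_out, group, and endpoint;
    # the tool's actual convention may differ.
    endpoint_code = endpoint.replace(' ', '_').lower()
    return f'{f_out}_{group.lower()}_{endpoint_code}.{ext}'
```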
#### `create_summary_hia`
Creates a summary table of health impacts by racial/ethnic group.

*Inputs*:
* `hia_df`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
* `endpoint`: a string containing either 'ALL CAUSE', 'ISCHEMIC HEART DISEASE', or 'LUNG CANCER'
* `verbose`: a Boolean indicating whether or not detailed logging statements should be printed
* `l`: an intermediate string that has the endpoint label string (e.g., ACM_)
* `endpoint_nice`: an intermediate string that has a nicely formatted version of the endpoint (e.g., All Cause)

*Outputs*:
* `hia_summary`: a summary dataframe containing population, excess mortality, and excess mortality rate per demographic group
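The aggregation step could be sketched as follows, assuming hypothetical column names (`group`, `population`, `excess_mortality`):

```python
import pandas as pd

def create_summary_hia_sketch(hia_df, group_col='group'):
    # Sum population and excess mortality per demographic group, then
    # compute an excess mortality rate per 100,000 people.
    summary = hia_df.groupby(group_col, as_index=False)[
        ['population', 'excess_mortality']].sum()
    summary['rate_per_100k'] = (summary['excess_mortality']
                                / summary['population'] * 1e5)
    return summary
```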
#### `visualize_and_export_hia`
Calls `plot_total_mortality` and `export_health_impacts` in one clean function call.

*Inputs*:
* `hia_df`: a dataframe containing excess mortality for the `endpoint` using the `function` provided
* `ca_shp_fp`: a filepath string of the California state boundary shapefile
* `group`: the racial/ethnic group name
* `endpoint`: a string containing either 'ALL CAUSE', 'ISCHEMIC HEART DISEASE', or 'LUNG CANCER'
* `output_dir`: a filepath string of the location of the output directory
* `f_out`: the name of the file output category (will append additional information)
* `shape_out`: a filepath string for shapefiles
* `verbose`: a Boolean indicating whether or not detailed logging statements should be printed

*Outputs*:
* `hia_summary`: a summary dataframe containing population, excess mortality, and excess mortality rate per demographic group

*Methodology*:
Visualizes the results using `plot_total_mortality`.
#### `combine_hia_summaries`
Combines the three endpoint summary tables into one export file.

*Inputs*:
* `acm_summary`: a summary dataframe containing population, excess all-cause mortality, and all-cause mortality rates
* `ihd_summary`: a summary dataframe containing population, excess IHD mortality, and IHD mortality rates
* `lcm_summary`: a summary dataframe containing population, excess lung cancer mortality, and lung cancer mortality rates
* `output_dir`: a filepath string of the location of the output directory
* `f_out`: the name of the file output category (will append additional information)
* `verbose`: a Boolean indicating whether or not detailed logging statements should be printed

#### `create_rename_dict`
Makes a global rename code dictionary for easier updating.

*Outputs*:
* `logging_code`: a dictionary that maps endpoint names to log statement codes

### `tool_utils.py`
The `tool_utils` library contains a handful of scripts that are useful for code execution.
#### `check_setup`
Checks that the `isrm_health_calculations` local clone is set up properly.

*Outputs*:
* `valid_setup`: a Boolean indicating if the setup is valid or not

#### `setup_logging`
Sets up the log file capability using the `logging` library.

*Inputs*:
* `debug_mode`: a Boolean indicating if log statements should be returned in debug mode or not

*Outputs*:
* `tmp_logger`: a filepath string associated with a temporary log file that will be moved as soon as the output directory is created

*Methodology*:
Uses the `logging` library. Creates a temporary log file (`tmp_logger`) that allows the file to be created before the output directory.
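A minimal sketch of this temporary-log-file approach; the filename and logger name are assumptions:

```python
import logging
import os
import tempfile

def setup_logging_sketch(debug_mode=False):
    # Write to a temporary log file first; it can be moved into the real
    # output directory once that directory exists.
    tmp_logger = os.path.join(tempfile.gettempdir(), 'tmp_run.log')
    handler = logging.FileHandler(tmp_logger)
    handler.setFormatter(
        logging.Formatter('%(asctime)s %(levelname)s: %(message)s'))
    logger = logging.getLogger('run_log')  # hypothetical logger name
    logger.setLevel(logging.DEBUG if debug_mode else logging.INFO)
    logger.addHandler(handler)
    return tmp_logger
```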
#### `verboseprint`
Sets up the verbose printing mechanism for global usage.

*Inputs*:
* `verbose`: a Boolean indicating if it is in verbose mode or not
* `text`: a string to be returned if the program is in verbose mode

#### `report_version`
Reports the current working version of the tool.
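One common way to implement a `verboseprint`-style mechanism is shown below; this is a sketch, not necessarily the tool's exact approach:

```python
def verboseprint_factory(verbose):
    # Return a real print function in verbose mode, otherwise a no-op,
    # so callers can unconditionally write verboseprint(text).
    if verbose:
        return print
    return lambda *args, **kwargs: None
```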
#### `create_output_dir`
Creates the output directory for saving files.

*Inputs*:
* `batch`: the batch name
* `name`: the run name

*Outputs*:
* `output_dir`: a filepath string for the output directory
* `f_out`: a string containing the filename pattern to be used in output files

*Methodology*:
Creates the `f_out` by removing the 'out' before the `output_dir`.
#### `create_shape_out`
Creates the output directory for saving shapefiles.

*Inputs*:
* `output_dir`: a filepath string for the output directory

*Outputs*:
* `shape_out`: a filepath string for the shapefile output directory

*Methodology*:
Creates a directory within the `output_dir` called 'shapes' and returns the `shape_out` filepath.
#### `get_output_region`
Creates the output region geodataframe.

*Inputs*:
* `region_of_interest`: the name of the region to be contained in the `output_region`
* `region_category`: a string containing the region category for the output region; must be one of 'AB', 'AD', or 'C' for Air Basins, Air Districts, and Counties
* `output_geometry_fps`: a dictionary containing a mapping between `region_category` and the filepaths
* `ca_fps`: a filepath string containing the link to the California border shapefile

*Outputs*:
* `output_region`: a geodataframe containing only the region of interest

*Methodology*:
First checks if the `region_of_interest` is California, in which case it just reads in the California shapefile. For any other `region_of_interest`, it reads in the geography corresponding to the `region_category` from the `output_geometry_fps` dictionary and subsets it to the `region_of_interest`.
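The selection logic can be sketched with plain pandas DataFrames standing in for geodataframes; the `'NAME'` column and the `'CA'` sentinel value are assumptions for illustration:

```python
import pandas as pd

def get_output_region_sketch(region_of_interest, region_category,
                             geometry_tables, ca_table):
    # California: return the whole state without subsetting.
    if region_of_interest == 'CA':
        return ca_table
    # Otherwise pick the Air Basin ('AB'), Air District ('AD'), or
    # County ('C') table and keep only the region of interest.
    regions = geometry_tables[region_category]
    return regions[regions['NAME'] == region_of_interest].copy()
```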
The tool is configured to be run on a Mac, on the Google Cloud, or via a Linux terminal (including Windows Subsystem for Linux).
### Acknowledgments
In alphabetical order, the following people are acknowledged for their support and contributions: