nismod / microsimulation

population and household microsimulation models
MIT License

NB: This package is a work in progress and subject to change; the documentation may not reflect the current code.

microsimulation

Static and dynamic, population and household, microsimulation models. Take a base population and project it forward using various methodologies.

Explanation of terms

Introduction

Static Microsimulation - Population

This refers to a sequence of microsyntheses, seeded with 2011 census data, with marginals from ONS mid-year-estimates (2001-2013) and ONS sub-national population projections (2014-2039).
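As an illustration of fitting a seed table to new marginals, the sketch below uses iterative proportional fitting (IPF) on a small two-dimensional table. It is not the package's implementation: the synthesis itself is delegated to humanleague, whose algorithms are not reproduced here, and the seed counts and marginal totals are invented for the example.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Iterative proportional fitting of a 2D seed table to row and column totals."""
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Scale each row so that it sums to its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Scale each column so that it sums to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match; stop once the rows agree as well.
        if all(abs(sum(table[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return table


# Hypothetical seed: census-year counts by sex (rows) and age band (columns).
seed = [[10.0, 20.0, 15.0],
        [12.0, 18.0, 25.0]]
# Hypothetical marginals for a later year; both sets sum to 120.
sex_totals = [58.0, 62.0]
age_totals = [30.0, 40.0, 50.0]

fitted = ipf(seed, sex_totals, age_totals)
```

Repeating this for each projection year, with that year's marginals, gives a sequence of tables. Note that IPF produces fractional cell counts, whereas a synthetic population needs whole individuals.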

Static Microsimulation - Households

This refers to a sequence of microsyntheses, seeded with microsynthesised data, with overall counts coming from DCLG household forecasts (1991-2039).

Dynamic Microsimulation

This refers to a stochastic simulation of individual elements (persons or households) in time using a Monte-Carlo approach. It is based on provided fertility, mortality and migration data, and is guided by the static microsimulation above.
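A minimal sketch of one Monte-Carlo time step for persons is given below. The rates, age bands and record layout are all hypothetical, migration is omitted for brevity, and nothing here is taken from the package's own code.

```python
import random

# Hypothetical annual rates by age band; real rates would come from the
# fertility and mortality data supplied to the model.
MORTALITY = {"0-15": 0.0005, "16-64": 0.003, "65+": 0.05}
FERTILITY = {"16-64": 0.05}  # applied to females only in this sketch


def age_band(age):
    if age < 16:
        return "0-15"
    return "16-64" if age < 65 else "65+"


def step(population, rng):
    """Advance the population by one year: deaths, births, then ageing."""
    survivors, newborns = [], []
    for person in population:
        band = age_band(person["age"])
        if rng.random() < MORTALITY[band]:
            continue  # this person dies during the year
        if person["sex"] == "F" and rng.random() < FERTILITY.get(band, 0.0):
            newborns.append({"age": 0, "sex": rng.choice("MF")})
        survivors.append({**person, "age": person["age"] + 1})
    return survivors + newborns


rng = random.Random(42)  # fixed seed so that the run is reproducible
population = [{"age": rng.randrange(0, 90), "sex": rng.choice("MF")}
              for _ in range(1000)]
for year in range(2011, 2016):
    population = step(population, rng)
```

Because each outcome is a random draw, separate runs with different seeds give different populations; aggregate results are usually reported over many runs.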

Setup

Clone the repo:

$ git clone https://github.com/nismod/microsimulation

Dependencies

Requires Python 3. The following packages are dependencies and will need to be installed if they are not already:

Pip Install

$ python3 -m pip install humanleague ukpopulation ukcensusapi

Conda Install

(humanleague is not currently available via conda-forge, so should be installed with pip for now)

$ conda config --add channels conda-forge # if you haven't already
$ conda install ukcensusapi ukpopulation

Installation and Testing

The ukcensusapi package requires an API key to function correctly; see the ukcensusapi documentation for details.

From the root directory of the cloned repo:

./setup.py install
./setup.py test

Running a static population microsimulation

$ scripts/run_ssm.py --help
usage: run_ssm.py [-h] [-c config-file] LAD [LAD ...]

static sequential (population/household) microsimulation

positional arguments:
  LAD                   ONS code for LAD (multiple LADs can be set).

optional arguments:
  -h, --help            show this help message and exit
  -c config-file, --config config-file
                        the model configuration file (json). See
                        config/*_example.json

where config-file is a JSON file containing the model parameters and settings. Examples can be found in the config subdirectory of this package.

{
  "resolution": "MSOA11",
  "projection": "ppp",
  "census_ref_year": 2011,
  "horizon_year": 2039,
  "mode": "fast",
  "cache_dir": "./cache",
  "output_dir": "./data"
}
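A configuration like the one above can be checked before a long run. The sketch below parses the JSON and validates it against the keys shown in the example; whether the scripts treat every key as mandatory is an assumption, and load_config is a hypothetical helper rather than part of the package.

```python
import json

# Keys taken from the example configuration above.
REQUIRED_KEYS = {"resolution", "projection", "census_ref_year", "horizon_year",
                 "mode", "cache_dir", "output_dir"}


def load_config(text):
    """Parse a JSON configuration and check it against the example's keys."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing configuration keys: {sorted(missing)}")
    if config["horizon_year"] < config["census_ref_year"]:
        raise ValueError("horizon_year must not precede census_ref_year")
    return config


example = """{
  "resolution": "MSOA11",
  "projection": "ppp",
  "census_ref_year": 2011,
  "horizon_year": 2039,
  "mode": "fast",
  "cache_dir": "./cache",
  "output_dir": "./data"
}"""
config = load_config(example)
```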

Running a household microsimulation

This requires, as input, a microsynthesised population of households for one or more LADs at OA level for a census year. This data can be generated from census (aggregate) data using the household_microsynth package.

$ scripts/run_ssm_h.py --help
usage: run_ssm_h.py [-h] [-c config-file] LAD [LAD ...]

static sequential (population/household) microsimulation

positional arguments:
  LAD                   ONS code for LAD (multiple LADs can be set).

optional arguments:
  -h, --help            show this help message and exit
  -c config-file, --config config-file
                        the model configuration file (json). See
                        config/*_example.json

where config-file is a JSON file containing the model parameters and settings. Examples can be found in the config subdirectory of this package.

{
  "resolution": "OA11",
  "projection": "ppp",
  "census_ref_year": 2011,
  "projection_ref_year": 2014,
  "horizon_year": 2020,
  "upstream_dir": "../household_microsynth/data",
  "input_dir": "./persistent_data",
  "output_dir": "./data"
}

Running the assignment algorithm

This algorithm takes LAD-level populations and households at a specific time and assigns people to the households.

$ scripts/run_assignment.py --help
usage: run_assignment.py [-h] [-c config-file] LAD [LAD ...]

static sequential (population/household) microsimulation

positional arguments:
  LAD                   ONS code for LAD (multiple LADs can be set).

optional arguments:
  -h, --help            show this help message and exit
  -c config-file, --config config-file
                        the model configuration file (json). See
                        config/*_example.json

with a configuration like:

$ cat config/ass_example.json
{
  "person_resolution": "MSOA11",
  "household_resolution": "OA11",
  "projection": "ppp",
  "strict": true,
  "year": 2011,
  "data_dir": "./data"
}

Requirements

It requires data from the household microsimulations and the population microsimulations as described above.

Methodology

The methodology is to sample the synthetic populations at random from distributions defined by census microdata. Broadly speaking, this relates the age, sex, and ethnicity of the HRP (household reference person) to the age, sex, and ethnicity of other household members. It helps to avoid nonsensical or unlikely household combinations such as cohabiting couples with enormous age differences, or children who are only fractionally younger than a parent. The effect is to preserve the distribution of household structures seen in the last census. More up-to-date information may be available from surveys (e.g. BHPS), but these may lack the breadth of the census microdata.
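The idea of sampling one household member's attributes conditional on the HRP's can be sketched as follows. The age bands and weights are invented for illustration; in practice they would be tabulated from census microdata, and the real model also conditions on sex and ethnicity.

```python
import random

# Hypothetical conditional distribution: for each HRP age band, the relative
# frequency of each partner age band.
PARTNER_AGE_GIVEN_HRP_AGE = {
    "25-34": {"16-24": 10, "25-34": 70, "35-49": 19, "50+": 1},
    "35-49": {"16-24": 1, "25-34": 20, "35-49": 70, "50+": 9},
    "50+": {"16-24": 0, "25-34": 2, "35-49": 18, "50+": 80},
}


def sample_partner_band(hrp_band, rng):
    """Draw a partner age band from the distribution for this HRP age band."""
    weights = PARTNER_AGE_GIVEN_HRP_AGE[hrp_band]
    bands = list(weights)
    return rng.choices(bands, weights=[weights[b] for b in bands], k=1)[0]


rng = random.Random(0)
draws = [sample_partner_band("50+", rng) for _ in range(1000)]
```

Combinations with zero weight are never drawn, and rare combinations appear only rarely, which is how unlikely pairings are kept out of the assigned households.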

Of the household structures defined in the census, all contain one household reference person, and some categories are more precise about the number and status of the occupants. For example, single-occupant households must contain a single adult; single-parent households of size 3 must contain one adult and two children. Conversely, multiple occupant households containing 4+ occupants are less well defined.

The approach taken in the algorithm is to get the specific structures assigned first. There is additional leeway provided by the facts that:

The notion of assignment in this context means linking rows in two tables. The household table is given an additional column that refers to an entry in the person table; this is the HRP. The person table is given a column containing a household ID. Once assignment is complete, every person will be associated with a household, and every household will be associated with an HRP. Once a household is filled, it is marked as such and no more people can be assigned to it.
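The two-way link between the tables can be sketched with plain Python records. The column names (hid, pid, hrp, filled) and the miniature tables are hypothetical and chosen only to mirror the description.

```python
# Hypothetical miniature tables; the column names are illustrative only.
households = [
    {"hid": 0, "size": 1, "hrp": None, "filled": False},
    {"hid": 1, "size": 2, "hrp": None, "filled": False},
]
people = [
    {"pid": 0, "age": 34, "hid": None},
    {"pid": 1, "age": 61, "hid": None},
    {"pid": 2, "age": 59, "hid": None},
]


def occupants(hid):
    """Return the people currently linked to the given household."""
    return [p for p in people if p["hid"] == hid]


def assign(person, household, as_hrp=False):
    """Link a person to a household, refusing if the household is already full."""
    if household["filled"]:
        raise ValueError(f"household {household['hid']} is already full")
    person["hid"] = household["hid"]
    if as_hrp:
        household["hrp"] = person["pid"]
    # Mark the household as filled once it holds as many people as its size.
    household["filled"] = len(occupants(household["hid"])) >= household["size"]


assign(people[0], households[0], as_hrp=True)
assign(people[1], households[1], as_hrp=True)
assign(people[2], households[1])
```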

The algorithm loops over the MSOAs in the regions, assigning people to households in the following order:

At this point many households will be fully assigned, but there will generally be unassigned adults and children in the population. They are assigned to those households that are not already full.

This process is repeated for each MSOA in the region.
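The final step, placing leftover people in households that still have room, might look like the sketch below. Choosing the household uniformly at random is an assumption, as the text does not say how it is selected, and fill_remaining is a hypothetical helper.

```python
import random


def fill_remaining(unassigned, capacity, rng):
    """Place each unassigned person in a randomly chosen household with spare room.

    capacity maps household ID to the number of free places and is updated in
    place. Returns a dict of person ID to household ID; people who cannot be
    placed are omitted.
    """
    placement = {}
    for pid in unassigned:
        open_households = [hid for hid, free in capacity.items() if free > 0]
        if not open_households:
            break  # no space left anywhere
        hid = rng.choice(open_households)
        placement[pid] = hid
        capacity[hid] -= 1
    return placement


rng = random.Random(1)
capacity = {"H1": 2, "H2": 1, "H3": 0}
placement = fill_remaining(["P1", "P2", "P3", "P4"], capacity, rng)
```

Here three places are available for four people, so one person remains unplaced.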

Batch Processing

HPC facilities are necessary to run a country-wide simulation in any reasonable timeframe (for assignment at least). The examples below have been run on the ARC3 environment, part of the High Performance Computing facilities at the University of Leeds, UK.

The scripts should be relatively easy to modify to run on other clusters supporting SGE.

Running a population microsimulation

Run countrywide, using the default configuration:

$ qsub ./pbatch.sh config/ssm_default.json

The SSM algorithm runs sufficiently quickly that each individual process computes 10 LADs consecutively.
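Splitting the list of LADs into batches of 10, one batch per array task, can be sketched as below. How pbatch.sh actually divides the work is not shown here, so the 1-based task ID and the placeholder LAD codes are assumptions.

```python
def batch_for_task(lads, task_id, batch_size=10):
    """Return the LADs handled by a 1-based array-task ID."""
    start = (task_id - 1) * batch_size
    return lads[start:start + batch_size]


# 35 placeholder codes, not real ONS codes.
lads = [f"LAD{n:03d}" for n in range(1, 36)]

first = batch_for_task(lads, 1)  # the first ten LADs
last = batch_for_task(lads, 4)   # the final, shorter batch
```

Each task would then run the microsimulation script once with its batch as the positional LAD arguments.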

Running a household microsimulation

Run countrywide, using the default configuration:

$ qsub ./hbatch.sh config/ssm_h_default.json

The SSM algorithm runs sufficiently quickly that each individual process computes 10 LADs consecutively.

Running the assignment algorithm

Run countrywide, using the default configuration:

$ qsub ./abatch.sh config/ass_default.json

Run a single LAD (Newcastle):

$ qsub ./asingle.sh config/ass_default.json E08000021

The assignment algorithm runs sufficiently slowly that each LAD requires a dedicated process.