OpenGHG Inversions is a Python package that is being develoepd as part of the OpenGHG project with the aim of merging the data-processing and simulation modelling capabilities of OpenGHG with the atmospheric Bayesian inverse models developed by the Atmospheric Chemistry Research Group (ACRG) at the University of Bristol, UK.
Currently, OpenGHG Inversions includes the following regional inversion models:
As OpenGHG Inversions is dependent on OpenGHG, please ensure that when running locally you are using Python 3.10 or later on Linux or MacOS. Please see the OpenGHG project for further installation instructions of OpenGHG and setting up an object store.
Check that you have Python 3.10 or greater:
python --version
(Note for Bristol ACRG group: If you are on Blue Pebble, the default anaconda module lang/python/anaconda
is Python 3.9. Use module avail
to list other options; lang/python/miniconda/3.10.10.cuda-12
or lang/python/miniconda/3.12.2.inc-perl-5.30.0
will work.)
Make a virtual environment
python -m venv openghg_inv
Next activate the environment
source openghg_inv/bin/activate
pip
First you'll need to clone the repository
git clone https://github.com/openghg/openghg_inversions.git
Next make sure pip
and related install tools are up to date and then install OpenGHG Inversions using the editable install flag (-e
)
pip install --upgrade pip setuptools wheel
pip install -e openghg_inversions
Optionally, install the developer requirements (there is more information about this in the "Contributing" section below):
pip install -r requirements-dev.txt
At this point, run
python -c "import pymc"
This should run without printing any messages.
If you receive a message about pymc
or pytensor
using the numpy
C-API, then your inversions might run slowly because the fast linear algebra libraries used by numpy
haven't been found.
Solutions to this are:
python -m pip install numpy
after upgrading pip, setuptools, wheel
conda
env, install numpy
using conda
, then use pip
to upgrade pip, setuptools, wheel
and install openghg_inversions
For an overview of OpenGHG inversions, see this primer.
Keyword arguments are propagated as follows:
ini
file or passed via the --kwargs
flag is passed to the MCMC function as a keyword argument. (Currently, fixedbasisMCMC
is the only available MCMC function)fixedbasisMCMC
) is passed to the function inferpymc
in hbmcmc.inversion_pymc
, which is the function that creates and samples from the RHIME model. Thus you can pass arguments to either fixedbasisMCMC
or inferpymc
, but all of these arguments will be specified in the ini
file (or command line).
Let's look at these two steps in detail.
ini
fileExtra options can be added to an ini
file in almost any location.
The template ini file puts
these option under the heading MCMC.OPTIONS
:
[MCMC.OPTIONS]
averaging_error = True
min_error = 20.0
fixed_basis_outer_regions = False
use_bc = True
nuts_sampler = "numpyro"
save_trace = True
compute_min_error = True
no_model_error = False
These will be passed to the MCMC function (e.g. fixedbasisMCMC
) as keyword arguments.
Any argument in fixedbasisMCMC
can be specified in an ini
file this way.
When running inversions using the script run_hbmcmc.py
, you must specify the start and end date of
the inversion period, and you pass an ini
file using the flag -c
.
In addition, you can pass the output path using the flag --output-path
; this is useful if your SLURM script
uses different output locations for different array jobs.
You can also pass arbitrary keyword arguments to run_hbmcmc.py
using the --kwargs
flag.
For instance:
python run_hbmcmc.py "2019-01-01" "2019-02-01" -c "example.ini" --kwargs '{"averaging_error": true, "min_error": 20.0, "nuts_sampler": "numpyro"}'
It is crucial that you enclose the dictionary in single quotes, otherwise the command line will split the dictionary on white space.
Again, this can be used to change the arguments passed to an inversion on the fly (say, in a SLURM script).
The format of the dictionary inside single quotes must be JSON, because the value of kwargs
is parsed using json.loads
.
Python translates JSON according to this table.
In particular, "true"
in JSON translate to True
in Python (but "True"
will be translated as a string).
The parsing in our ini
files is more flexible; in particular, values that are Python statements will be translated to Python, so you don't need to worry about translation.
The following sections detail some parameters that enable/specify optional behaviour in the inversion.
fixedbasisMCMC
This is not a comprehensive list (see the docstring for fixedbasisMCMC
in the hbmcmc module for more arguments).
Arguments affecting the data using in the inversion:
sites
: a list of the sites to use in the inversion. Other information applied on a site-to-site basis that is presented in lists must be in the same order as used in the sites
list.inlet
: a list of inlets for each site. If only one inlet is available for a given site and species, then None
may be used as the value for that site. If there are a range of inlet heights at a single site, and these should correspond to a single footprint release height, then you may use, for instance, slice(140, 160)
to combine inlet heights between 140 and 160 meters into a single timeseries of observations.
-instrument
, fp_height
, obs_data_level
, and met_model
must either be lists of the same length as sites
, or a single value may be supplied and will be converted to a list of the correct length. Arguments affecting the inverse model:
averaging_error
: if True
, the error from resampling to the given averaging_period
will be added to the observation's error.use_bc
: defaults to True
. If False
, no boundary conditions will be used in the inversion. This implicitly assumes that contributions from the boundary have been subtracted from the observations.fix_basis_outer_regions
:
False
True
, the "outer regions" of the (EUROPE
) domain use basis regions specified by a file provided by the Met Office (from their "InTem" model), and the "inner region", which includes the UK, is fit using our basis algorithms.EUROPE
domain currently.calculate_min_error
: calculate min_error (see below) on the fly using the "residual error method" or a method based on percentiles of observations. Available arguments:
residual
: use "residual error method"percentile
: use method based on percentilesNone
: in this case, you should pass in a value directly using (for instance) min_error = 12.3
min_error_options
: additional parameters to pass to the function that compute min error. This should be a dictionary, and the available options depend on the function used. (The functions to compute min. model error are in model_error.py
).
calculate_min_error = "residual"
, then, for instance, you could use min_error_options = {"robust": False, "by_site": True}
. (By default, robust
is False
, this is just to show the possibilities.) filters
: filters to apply to data (after it is resampled and aligned)
filters = None
will skip filteringfilters
is a list of filters (or a string containing a single filter name), those filters will be applied to all sites.filters
is a dictionary with site codes as keys and lists of filters as values, then each site will have filters applied individually according to this dictionary. All sites must supplied; to skip a site, pass None
instead of a list (or omit that site from the dictionary). For instance: filters = {"MHD": ["pblh_inlet_diff", "pblh_min"], "JFJ": None}
.filtering
function in the utils module.inferpymc
.xprior
and bcprior
: these should be a dictionary containing "pdf": <distribution>
and the arguments that should be passed to the PyMC distribution with that name. <distribution>
Arguments affecting the output of the inversion:
save_trace
:
False
. True
, the arviz InferenceData
output from sampling will be saved to the output path of the inversion, with a file name of the form f"{outputname}{start_data}_trace.nc
. To load this trace into arviz, you need to use InferenceData.from_netcdf
.inferpymc
As mentioned above, any keyword argument passed to fixedbasisMCMC
(either by an ini
file or from --kwargs
on the command line) that is not recognised by fixedbasisMCMC
is passed on to inferpymc
.
These parameters include:
min_error
: a non-negative float value specifying a lower bound for the model-measurement mismatch error (i.e. the error on (y - y_mod)).nuts_sampler
: a string, which defaults to "pymc"
. The other option is "numpyro"
, which will the JAX accelerated sampler from Numpyro; this tends to be significantly faster than the NUTS sampler built into PyMC.pollution_events_from_obs
: Determines whether the model error is calculated as a fraction of:
True
)False
)no_model_error
: if True
, only use obs error in likelihood (omitting min. model error and model error from scaling pollution events).reparameterise_log_normal
: if True
, then log normal priors will be sampled by transforming samples from standard normal random variable to samples from the appropriate log normal distribution.The results of an inversions are returned as an xarray Dataset
.
The dimension nmeasure
consists of the time for each observation stacked into a single 1D array.
TODO: complete this part
Yerror
: obs. error used in the inversion; if add_averaging
is True, this will contain the combined "repeatability" and "variability"; otherwise, it will just contain "repeatability", if it is available, or "variability"Yerror_repeatablity
: obs. repeatability. If repeatability isn't available for some sites, then this is filled with zeros.Yerror_variability
: obs. variability.To contribute to openghg_inversions
, you should also install the developer packages:
pip install -r requirements-dev.txt
This will install the packages flake8, pytest, black
.
We use black
to format our code. To check if your code needs reformatting, run:
black --check openghg_inversions
in your openghg_inversions
repository (with your virtual env activated).
If you replace the flag --check
with --diff
, you can see what will be changed.
To make these changes, run
black openghg_inversions
We also recommend using flake8
to check for code style issues, which you can run with:
flake8 openghg_inversions
You can run the tests using:
pytest
in the openghg_inversions
repository. (Make sure your virtual env is activated.)
To contribute new code, make a branch off of the devel
branch.
When your code is ready to be added, push it to github (origin
).
You can then open a "pull request" on github and request a code review.
It's helpful to write a description of the changes made in your PR, as well as linking to any relevant issues.
Your code must past the tests and be reviewed before it can be merged. After this, you can merge your branch and close it (it can always be recovered later if necessary).
Ganesan et al. (2014),ACP;
Western et al. (2021), Enviro. Sci. Tech Lett.