jobovy / extreme-deconvolution

Extreme Deconvolution (XD)

Density estimation using Gaussian mixtures in the presence of noisy, heterogeneous and incomplete data

.. image:: https://github.com/jobovy/extreme-deconvolution/actions/workflows/build.yml/badge.svg
   :target: https://github.com/jobovy/extreme-deconvolution/actions/workflows/build.yml

Summary

Extreme-deconvolution (XD) is a general algorithm to infer a d-dimensional distribution function from a set of heterogeneous, noisy observations or samples. It is fast, flexible, and treats the data's individual uncertainties properly, to get the best description possible of the underlying distribution. It performs well over the full range of density estimation, from small data sets with only tens of samples per dimension, to large data sets with millions of data points.
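To make "treats the data's individual uncertainties properly" concrete, here is a toy, pure-Python sketch of the XD expectation-maximization updates for the simplest case: one dimension, no projection, no weights, no split-and-merge. Each component is convolved with each point's own noise variance when computing responsibilities, and the M-step works with the posterior mean and variance of the *true* value. The function names (``normal_pdf``, ``xd_em_1d``) are hypothetical and written for exposition only; this is not the package's C implementation or its Python API.

```python
import math
import random


def normal_pdf(y, mean, var):
    """Density of the 1-D Gaussian N(y | mean, var)."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def xd_em_1d(y, s2, amp, mean, var, n_iter=200):
    """Toy 1-D extreme-deconvolution EM (hypothetical helper, not the package API).

    y              : observed values y_i
    s2             : known per-point noise variances S_i (may all differ)
    amp, mean, var : initial amplitudes alpha_j, means m_j, variances V_j
    Returns updated (amp, mean, var) of the underlying, noise-free density.
    """
    n, k = len(y), len(amp)
    amp, mean, var = list(amp), list(mean), list(var)
    for _ in range(n_iter):
        # E-step: responsibilities q[i][j], plus the posterior mean b and
        # variance bb of the true value x_i if it came from component j.
        q = [[0.0] * k for _ in range(n)]
        b = [[0.0] * k for _ in range(n)]
        bb = [[0.0] * k for _ in range(n)]
        for i in range(n):
            for j in range(k):
                t = var[j] + s2[i]  # T_ij = V_j + S_i: component convolved with noise
                q[i][j] = amp[j] * normal_pdf(y[i], mean[j], t)
                gain = var[j] / t  # V_j T_ij^{-1}
                b[i][j] = mean[j] + gain * (y[i] - mean[j])
                bb[i][j] = var[j] - gain * var[j]
            norm = sum(q[i])
            q[i] = [qij / norm for qij in q[i]]
        # M-step: re-estimate each component from the weighted posteriors.
        for j in range(k):
            qj = sum(q[i][j] for i in range(n))
            amp[j] = qj / n
            mean[j] = sum(q[i][j] * b[i][j] for i in range(n)) / qj
            var[j] = sum(q[i][j] * ((mean[j] - b[i][j]) ** 2 + bb[i][j])
                         for i in range(n)) / qj
    return amp, mean, var


# Usage: true values from N(0, 1), each blurred by its own noise level.
random.seed(1)
true_x = [random.gauss(0.0, 1.0) for _ in range(2000)]
noise_var = [random.uniform(0.5, 4.0) for _ in true_x]
y_obs = [x + random.gauss(0.0, math.sqrt(s)) for x, s in zip(true_x, noise_var)]

amp_fit, mean_fit, var_fit = xd_em_1d(y_obs, noise_var, [1.0], [0.5], [2.0])
y_mean = sum(y_obs) / len(y_obs)
naive_var = sum((v - y_mean) ** 2 for v in y_obs) / len(y_obs)
print("naive variance:", round(naive_var, 2), "deconvolved variance:", round(var_fit[0], 2))
```

The naive sample variance of the noisy data absorbs the noise (it comes out near 1 + 2.25), whereas the deconvolved estimate should land close to the true value of 1, because each point's known uncertainty is modeled rather than ignored.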

The extreme-deconvolution algorithm is available here as a dynamic C-library that your programs can link to, or through Python, R, or IDL wrappers that allow you to call the fast underlying C-code in your high-level applications with minimal overhead.

News

Requirements

`GSL <http://www.gnu.org/software/gsl/>`__: The GNU Scientific Library. See, for example, `this page <https://galpy.readthedocs.io/en/latest/installation.html#how-do-i-install-the-gsl>`__ for information on how to install the GSL on your system.

Get the latest version

Get the latest version by cloning the git repository::

    git clone https://github.com/jobovy/extreme-deconvolution.git

or download it using the green button above the file listing on the repository's GitHub page.

(Note that downloading and installing the latest released version from the 'Releases' tab is not recommended, as that version is out of date; please install the latest ``main`` version instead.)

Installation

To compile the code, navigate to the directory where you downloaded the code and do::

    make

To install the library do::

    sudo make install

or::

    make install INSTALL_DIR=/path/to/install/dir/

To install the IDL wrapper do::

    make idlwrapper

Add ``INSTALL_DIR=/path/to/install/dir/`` if you used this option to install the library.

To install the Python wrapper do::

    make pywrapper

Add ``INSTALL_DIR=/path/to/install/dir/`` if you used this option to install the library. Remember to add the ``py/`` directory to your ``PYTHONPATH`` to use the code in Python.
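In a shell this is usually done with something like ``export PYTHONPATH=$HOME/extreme-deconvolution/py:$PYTHONPATH``. As a minimal sketch of a per-session alternative, the snippet below prepends the directory to ``sys.path`` before importing. It assumes the repository was cloned to ``~/extreme-deconvolution``; that location (``XD_PY_DIR``) is hypothetical, so substitute your own path.

```python
import os
import sys

# Hypothetical location of your clone's py/ directory; replace with your own path.
XD_PY_DIR = os.path.expanduser("~/extreme-deconvolution/py")

# Same effect as adding py/ to PYTHONPATH, but only for this interpreter session.
if XD_PY_DIR not in sys.path:
    sys.path.insert(0, XD_PY_DIR)

try:
    from extreme_deconvolution import extreme_deconvolution
    print("extreme_deconvolution imported from", XD_PY_DIR)
except (ImportError, OSError) as err:
    # Either the wrapper is not at XD_PY_DIR, or the compiled C library it
    # needs could not be loaded (e.g. `make pywrapper` has not been run).
    print("Could not import extreme_deconvolution:", err)
```

Setting ``PYTHONPATH`` in your shell startup file is the more durable choice; the ``sys.path`` route is mainly handy in notebooks or one-off scripts.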

To install the R package do::

    make rpackage
    R CMD INSTALL ExtremeDeconvolution_1.3.tar.gz

Adjust the version number in the file name as needed. Note that options for compiling packages in R are specified through the ``Makevars`` file, which is typically located at ``~/.R/Makevars``. For example, if you need to override the default C compiler with gcc-4.9, add the line ``CC=gcc-4.9`` to the ``Makevars`` file before building the package. (You also need to make sure that the proper ``CC`` is set in the main ``Makefile``.) For more details on customizing R package installation, see `here <http://cran.r-project.org/doc/manuals/r-release/R-admin.html#Customizing-package-compilation>`__. Alternatively, you may find it more convenient to use the ``install.packages()`` function in R to install the package. In that case, replace the second step (``R CMD INSTALL ...``) with the following call within your R session::

    install.packages(pkgs = "ExtremeDeconvolution_1.3.tar.gz", repos = NULL)

This assumes that the R working directory is the same as the root of this git repository.

To test whether the code and the Python wrapper are working, do::

    make testpy

To test whether the code and the IDL wrapper are working (requires IDL and the IDL wrapper to be installed), do::

    make testidl

Clean up intermediate files::

    make clean

Usage

Examples of how to use the code are in the IDL example `examples/fit_tf.pro <examples/fit_tf.pro>`__ and in the Python doctest in `py/extreme_deconvolution.py <py/extreme_deconvolution.py>`__.

In Python you would typically do something like::

    from extreme_deconvolution import extreme_deconvolution
    # Set up your arrays: ydata has the data, ycovar the uncertainty covariances;
    # initamp, initmean, and initcovar are initial guesses.
    # Get help on their shapes and other options (in IPython) using
    ?extreme_deconvolution
    # Run the code
    extreme_deconvolution(ydata, ycovar, initamp, initmean, initcovar)
    # initamp, initmean, and initcovar are now updated to their best-fit values
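The initial guesses matter because EM only converges to a local maximum of the likelihood. One simple heuristic, shown here as a sketch and not as the package's own recommendation, is equal amplitudes, means placed on randomly chosen data points, and every variance set to the overall sample variance. ``init_guesses_1d`` is a hypothetical pure-Python helper for 1-D data; for the real wrapper you would convert its output to NumPy arrays with the shapes that ``?extreme_deconvolution`` documents.

```python
import random
import statistics


def init_guesses_1d(ydata, k, seed=None):
    """Crude starting values for a K-component fit to 1-D data (hypothetical helper).

    amplitudes : equal weights 1/K
    means      : K data points chosen at random (without replacement)
    variances  : the overall sample variance of the data, for every component
    """
    rng = random.Random(seed)
    initamp = [1.0 / k] * k
    initmean = rng.sample(list(ydata), k)
    initcovar = [statistics.pvariance(ydata)] * k
    return initamp, initmean, initcovar


ydata = [0.1, 2.3, -1.2, 4.0, 3.3, 2.8, -0.4]
initamp, initmean, initcovar = init_guesses_1d(ydata, k=2, seed=0)
print(initamp, initmean, initcovar)
```

Because the result depends on the random draw, it can be worth repeating the fit from several seeds and keeping the solution with the highest likelihood.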

In IDL this becomes::

    ;;Set up arrays and the number of Gaussians
    K=2 ;;K Gaussians
    ;;Run the code
    projected_gauss_mixtures_c, K, ydata, ycovar, initamp, initmean, initcovar, /quiet
    ;;initamp, initmean, and initcovar are now updated to their best fit values

In R::

    library(ExtremeDeconvolution)
    ?extreme_deconvolution

Installation FAQ

Acknowledgments

Thanks to Gao Wang and Peter Carbonetto for the R interface and Daniela Carollo, Joe Hennawi, Sergey Koposov, and Leonidas Moustakas for bug reports and fixes.

Acknowledging extreme-deconvolution

The algorithm that the code implements is described in the paper Extreme deconvolution: inferring complete distribution functions from noisy, heterogeneous and incomplete observations; a copy of the latest draft of this paper is included in the "doc/" directory of the repository or source archive. If you use the code, please cite this paper, e.g.::

    Extreme deconvolution: inferring complete distribution functions from noisy, heterogeneous and incomplete observations
    Jo Bovy, David W. Hogg, & Sam T. Roweis, Ann. Appl. Stat. 5, 2B, 1657 (2011)
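For orientation, here is a summary of the model fitted in that paper, written in our reading of its notation (consult the paper for the full derivation). Each observation :math:`w_i` is a noisy, possibly projected version of a true value :math:`v_i` drawn from a :math:`K`-component Gaussian mixture with amplitudes :math:`\alpha_j`, means :math:`m_j`, and covariances :math:`V_j`. Here :math:`R_i` is a known projection matrix and :math:`S_i` the known noise covariance of observation :math:`i`:

.. math::

   p(v) = \sum_{j=1}^{K} \alpha_j \, \mathcal{N}(v \mid m_j, V_j), \qquad
   w_i = R_i v_i + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, S_i),

   p(w_i \mid \theta) = \sum_{j=1}^{K} \alpha_j \,
   \mathcal{N}\!\left(w_i \mid R_i m_j,\; T_{ij}\right), \qquad
   T_{ij} = R_i V_j R_i^{\mathsf{T}} + S_i .

The fit maximizes the total log likelihood :math:`\sum_i \ln p(w_i \mid \theta)` over :math:`\theta = \{\alpha_j, m_j, V_j\}` with an EM algorithm. Taking :math:`R_i` to be the identity gives the noisy-but-complete case; keeping only some rows of the identity represents missing dimensions.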

Extreme-deconvolution in action

A good place to find examples is the list of papers citing the `extreme-deconvolution paper <http://adsabs.harvard.edu/abs/2011AnApS...5.1657B>`__. The code is also used in a variety of fields outside of astronomy.