privong / pymccorrelation

Correlation coefficients with uncertainties
GNU General Public License v3.0

pymccorrelation

A tool to calculate correlation coefficients for data, using bootstrapping and/or perturbation to estimate the uncertainties on the correlation coefficient. This was initially a Python implementation of the Curran (2014) method for calculating uncertainties on Spearman's Rank Correlation Coefficient, but has since been expanded. Curran's original C implementation is MCSpearman (ASCL entry).

Currently, Pearson's r, Spearman's rank correlation coefficient, and Kendall's tau can be calculated (with bootstrapping and/or perturbation).

Kendall's tau can also be calculated when some of the data are left/right censored, following the method described by Isobe+1986.
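
The bootstrapping and perturbation ideas described above can be illustrated with a minimal sketch that uses only the Python standard library. This is not the pymccorrelation implementation: the helper names pearson_r and mc_pearson, the Gaussian noise model, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mc_pearson(x, y, dx, dy, n_iter=1000, seed=0):
    """Hypothetical helper: r values from bootstrapped, perturbed copies of the data."""
    rng = random.Random(seed)
    n = len(x)
    r_values = []
    for _ in range(n_iter):
        # Bootstrapping: draw n (x, y) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Perturbation: shift each drawn point by Gaussian noise whose
        # standard deviation is that point's measurement uncertainty.
        xs = [rng.gauss(x[i], dx[i]) for i in idx]
        ys = [rng.gauss(y[i], dy[i]) for i in idx]
        r_values.append(pearson_r(xs, ys))
    return r_values


# Synthetic, strongly correlated data (illustrative only).
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 for xi in x]
dx = [0.1] * 20
dy = [0.5] * 20

r_values = mc_pearson(x, y, dx, dy, n_iter=200)
print(min(r_values), max(r_values))
```

The spread of the values in r_values is what gives the uncertainty on the coefficient. The sketch applies both steps in every iteration, but either one could be used on its own.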

Requirements

Installation

pymccorrelation is available via PyPI and can be installed with:

pip install pymccorrelation

Usage

pymccorrelation exports a single function to the user (also called pymccorrelation).

from pymccorrelation import pymccorrelation

[... load your data ...]

The correlation coefficient can be one of pearsonr, spearmanr, or kendallt.

For example, to compute Pearson's r for a sample, using 1000 bootstrapping iterations to estimate the uncertainties:

res = pymccorrelation(data['x'], data['y'],
                      coeff='pearsonr',
                      Nboot=1000)

The output, res, is a tuple of length 2, and the two elements are:

The percentile ranges can be adjusted using the percentiles keyword argument.
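
To show what a percentile summary of a Monte Carlo distribution looks like, here is a small standard-library sketch. The helper names percentile and summarize, the percentile values (16, 50, 84), and the sample distribution are assumptions chosen for illustration. They are not taken from the pymccorrelation source.

```python
def percentile(values, q):
    """q-th percentile (0-100), linearly interpolating between sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def summarize(values, percentiles=(16, 50, 84)):
    """Hypothetical helper: reduce a distribution to a few percentiles."""
    return [percentile(values, q) for q in percentiles]


# Stand-in for a distribution of bootstrapped coefficients (illustrative only).
dist = [i / 100.0 for i in range(101)]
low, median, high = summarize(dist)
print(f"{median:.2f} +{high - median:.2f} -{median - low:.2f}")
```

The middle percentile serves as the central estimate, and its distances to the outer two give asymmetric uncertainties.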

Additionally, if the full posterior distribution is desired, that can be obtained by setting the return_dist keyword argument to True. In that case, res becomes a tuple of length four:

Please see the docstring for the full set of arguments, including how to supply measurement uncertainties (necessary for point perturbation) and how to mark censored data.

Citing

If you use this script as part of your research, I encourage you to cite the following papers:

Please also cite scipy and numpy.

If your work uses Kendall's tau with censored data, please also cite: