jzluo / firthlogist

A Python implementation of Firth Logistic Regression
MIT License
5 stars 3 forks source link

firthlogist

PyPI PyPI - Downloads GitHub

A Python implementation of Logistic Regression with Firth's bias reduction.

Installation

pip install firthlogist

Usage

firthlogist is sklearn compatible and follows the sklearn API.

>>> from firthlogist import FirthLogisticRegression, load_sex2
>>> fl = FirthLogisticRegression()
>>> X, y, feature_names = load_sex2()
>>> fl.fit(X, y)
FirthLogisticRegression()
>>> fl.summary(xname=feature_names)
                 coef    std err     [0.025      0.975]      p-value
---------  ----------  ---------  ---------  ----------  -----------
age        -1.10598     0.42366   -1.97379   -0.307427   0.00611139
oc         -0.0688167   0.443793  -0.941436   0.789202   0.826365
vic         2.26887     0.548416   1.27304    3.43543    1.67219e-06
vicl       -2.11141     0.543082  -3.26086   -1.11774    1.23618e-05
vis        -0.788317    0.417368  -1.60809    0.0151846  0.0534899
dia         3.09601     1.67501    0.774568   8.03028    0.00484687
Intercept   0.120254    0.485542  -0.818559   1.07315    0.766584

Log-Likelihood: -132.5394
Newton-Raphson iterations: 8

Parameters

max_iter: int, default=25

 The maximum number of Newton-Raphson iterations.

max_halfstep: int, default=25

 The maximum number of step-halvings in one Newton-Raphson iteration.

max_stepsize: int, default=5

 The maximum step size - for each coefficient, the step size is forced to be less than max_stepsize.

pl_max_iter: int, default=100

 The maximum number of Newton-Raphson iterations for finding profile likelihood confidence intervals.

pl_max_halfstep: int, default=25

 The maximum number of step-halvings in one iteration for finding profile likelihood confidence intervals.

pl_max_stepsize: int, default=5

 The maximum step size while finding PL confidence intervals - for each coefficient, the step size is forced to be less than max_stepsize.

tol: float, default=0.0001

 Convergence tolerance for stopping.

fit_intercept: bool, default=True

 Specifies if intercept should be added.

skip_pvals: bool, default=False

 If True, p-values will not be calculated. Calculating the p-values can be expensive if wald=False since the fitting procedure is repeated for each coefficient.

skip_ci: bool, default=False

 If True, confidence intervals will not be calculated. Calculating the confidence intervals via profile likelihoood is time-consuming.

alpha: float, default=0.05

 Significance level (confidence interval = 1-alpha). 0.05 as default for 95% CI.

wald: bool, default=False

 If True, uses Wald method to calculate p-values and confidence intervals.

test_vars: Union[int, List[int]], default=None

 Index or list of indices of the variables for which to calculate confidence intervals and p-values. If None, calculate for all variables. This option has no effect if wald=True.

Attributes

bse_

 Standard errors of the coefficients.

classes_

 A list of the class labels.

ci_

  The fitted profile likelihood confidence intervals.

coef_

 The coefficients of the features.

intercept_

 Fitted intercept. If fit_intercept = False, the intercept is set to zero.

loglik_

 Fitted penalized log-likelihood.

n_iter_

 Number of Newton-Raphson iterations performed.

pvals_

 p-values calculated by penalized likelihood ratio tests.

References

Firth, D (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38.

Heinze G, Schemper M (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine 21: 2409-2419.