jasp-stats / jasp-issues

This repository is solely meant for reporting of bugs, feature requests and other issues in JASP.

[Feature Request]: Heckman Logistic Regression #1577

Open TarandeepKang opened 2 years ago

TarandeepKang commented 2 years ago

Description

Adding new regression procedures

Purpose

No response

Use-case

Useful in situations with small datasets, in which separation is likely to be a problem

Is your feature request related to a problem?

No response

Describe the solution you would like

The addition of Firth and Heckman logistic regression

Describe alternatives that you have considered

No response

Additional context

Separation tends to be a problem in small samples with several highly predictive predictors. The Firth procedure can help to overcome this:

https://onlinelibrary.wiley.com/doi/10.1002/sim.1047

Heinze and colleagues have applied the Firth correction to Cox and logistic regression. Further references and R packages are available here:

https://cemsiis.meduniwien.ac.at/en/kb/science-research/software/statistical-software/firth-correction/
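
To make the separation problem concrete, here is a minimal sketch in base R. The toy data and variable names are made up purely for illustration. With complete separation, plain maximum likelihood has no finite solution, so glm stops at a very large slope with an even larger standard error:

```r
# Toy data (hypothetical): y is 0 whenever x <= 3 and 1 whenever x >= 4,
# so x separates the two outcome classes completely
d <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                y = c(0, 0, 0, 1, 1, 1))

# Plain maximum likelihood; warnings about fitted probabilities of 0 or 1
# are expected here, so they are silenced for the demonstration
ml_fit <- suppressWarnings(glm(y ~ x, family = binomial(), data = d))

# The slope estimate and its standard error are inflated
print(summary(ml_fit)$coefficients)
```

The exact numbers depend on where the fitting routine happens to stop, which is why they should not be interpreted.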

I am not very familiar with Heckman regression, so I can't provide many details or references. However, colleagues have suggested that it would also be a useful addition to your program. They advise that it can be implemented with this package:

https://www.jstatsoft.org/article/view/v027i07

mathijsdeen commented 1 year ago

This would be very helpful. Small sample bias in logistic regression is largely ignored in the literature and (thus?) in software.

The Firth correction might be the best-known correction and is especially helpful in the case of (quasi-)complete separation. This method can easily be implemented by switching to an alternative fitting method when calling the glm function somewhere under the hood (I think in jaspRegression's .glmComputeModel?). Cool thing: the Firth-penalized likelihood is proportional to the posterior density under the Jeffreys prior, so the Firth estimate is the posterior mode under that prior.
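
The Jeffreys-prior connection can be written out in one line. Writing $L(\beta)$ for the log-likelihood and $\mathcal{I}(\beta)$ for the expected information matrix, and taking the Jeffreys prior $\pi(\beta) \propto |\mathcal{I}(\beta)|^{1/2}$, the log-posterior is

$$\log p(\beta \mid y) = L(\beta) + \frac{1}{2}\log|\mathcal{I}(\beta)| + \mathrm{const},$$

so maximizing the Firth-penalized log-likelihood gives the posterior mode under that prior.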

The standard optimizer for the glm function (through glm.fit) is iteratively reweighted least squares. For the Firth correction, one could replace this optimization routine with a method that maximizes the penalized log-likelihood

$L^\ast(\beta) = L(\beta) + \frac{1}{2}\log|\mathcal{I}(\beta)|$,

where $L(\beta)$ is the log-likelihood and $|\mathcal{I}(\beta)|$ is the determinant of the expected information matrix. This is implemented in the brglm2 package as a fitting method that can be used from within stats::glm. One could, for instance, add the Firth correction as a boolean to the options argument and, depending on its value, pass either "glm.fit" (the default) or brglm2::brglmFit as the method argument when calling glm in the body of (I think) .glmComputeModel.
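
As a minimal sketch of what maximizing the penalized log-likelihood means, the following base R code does so directly with optim on toy data. The data, the function name neg_pen_loglik, and the use of optim are assumptions for illustration only; brglm2 and logistf use their own fitting algorithms.

```r
# Toy data (hypothetical) with complete separation
x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, 0, 0, 1, 1, 1)
X <- cbind(1, x)  # design matrix with an intercept column

# Negative penalized log-likelihood: -(L(beta) + 0.5 * log|I(beta)|)
neg_pen_loglik <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  p <- plogis(eta)
  loglik <- sum(y * eta - log1p(exp(eta)))   # Bernoulli log-likelihood
  info <- crossprod(X, X * (p * (1 - p)))    # expected information X'WX
  logdet <- as.numeric(determinant(info, logarithm = TRUE)$modulus)
  -(loglik + 0.5 * logdet)
}

# optim minimizes, so the negative of the penalized log-likelihood is passed
fit <- optim(c(0, 0), neg_pen_loglik, X = X, y = y, method = "BFGS")
firth_coef <- setNames(fit$par, c("(Intercept)", "x"))
print(firth_coef)
```

Unlike plain maximum likelihood on the same data, the penalized objective has a finite maximum, so the slope stays moderate.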

Something like this (somewhere around line 60 of https://github.com/jasp-stats/jaspRegression/blob/master/R/glmCommonFunctions.R):

  # options$Firth is assumed to be a boolean (TRUE = apply the Firth correction)
  optimizer <- if (isTRUE(options$Firth)) brglm2::brglmFit else "glm.fit"

  # compute full and null models
  if (options$weights == "") {
    fullModel <- stats::glm(ff, family = familyLink, data = dataset, weights = NULL, method = optimizer)
    nullModel <- stats::glm(nf, family = familyLink, data = dataset, weights = NULL, method = optimizer)
  } else {
    fullModel <- stats::glm(ff, family = familyLink, data = dataset, weights = get(options$weights), method = optimizer)
    nullModel <- stats::glm(nf, family = familyLink, data = dataset, weights = get(options$weights), method = optimizer)
  }
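
A self-contained version of the same toggle, outside JASP, might look like the sketch below. The helper fit_logistic and the toy data are hypothetical and not part of jaspRegression; the only brglm2 call used is brglm2::brglmFit as the method argument of stats::glm, as described above. For the logit link, brglmFit's default adjustment is understood to coincide with Firth's penalty, but that is worth checking against the brglm2 documentation.

```r
# Hypothetical helper mirroring the proposed toggle; not JASP's actual code
fit_logistic <- function(formula, data, firth = FALSE) {
  method <- "glm.fit"  # default: iteratively reweighted least squares
  if (isTRUE(firth)) {
    if (!requireNamespace("brglm2", quietly = TRUE)) {
      stop("Package 'brglm2' is required for the Firth correction.")
    }
    method <- brglm2::brglmFit
  }
  stats::glm(formula, family = stats::binomial(link = "logit"),
             data = data, method = method)
}

# Toy data (hypothetical) with complete separation
d <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                y = c(0, 0, 0, 1, 1, 1))

# Plain ML: separation warnings are expected and silenced here
ml_fit <- suppressWarnings(fit_logistic(y ~ x, d, firth = FALSE))

# Bias-reduced fit, only attempted when brglm2 is installed
if (requireNamespace("brglm2", quietly = TRUE)) {
  firth_fit <- fit_logistic(y ~ x, d, firth = TRUE)
  print(coef(firth_fit))
}
```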

One possible problem: since we're changing the objective function, model comparison (using the LRT and perhaps AIC and BIC) might be inappropriate, because these statistics assume plain maximum likelihood estimation. When such a correction is applied, the model comparison statistics should perhaps be omitted from the output.

Kucharssim commented 1 year ago

Firth logistic regression will be available in the next JASP release under the GLM regression analysis.

I am not closing this issue just yet - @fqixiang what do you think about the Heckman regression?

fqixiang commented 1 year ago

@Kucharssim Heckman-type selection models seem to be used mostly in econometrics (just an impression, since this is not my field of expertise). The implementation seems quite easy: a two-step procedure in which selection probabilities are first estimated using a probit model, and a term derived from them (the inverse Mills ratio) is then added as a covariate to the regression model of interest. This can already be done in JASP by making use of the GLM module and the linear regression module together. Of course, it will be easier for users if we have this implemented as a single analysis. I don't have a strong opinion about this, though.
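
The two-step procedure described above can be sketched in base R as follows. The data are simulated and all variable names and coefficient values are assumptions chosen for illustration; this is not how JASP or any particular package implements it.

```r
set.seed(1)
n <- 2000
x <- rnorm(n)   # predictor in both equations
z <- rnorm(n)   # affects selection only (exclusion restriction)

# Correlated errors in the selection and outcome equations cause the bias
u <- rnorm(n)
e <- 0.7 * u + sqrt(1 - 0.7^2) * rnorm(n)

selected <- (0.5 + 1.0 * z + 0.5 * x + u) > 0
y <- ifelse(selected, 1 + 2 * x + e, NA)   # outcome seen only if selected
d <- data.frame(y, x, z, selected)

# Step 1: probit model for selection, then the inverse Mills ratio
probit <- glm(selected ~ x + z, family = binomial(link = "probit"), data = d)
lin_pred <- predict(probit, type = "link")
d$imr <- dnorm(lin_pred) / pnorm(lin_pred)

# Step 2: outcome regression on the selected cases, with the IMR as covariate
step2 <- lm(y ~ x + imr, data = d, subset = selected)

# Naive regression that ignores selection, for comparison
naive <- lm(y ~ x, data = d, subset = selected)
print(rbind(two_step = coef(step2)[c("(Intercept)", "x")],
            naive    = coef(naive)[c("(Intercept)", "x")]))
```

One caveat: the standard errors that lm reports in the second step do not account for the inverse Mills ratio being estimated, so a dedicated implementation (for example in the sampleSelection package) would still add value over combining the two modules by hand.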

fqixiang commented 1 year ago

@mathijsdeen Thanks for taking the time to write such a wonderful response about Firth logistic regression. Unfortunately, I saw it only just now! I already implemented Firth logistic regression using the logistf package for the new release of JASP. I wish I had read it much earlier and made use of the brglm2 package, especially considering that it also supports other GLMs like ordinal and multinomial logistic regression, which I also implemented for the new JASP release (using the VGAM package). brglm2 would have made the code for the different analyses more consistent and simpler. Perhaps it will be a good idea to switch to brglm2 altogether in the future!

mathijsdeen commented 1 year ago

@fqixiang Cool, I'm looking forward to the implementation! And of course, implementation before efficiency :).

tomtomme commented 2 months ago

Summing up: while Firth logistic regression is available in GLM regression, Heckman is still needed.