tidymodels / discrim

Wrappers for discriminant analysis and naive Bayes models for use with the parsnip package
https://discrim.tidymodels.org

Strimmer's shrinkage discriminant analysis #18

Closed bbuchsbaum closed 3 years ago

bbuchsbaum commented 4 years ago

Feature

The R package "sda" [1] implements shrinkage discriminant analysis, which is designed for high-dimensional data [2]. A major advantage of this technique is that, by default, it estimates the shrinkage parameter from the training data (using a James-Stein shrinkage estimator) and therefore performs well without parameter tuning. In neuroimaging analysis it is common to run thousands of machine learning models over successive "searchlight" windows covering the brain, so tuning parameters for every model is not feasible. I have also found sda to perform very well on noisy and highly correlated data. It is also very fast because it uses computational tricks to speed up covariance estimation. So I think "sda" would be a nice addition to the "discrim" package.
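To make the "no tuning" point concrete, here is a rough sketch of an analytic shrinkage estimate for a correlation matrix, written in plain Python for illustration only. It is not the sda package's code. It assumes the commonly described form in which the off-diagonal correlations are shrunk toward zero with an intensity computed directly from the training data, and the function name `shrink_correlation` is made up for this example.

```python
import math
import random


def shrink_correlation(rows):
    """Shrink a sample correlation matrix toward the identity.

    rows: list of n observations, each a list of p floats
          (assumes no column is constant).
    Returns (lam, corr, shrunk): the estimated shrinkage intensity,
    the raw correlation matrix, and the shrunken correlation matrix.
    Illustrative sketch only; scaling conventions in real packages may differ.
    """
    n, p = len(rows), len(rows[0])

    # Standardize each column: centre it and divide by the sample sd.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(p)
    ]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

    corr = [[1.0] * p for _ in range(p)]
    var_sum = 0.0  # summed estimated variances of the off-diagonal entries
    sq_sum = 0.0   # summed squared off-diagonal correlations
    for i in range(p):
        for j in range(i + 1, p):
            w = [zk[i] * zk[j] for zk in z]
            w_bar = sum(w) / n
            r_ij = n / (n - 1) * w_bar
            var_r = n / (n - 1) ** 3 * sum((wk - w_bar) ** 2 for wk in w)
            corr[i][j] = corr[j][i] = r_ij
            var_sum += var_r
            sq_sum += r_ij ** 2

    # Intensity is the ratio of the two sums, clipped to [0, 1].
    lam = 1.0 if sq_sum == 0.0 else min(1.0, max(0.0, var_sum / sq_sum))

    shrunk = [
        [1.0 if i == j else (1.0 - lam) * corr[i][j] for j in range(p)]
        for i in range(p)
    ]
    return lam, corr, shrunk


# Toy high-dimensional case: more variables (30) than observations (10).
random.seed(0)
toy = [[random.gauss(0.0, 1.0) for _ in range(30)] for _ in range(10)]
lam, _, _ = shrink_correlation(toy)
print("estimated shrinkage intensity:", lam)
```

The intensity comes out of a closed-form ratio, so nothing has to be cross-validated per model, which is what matters when thousands of models are fitted in a searchlight analysis.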

I created a bare-bones implementation of sda for parsnip here:

https://github.com/bbuchsbaum/shrinkagediscrim

[1] https://cran.r-project.org/web/packages/sda/index.html
[2] Ahdesmaki, M., Zuber, V., & Strimmer, K. (2013). Shrinkage discriminant analysis and CAT score variable selection.

topepo commented 3 years ago

I'm a fan of this work so I'd be happy to include it. I considered it originally too.

IIRC, this would affect only the linear discriminant analysis model? If that's the case, then we should include it as an engine for a new discrim_linear() function.

bbuchsbaum commented 3 years ago

Yes, I think sda is linear. I would also consider it "regularized", although it is slightly different than the existing "linear_regularized" implementations.

topepo commented 3 years ago

There are a lot of ways to conduct regularization. Historically, "regularized discriminant analysis" refers to a method that combines LDA and QDA (pdf).
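As a rough illustration of what "combining LDA and QDA" means, the sketch below blends a class-specific covariance matrix (the QDA end) with the pooled covariance matrix (the LDA end) using a single mixing weight. This is a simplified form in plain Python, not code from any package: the usual formulation also weights by class sample sizes and can add a second shrinkage step toward a scaled identity, both omitted here. The name `blend_covariance` is hypothetical.

```python
def blend_covariance(class_cov, pooled_cov, lam):
    """Convex combination of a class covariance and the pooled covariance.

    lam = 0 keeps the class-specific matrix (QDA-like);
    lam = 1 uses the pooled matrix for every class (LDA-like).
    Matrices are lists of lists with identical shapes.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [
        [(1.0 - lam) * c + lam * q for c, q in zip(c_row, q_row)]
        for c_row, q_row in zip(class_cov, pooled_cov)
    ]


# Toy 2x2 matrices, made up for the example.
class_cov = [[2.0, 0.6], [0.6, 1.0]]
pooled_cov = [[1.0, 0.2], [0.2, 1.0]]
for lam in (0.0, 0.5, 1.0):
    print(lam, blend_covariance(class_cov, pooled_cov, lam))
```

Shrinkage of the kind sda performs acts on a different target (the covariance estimate itself is pulled toward a simpler structure), which is why a single "regularized" label covers several distinct methods.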

The main question is what the arguments to discrim_linear() would be. I think that an option for regularization_method would make sense (along with an argument for the value).

I might want to resurrect the methods in the sparsediscrim package, which has a lot of other methods.

Suggestions are welcome. I'll start scoping this out in a few days.

topepo commented 3 years ago

closed by #28
