xKDR / CRRao.jl

MIT License

Design for Linear Discriminant Analysis (LDA) #25

Open sourish-cmi opened 2 years ago

sourish-cmi commented 2 years ago

I am thinking about how we should implement Linear Discriminant Analysis (LDA) in CRRao. I am thinking out loud, so please correct me if I get something wrong. The design I have in mind is as follows:

container = @fitmodel(formula, data, modelClass, ClassificationType, CovarianceType)

Example: For binary classification:

container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Binary)
container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Binary, ShrinkageCov)
container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Binary, PythonCov)
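To make the binary case concrete, here is a minimal Julia sketch of two-class LDA (Fisher's linear discriminant) using the pooled sample covariance, i.e. what the proposed default would compute. It is not CRRao's implementation: `fit_binary_lda` and `predict_binary` are hypothetical names, and the sketch assumes equal class priors (threshold at the projected midpoint) and observations in the rows of `X`.

```julia
using LinearAlgebra, Statistics

# Two-class LDA (Fisher's discriminant) with the pooled sample covariance.
# X: n×p matrix with observations in rows; y: true for class 1, false for class 0.
function fit_binary_lda(X::AbstractMatrix, y::AbstractVector{Bool})
    X1, X0 = X[y, :], X[.!y, :]
    μ1, μ0 = vec(mean(X1; dims = 1)), vec(mean(X0; dims = 1))
    n1, n0 = size(X1, 1), size(X0, 1)
    # Pooled within-class covariance: class sample covariances weighted by n_k - 1
    S = ((n1 - 1) .* cov(X1) .+ (n0 - 1) .* cov(X0)) ./ (n1 + n0 - 2)
    w = S \ (μ1 - μ0)            # discriminant direction Σ⁻¹(μ1 - μ0)
    c = dot(w, (μ1 + μ0) ./ 2)   # threshold at the projected midpoint (equal priors assumed)
    return w, c
end

# Assign class 1 when the projection exceeds the threshold.
predict_binary(w, c, X::AbstractMatrix) = X * w .> c
```

With unequal priors, the threshold would shift by the log prior ratio; the midpoint rule keeps the sketch short.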

Example: For multi-class classification:

container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Multi)
container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Multi, ShrinkageCov)
container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Multi, PythonCov)
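For the multi-class case, the sketch below fits class means, a pooled within-class covariance and empirical class priors, and predicts the class with the largest linear discriminant score δ_k(x) = x'Σ⁻¹μ_k − ½ μ_k'Σ⁻¹μ_k + log π_k. The names `LDAFit`, `fit_lda` and `predict_lda` are hypothetical, not an existing CRRao or MultivariateStats.jl API; it assumes at least two observations per class and an invertible pooled covariance.

```julia
using LinearAlgebra, Statistics

# Fitted multi-class LDA: class labels, class means, inverse pooled covariance, log priors.
struct LDAFit
    classes::Vector
    means::Matrix{Float64}      # p × K, one column per class
    Sinv::Matrix{Float64}       # inverse of the pooled within-class covariance
    logpriors::Vector{Float64}  # log(n_k / n)
end

function fit_lda(X::AbstractMatrix, y::AbstractVector)
    classes = sort(unique(y))
    n, p = size(X)
    K = length(classes)
    means = zeros(p, K)
    S = zeros(p, p)
    logpriors = zeros(K)
    for (k, c) in enumerate(classes)
        Xk = X[y .== c, :]
        nk = size(Xk, 1)            # needs nk >= 2, otherwise cov(Xk) is NaN
        means[:, k] = vec(mean(Xk; dims = 1))
        S .+= (nk - 1) .* cov(Xk)   # accumulate within-class scatter
        logpriors[k] = log(nk / n)
    end
    S ./= (n - K)                   # pooled within-class covariance
    return LDAFit(classes, means, inv(S), logpriors)
end

# Score every row against every class and return the label with the highest score.
function predict_lda(m::LDAFit, X::AbstractMatrix)
    A = m.Sinv * m.means                                    # p × K, columns Σ⁻¹μ_k
    bias = -0.5 .* vec(sum(m.means .* A; dims = 1)) .+ m.logpriors
    scores = X * A .+ bias'                                 # n × K discriminant scores
    return [m.classes[argmax(r)] for r in eachrow(scores)]
end
```

Here `predict_lda` returns class labels, which is the behaviour one would expect from `predict` on a fitted multi-class LDA model.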

If no covariance type is given, the default would be the sample covariance.

ajaynshah commented 2 years ago

What's PythonCov?

What's the state of Julia packages for robust covariance matrices? Should we let the user pass a function as an argument?

sourish-cmi commented 2 years ago
ajaynshah commented 2 years ago

I was thinking that there will be a general facility for covariance matrix estimation: the simple estimator, multiple robust methods, maybe something that works for sparse matrices in big data, etc. So there should be a (simple) default, but the caller should be able to supply a function that computes the covariance matrix. Alternatively, maybe there will be a function compute.cov(X, method), and the caller of LDA should be able to supply the method.
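One way to let the caller supply the estimator in Julia is a keyword argument that accepts any function. The sketch below is hypothetical (`pooled_cov` and `diag_cov` are illustrative names); it assumes the per-class estimates are pooled with weights n_k − 1, as for the sample covariance, and `diag_cov` is only a toy alternative estimator, not a robust method.

```julia
using LinearAlgebra, Statistics

# Pooled within-class covariance with a pluggable per-class estimator.
# `covfn` maps an n_k×p matrix (observations in rows) to a p×p estimate;
# it defaults to the ordinary sample covariance `Statistics.cov`.
function pooled_cov(X::AbstractMatrix, y::AbstractVector; covfn = cov)
    classes = unique(y)
    n, p = size(X)
    S = zeros(p, p)
    for c in classes
        Xk = X[y .== c, :]
        S .+= (size(Xk, 1) - 1) .* covfn(Xk)   # weight each class by n_k - 1
    end
    return S ./ (n - length(classes))
end

# Toy alternative estimator: keep only the per-variable variances.
diag_cov(Xk) = Matrix(Diagonal(vec(var(Xk; dims = 1))))
```

A call such as `pooled_cov(X, y; covfn = diag_cov)` swaps the estimator without touching the LDA code; a robust estimator would be passed the same way.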

sourish-cmi commented 2 years ago

Hmm, I like both ideas.

Idea 1) covariance matrix estimation: simple, multiple robust methods, shrinkage estimation methods, etc.

Idea 2) a function compute.cov(X, method), where the caller of LDA can supply the method.
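Julia's multiple dispatch maps naturally onto a compute.cov(X, method) design: each method is a type, and a plain function can also be accepted as a method, which covers Idea 1 as well. In this sketch `compute_cov`, `CovMethod`, `SampleCov` and `ShrinkageCov` are hypothetical names (`ShrinkageCov` only echoes the name proposed earlier), and the shrinkage shown is a fixed-intensity linear shrinkage toward a scaled identity, not Ledoit-Wolf, which would estimate the intensity from the data.

```julia
using LinearAlgebra, Statistics

# Hypothetical method types for covariance estimation.
abstract type CovMethod end
struct SampleCov <: CovMethod end
struct ShrinkageCov <: CovMethod
    λ::Float64   # shrinkage intensity in [0, 1], chosen by the caller
end

# Ordinary sample covariance (observations in rows of X).
compute_cov(X::AbstractMatrix, ::SampleCov) = cov(X)

# Fixed-intensity linear shrinkage toward a scaled identity: (1 - λ) S + λ (tr(S)/p) I.
function compute_cov(X::AbstractMatrix, m::ShrinkageCov)
    0 <= m.λ <= 1 || throw(ArgumentError("λ must lie in [0, 1]"))
    S = cov(X)
    p = size(S, 1)
    return (1 - m.λ) .* S .+ m.λ * (tr(S) / p) .* Matrix{Float64}(I, p, p)
end

# A user-supplied function is also accepted as the "method".
compute_cov(X::AbstractMatrix, f::Function) = f(X)

# Default when the caller does not choose a method.
compute_cov(X::AbstractMatrix) = compute_cov(X, SampleCov())
```

New estimators can then be added as new types with their own `compute_cov` method, without changing the LDA code that calls `compute_cov`.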

I like the second idea with a default robust method of R.

sourish-cmi commented 2 years ago

@ajaynshah @ayushpatnaikgit @codetalker7 @ShouvikGhosh2048

I'm struggling to decide: should we use MultivariateStats.jl for LDA, and/or should we use Aman's from-scratch Julia code for LDA?

Ayush's point: if we rely on too many packages, some people will never be able to use CRRao because some package will be broken.

On the other hand, why bother? We are going to rely on lazy loading anyway...

Requesting your comments; I now want to move on to LDA development for CRRao.

sourish-cmi commented 2 years ago

For now, I am thinking of developing LDA with Aman's code, which is faster than R and Python's scikit-learn but slower than MultivariateStats.jl.

Once MultivariateStats.jl becomes stable, we can later adopt the LDA of MultivariateStats.jl as a fast option.

sourish-cmi commented 2 years ago

@ajaynshah @ayushpatnaikgit

We will raise an issue with MultivariateStats.jl saying that predict is not working. If they provide a solution, we will go ahead and use it in CRRao.jl.

Otherwise, we will contribute to MultivariateStats.jl.

ajaynshah commented 2 years ago

Yes, great: let's put all our knowledge of LDA to work to make MultivariateStats.jl stronger, and then in CRRao we will just call that LDA. Let's do the usual hard work:

so that it gets rapidly accepted into the main package.

sourish-cmi commented 2 years ago

I have opened this issue on MultivariateStats.jl:

https://github.com/JuliaStats/MultivariateStats.jl/issues/204

Basically, I said that predict for MulticlassLDA is not working.