jameschapman19 / cca_zoo

Canonical Correlation Analysis Zoo: A collection of Regularized, Deep Learning based, Kernel, and Probabilistic methods in a scikit-learn style framework
https://cca-zoo.readthedocs.io/en/latest/
MIT License
191 stars 41 forks

Implement Group (Sparse) Methods #141

Closed JohannesWiesner closed 2 years ago

JohannesWiesner commented 2 years ago

Current implementations of CCA cannot handle data where one would expect structured covariance among certain variables (e.g. brain regions can be expected to covary more strongly the closer they are spatially, and behavioral variables can be aggregated into groups such as cognition, psychopathology, drugs, etc.). There have been two attempts to solve this problem: Group Sparse Canonical Correlation Analysis and Group Factor Analysis. Maybe these could be implemented in cca-zoo?

Group (sparse) canonical correlation:

Group regularized canonical correlation analysis (GRCCA) paper: https://arxiv.org/abs/2011.01650 https://www.sciencedirect.com/science/article/pii/S1053811921004146

Implementation in R can be found here: https://github.com/ElenaTuzhilina/RCCA

Group Factor Analysis:

Group Factor Analysis papers: https://arxiv.org/pdf/1411.5799.pdf https://www.jmlr.org/papers/volume18/16-509/16-509.pdf https://www.sciencedirect.com/science/article/pii/S1053811921011253#sec0013 (extension to vanilla GFA)

This seems to be the original implementation in R, which is confusing because on GitHub the package is called CCAGFA but the CRAN PDF refers to it as GFA (https://cran.r-project.org/web/packages/GFA/GFA.pdf): https://github.com/cran/CCAGFA/blob/master/R/CCAGFA.R

Implementations in Python can be found here: https://github.com/mladv15/gfa-python https://github.com/ferreirafabio80/gfa

jameschapman19 commented 2 years ago

Answer in two parts, but broadly the answer is yes.

Group (sparse) CCA: Looks like a really simple modification. As ever, feel free to give it a go; otherwise I will leave this here as an open issue and I might get round to it.

Group Factor Analysis: The author of the second Python reference (ferreirafabio80) was formerly in my group, so I may well get in touch with him; indeed, if you need a hand I'm sure he'd be happy to help. He's got some cool work coming out soon which is a neat variation on GFA, and another PhD student is looking at applying his model, so would have a vested interest in being able to compare it to other methods. The important question is whether GFA would benefit from inheriting any of the existing base classes, and I don't know in enough detail off the top of my head (otherwise one might as well just use the existing implementations).

jameschapman19 commented 2 years ago

The models that inherit from RCCA have _setup_evp() and _solve_evp() methods, which would just be overridden by e.g. a GroupSCCA model class; you just need to match up the entries of the generalized eigenvalue problem.
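
To make the override idea concrete, here is a minimal sketch, not cca-zoo's actual code. The class names `RidgeCCASketch` and `GroupRidgeCCASketch`, the parameters `c`, `mu` and `groups`, and the method signatures are all hypothetical; only the method names `_setup_evp` and `_solve_evp` come from the comment above. The group penalty used here (shrinking each weight towards its group mean) is an illustrative choice in the spirit of group regularization and is not claimed to match the GRCCA paper exactly.

```python
import numpy as np


class RidgeCCASketch:
    """Hypothetical two-view ridge CCA (NOT cca-zoo's real base class)."""

    def __init__(self, c=0.1):
        self.c = c  # ridge strength added to both within-view covariances

    def _setup_evp(self, X, Y):
        # Build A w = rho * B w with A holding the cross-covariances and
        # B holding the regularized within-view covariances.
        n = X.shape[0]
        X = X - X.mean(axis=0)
        Y = Y - Y.mean(axis=0)
        p, q = X.shape[1], Y.shape[1]
        Cxy = X.T @ Y / n
        A = np.zeros((p + q, p + q))
        A[:p, p:] = Cxy
        A[p:, :p] = Cxy.T
        B = np.zeros((p + q, p + q))
        B[:p, :p] = X.T @ X / n + self.c * np.eye(p)
        B[p:, p:] = Y.T @ Y / n + self.c * np.eye(q)
        return A, B, p

    def _solve_evp(self, A, B, p):
        # Whiten with the Cholesky factor of B, then solve a symmetric EVP.
        L_inv = np.linalg.inv(np.linalg.cholesky(B))
        M = L_inv @ A @ L_inv.T
        vals, vecs = np.linalg.eigh((M + M.T) / 2)  # ascending eigenvalues
        w = L_inv.T @ vecs[:, -1]  # map the top eigenvector back
        return vals[-1], w[:p], w[p:]

    def fit(self, X, Y):
        A, B, p = self._setup_evp(X, Y)
        self.rho_, self.wx_, self.wy_ = self._solve_evp(A, B, p)
        return self


class GroupRidgeCCASketch(RidgeCCASketch):
    """Hypothetical subclass: only the EVP setup is overridden."""

    def __init__(self, c=0.1, mu=1.0, groups=None):
        super().__init__(c=c)
        self.mu = mu          # strength of the within-group penalty
        self.groups = groups  # one group label per column of X

    def _setup_evp(self, X, Y):
        A, B, p = super()._setup_evp(X, Y)
        g = np.asarray(self.groups)
        same = (g[:, None] == g[None, :]).astype(float)
        averaging = same / same.sum(axis=1, keepdims=True)  # group-mean projector
        # Penalize the deviation of each X weight from its group mean.
        B[:p, :p] += self.mu * (np.eye(p) - averaging)
        return A, B, p


rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))          # shared latent signal
X = z + rng.normal(size=(200, 6))
Y = z + rng.normal(size=(200, 4))
model = GroupRidgeCCASketch(c=0.1, mu=5.0, groups=[0, 0, 0, 1, 1, 1]).fit(X, Y)
```

Because the solver is untouched, the subclass only has to decide which entries go into the two matrices of the generalized eigenvalue problem, which is the point made in the comment above.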

JohannesWiesner commented 2 years ago

I am also already in contact with Fabio. Maybe I could try to rework his functions to make them cca-zoo compatible :) For the specific implementation: As always I kind of have to treat the models as a black box, as I lack the statistical skills to dig deeper.

jameschapman19 commented 2 years ago

@JohannesWiesner I have translated the R code into this package as GRCCA and PRCCA :)