prody / ProDy

A Python Package for Protein Dynamics Analysis
http://prody.csb.pitt.edu

Add support for weighted PCA and ICA/tICA? #1584

Open SHZ66 opened 2 years ago

SHZ66 commented 2 years ago

I think it is worth considering adding support for weighted PCA and ICA/tICA in ProDy.

The former should be fairly easy, since it already exists to an extent in the current PCA class: https://github.com/prody/ProDy/blob/master/prody/dynamics/pca.py#L180

but only when the input data is an Ensemble class with weights. A similar treatment should be added to https://github.com/prody/ProDy/blob/master/prody/dynamics/pca.py#L166

That way, one can pass a weight vector (one weight per sample) or a weight matrix (one weight per sample and atom) as a parameter to PCA.buildCovariance.
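To make the per-sample case concrete, here is a minimal NumPy sketch of a weighted covariance. It is not ProDy code: the function name `weighted_covariance`, the flattened `(n_samples, 3 * n_atoms)` input layout, and the normalisation by the sum of the weights are all assumptions for illustration, and the per-sample-and-atom matrix case is left out.

```python
import numpy as np


def weighted_covariance(coords, weights):
    """Weighted covariance of flattened coordinates (hypothetical helper).

    coords  : array of shape (n_samples, n_features), e.g. n_features = 3 * n_atoms
    weights : array of shape (n_samples,), one non-negative weight per sample
    """
    coords = np.asarray(coords, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if weights.shape != (coords.shape[0],):
        raise ValueError("expected one weight per sample")

    total = weights.sum()
    # Weighted mean structure.
    mean = (weights[:, None] * coords).sum(axis=0) / total
    deviations = coords - mean
    # Sum of w_i * d_i d_i^T, normalised by the sum of the weights (assumption).
    return (weights[:, None] * deviations).T @ deviations / total


# Toy data: 100 "conformations" of 4 atoms, flattened to 12 coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))
w = rng.uniform(0.1, 1.0, size=100)
cov = weighted_covariance(X, w)
```

With uniform weights this reduces to the ordinary covariance normalised by the number of samples, which gives a quick sanity check for any implementation along these lines.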

ICA is trickier to implement, but the covariance matrix is the same. Only the decomposition part is different. A good reference to follow is probably scikit-learn's FastICA: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html
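For reference, this is roughly what calling scikit-learn's FastICA on flattened coordinates looks like. It is only a sketch on synthetic data: the helper name `run_fastica` and the toy mixing setup are made up for illustration, and note that FastICA takes the data matrix itself and whitens it internally rather than taking a precomputed covariance matrix.

```python
import numpy as np
from sklearn.decomposition import FastICA


def run_fastica(flat_coords, n_components):
    """Fit FastICA on (n_samples, n_features) data (hypothetical helper)."""
    centered = flat_coords - flat_coords.mean(axis=0)
    ica = FastICA(
        n_components=n_components,
        whiten="unit-variance",  # explicit, to avoid version-dependent defaults
        random_state=0,
        max_iter=1000,
    )
    projections = ica.fit_transform(centered)  # (n_samples, n_components)
    return projections, ica.components_        # (n_components, n_features)


# Synthetic stand-in for a trajectory: 3 non-Gaussian sources mixed into
# 15 coordinates (5 atoms x 3), plus a little noise.
rng = np.random.default_rng(0)
sources = rng.laplace(size=(200, 3))
mixing = rng.normal(size=(3, 15))
flat_coords = sources @ mixing + 0.01 * rng.normal(size=(200, 15))

projections, components = run_fastica(flat_coords, n_components=3)
```

Each row of `components` plays the role that a mode vector plays in PCA, except that the rows are not orthonormal and have no natural ordering by variance.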

jamesmkrieger commented 2 years ago

sounds good to me

We may also want to consider giving an option for scikit-learn PCA, which seems to be faster

SHZ66 commented 2 years ago

> sounds good to me
>
> We may also want to consider giving an option for scikit-learn PCA, which seems to be faster

Great! I can take the WPCA for a spin if you'd like.

I wonder if their speed-up comes from the fact that they are using SVD instead of the regular eigensolver, which is provided as an option in ProDy already: https://github.com/prody/ProDy/blob/master/prody/dynamics/pca.py#L230 (Although I think the API point performSVD should be integrated into calcModes and can be turned on by a switch).
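To illustrate why the two routes should agree, here is a small NumPy sketch comparing an eigendecomposition of the covariance matrix with an SVD of the centred data. The function names are made up and this does not use ProDy's calcModes or performSVD; dividing by the number of samples is an assumption about the normalisation.

```python
import numpy as np


def modes_from_eigh(data, n_modes):
    """Top modes from an eigendecomposition of the covariance matrix."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / centered.shape[0]
    values, vectors = np.linalg.eigh(cov)        # ascending order
    order = np.argsort(values)[::-1][:n_modes]   # keep the largest ones
    return values[order], vectors[:, order]


def modes_from_svd(data, n_modes):
    """Same modes from an SVD of the centred data, no covariance matrix built."""
    centered = data - data.mean(axis=0)
    _, singular, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular[:n_modes] ** 2 / centered.shape[0]
    return variances, vt[:n_modes].T


rng = np.random.default_rng(1)
data = rng.normal(size=(50, 12)) * np.linspace(3.0, 0.5, 12)

eig_vals, eig_vecs = modes_from_eigh(data, n_modes=3)
svd_vals, svd_vecs = modes_from_svd(data, n_modes=3)

# Mode vectors are only defined up to sign, so compare absolute overlaps.
overlaps = np.abs(np.sum(eig_vecs * svd_vecs, axis=0))
```

The SVD route never forms the (3N x 3N) covariance matrix, which is one plausible source of a speed difference when there are far fewer samples than coordinates, though that would need to be benchmarked.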

jamesmkrieger commented 2 years ago

> > sounds good to me We may also want to consider giving an option for scikit-learn PCA, which seems to be faster
>
> Great! I can take the WPCA for a spin if you'd like.

Yes, go ahead!

> I wonder if their speed-up comes from the fact that they are using SVD instead of the regular eigensolver, which is provided as an option in ProDy already: https://github.com/prody/ProDy/blob/master/prody/dynamics/pca.py#L230 (Although I think the API point performSVD should be integrated into calcModes and can be turned on by a switch).

I'm not sure. Could be. I haven't yet got round to systematically comparing it.

There's an implementation in https://github.com/scipion-em/scipion-em-continuousflex/blob/rv_pdb_dimred/continuousflex/protocols/protocol_pdb_dimred.py that I'd be comparing with.

They also have UMAP, which looks quite similar, so it may be worth adapting into ProDy too.