jameschapman19 / cca_zoo

Canonical Correlation Analysis Zoo: A collection of Regularized, Deep Learning based, Kernel, and Probabilistic methods in a scikit-learn style framework
https://cca-zoo.readthedocs.io/en/latest/
MIT License

Running CCA on GPU #138

Open fipelle opened 2 years ago

fipelle commented 2 years ago

Hi, is it possible to run a subset of these versions of CCA on the GPU? If so, would you please write down a short example?

jameschapman19 commented 2 years ago

Yeah, the examples are set up to work with PyTorch Lightning, so it should be as simple as passing gpus=1 to the trainer, as shown here:

https://pytorch-lightning.readthedocs.io/en/stable/common/single_gpu.html
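
i.e. something like this (a sketch; model and loader stand in for one of the example deep models and its DataLoader, they are not defined here):

import pytorch_lightning as pl

# `model` and `loader` are placeholders for a deep CCA LightningModule from the
# examples and a DataLoader over the two views; gpus=1 puts training on one GPU.
trainer = pl.Trainer(max_epochs=100, gpus=1)
trainer.fit(model, loader)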

fipelle commented 2 years ago

Does it only apply to the "Deep Models"?

jameschapman19 commented 2 years ago

Ah I see - yes only the deep models.

I'd be curious which model is running slow.

In the alternating-optimisation methods the bottleneck will be the scikit-learn regression solvers. In the CCA / regularised CCA / PLS models the bottleneck will be the eigenvalue problem solver.

If you're aware of a GPU accelerated version of either of these bottlenecks I'd be interested either as a pointer or a PR.

The other possible direction is the SOTA-speed models for CCA, which generally use stochastic methods, e.g. https://proceedings.neurips.cc/paper/2017/file/c30fb4dc55d801fc7473840b5b161dfa-Paper.pdf

Or

https://proceedings.neurips.cc/paper/2014/file/54229abfcfa5649e7003b83dd4755294-Paper.pdf

jameschapman19 commented 2 years ago

Also worth saying: if you use the Deep CCA methods with single-layer linear encoders they should converge to CCA, so you could use the GPU via that route (provided you use full-batch gradient descent, i.e. minibatch size = dataset size).
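
Roughly along these lines - a rough sketch in plain PyTorch (not the cca_zoo API): two single-layer linear encoders trained full-batch to maximise the correlation of one pair of canonical variates. Moving X, Y and the two layers to CUDA runs the whole thing on the GPU.

import torch

def linear_cca_sketch(X, Y, n_iter=2000, lr=1e-2):
    # Single-layer linear "encoders", one per view (no bias, one latent dimension).
    w_x = torch.nn.Linear(X.shape[1], 1, bias=False)
    w_y = torch.nn.Linear(Y.shape[1], 1, bias=False)
    opt = torch.optim.SGD(list(w_x.parameters()) + list(w_y.parameters()), lr=lr)
    for _ in range(n_iter):
        opt.zero_grad()
        # Full batch: every sample is used at every step (minibatch size = dataset size).
        z_x = w_x(X).squeeze()
        z_y = w_y(Y).squeeze()
        z_x = (z_x - z_x.mean()) / (z_x.std() + 1e-8)
        z_y = (z_y - z_y.mean()) / (z_y.std() + 1e-8)
        loss = -(z_x * z_y).mean()  # negative correlation of the two projections
        loss.backward()
        opt.step()
    return w_x.weight.detach(), w_y.weight.detach()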

fipelle commented 2 years ago

> Also worth saying: if you use the Deep CCA methods with single-layer linear encoders they should converge to CCA, so you could use the GPU via that route (provided you use full-batch gradient descent, i.e. minibatch size = dataset size).

Thanks, I will start with that then - trying to build from the examples in the documentation.

> I'd be curious which model is running slow.

I am trying to run NCCA with a very large dataset. I was hoping for a GPU implementation of the nearest-neighbour part, in the same spirit as the one in cuML - I am not an expert in NCCA and I do not know whether it is feasible! :)

I thought about pre-processing the data with PCA first and then running NCCA, but that does not look very elegant, given that I would have to employ two dimensionality-reduction techniques.

jameschapman19 commented 2 years ago

Ah, interesting! I must admit my implementation of NCCA is definitely functional rather than optimized. If there's a faster nearest-neighbour algorithm that can be imported, I'd definitely drop it in.

Just spotted that the sklearn implementation I've been using can take n_jobs, which I don't currently utilise, so that's an easy win.
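
i.e. something like this (the n_neighbors value is just a placeholder):

from sklearn.neighbors import NearestNeighbors

# n_jobs=-1 spreads the neighbour queries over all available CPU cores.
nn = NearestNeighbors(n_neighbors=10, n_jobs=-1)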

jameschapman19 commented 2 years ago

https://towardsdatascience.com/make-knn-300-times-faster-than-scikit-learns-in-20-lines-5e29d74e76bb

I might take a look at this.

fipelle commented 2 years ago

Thanks! Take a look at the NearestNeighbors implementation in https://docs.rapids.ai/api/cuml/stable/api.html. I think you should be able to import it right away, given that it shares a lot of the scikit-learn syntax.

fipelle commented 2 years ago

I am trying to see if it is sufficient to change ncca.py#5 from "from sklearn.neighbors import NearestNeighbors" to "from cuml.neighbors import NearestNeighbors". It would also be nice to have access to the NearestNeighbors options described in https://docs.rapids.ai/api/cuml/stable/api.html#nearest-neighbors when defining NCCA.
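
For reference, the kind of change I'm trying (the try/except CPU fallback is just my own sketch, not something in cca_zoo):

# Hypothetical drop-in replacement for the import at the top of ncca.py:
try:
    from cuml.neighbors import NearestNeighbors   # GPU-accelerated (RAPIDS cuML)
except ImportError:
    from sklearn.neighbors import NearestNeighbors  # CPU fallback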

EDIT

It seems to work.

jameschapman19 commented 2 years ago

Just seen the edit! That's cool!

beckernick commented 2 years ago

Hi! I just came across this issue due to the cuML / RAPIDS mention.

I wanted to note that we've implemented input-to-output data type consistency for all cuML estimators (not just NearestNeighbors). This means that while training and prediction will be on the GPU, if you pass CPU data into the estimator it will return CPU data from estimator methods and be compatible with existing code that relies on CPU data.
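
For example (a sketch assuming cuML is installed; the data is random, just for illustration):

import numpy as np
from cuml.neighbors import NearestNeighbors

X = np.random.rand(1000, 16).astype(np.float32)  # CPU (NumPy) input
nn = NearestNeighbors(n_neighbors=5).fit(X)      # fitting and querying run on the GPU
distances, indices = nn.kneighbors(X)            # results come back as NumPy arrays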

We've seen folks build wrapper classes or just use a utility function like the following and wrap the instantiation of the estimators into branching statements:

def has_cuml():
    try:
        import cuml
        return True
    except ImportError:
        return False
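
For example, the branching might look like this (the wrapper name is just illustrative):

def make_nearest_neighbors(**kwargs):
    # Use the GPU-backed cuML estimator when it is available, otherwise fall back to sklearn.
    if has_cuml():
        from cuml.neighbors import NearestNeighbors
    else:
        from sklearn.neighbors import NearestNeighbors
    return NearestNeighbors(**kwargs)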

Happy to help answer questions about cuML (or RAPIDS in general) if you have any.