Richienb opened 4 days ago
Switching between extending np.ndarray and cupy.ndarray may be difficult:
Hi Richie, I am familiar with cupy, and while I'm not opposed to adding GPU support, I fear it may be more difficult than simply swapping out numpy for cupy. Apart from the problem you've already identified, other sections of the code would probably be much more difficult to port over, since a substantial part of the library is written in cython and compiles against numpy.
For true GPU support, we'd have to port over every step of the t-SNE algorithm, including _tsne.pyx, since this is actually compiled against numpy. I would be shocked if simply swapping out numpy for cupy would be a solution here.
Porting the approximation schemes to GPU may also be challenging. The FFT version should be easier in this regard, since a lot of the computation there is parallel and well suited to GPU computation. The BH approximation, however, requires iteratively building the Barnes-Hut trees at each step. I've seen GPU implementations of this before, but it's likely a bit more complicated. It would be great if you're interested in working on this; however, I'm afraid it would probably require a lot of work. I would be shocked if it were as simple as swapping out numpy for cupy.
The first step is finding nearest neighbors
Could use https://github.com/facebookresearch/faiss, which has gpu support.
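For reference, a GPU k-NN query with FAISS is only a few lines. This is just a rough sketch, assuming the faiss-gpu package and a single GPU, not a proposal for how openTSNE would actually wire it in:

```python
import numpy as np
import faiss  # pip install faiss-gpu

x = np.random.rand(100_000, 50).astype(np.float32)  # FAISS expects float32
k = 15                                               # neighbors per point

index = faiss.IndexFlatL2(x.shape[1])                # exact L2 search
res = faiss.StandardGpuResources()
index = faiss.index_cpu_to_gpu(res, 0, index)        # move the index to GPU 0

index.add(x)
# k + 1 because each point is returned as its own nearest neighbor
distances, neighbors = index.search(x, k + 1)
```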
Yes, FAISS could be used for NN-search and we could depend on that. However, that's just one of the three steps in t-SNE and I wouldn't want to incorporate GPU support piecemeal.
From my understanding, iterative algorithms like binary search aren't all that well suited to gpu computation
Despite this, successful GPU implementations of t-SNE still use binary search. Since they still achieve orders-of-magnitude performance improvements, this seems to be a challenge only in theory.
Although it might be more theoretically sound to use some "better" algorithm, we can definitely start with just using binary search.
In my opinion, binary search is fine: GPUs are already used to train neural networks, where the weights are fitted not by binary search but by gradient descent, an even more complex (though perhaps more efficient) iterative procedure.
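To make that concrete, the binary search in question is the per-point perplexity calibration, and it can be run for all points at once with plain array operations. A minimal sketch, written against numpy (not openTSNE's actual code; the same calls exist in cupy, which is what would let it run on GPU arrays):

```python
import numpy as np  # the same operations exist in cupy

def perplexity_binary_search(sq_dists, perplexity=30.0, n_iter=50):
    """Binary-search a per-point precision beta_i so that the entropy of the
    conditional distribution P(j|i) matches log(perplexity).

    All n points are searched simultaneously with batched array operations,
    which is the part that maps naturally onto a GPU.
    """
    n = sq_dists.shape[0]
    target = np.log(perplexity)
    beta = np.ones(n)
    lo = np.full(n, -np.inf)
    hi = np.full(n, np.inf)

    for _ in range(n_iter):
        p = np.exp(-sq_dists * beta[:, None])   # unnormalized P(j|i)
        np.fill_diagonal(p, 0.0)                # a point is not its own neighbor
        psum = p.sum(axis=1)
        # Shannon entropy of each row's conditional distribution
        h = np.log(psum) + beta * (sq_dists * p).sum(axis=1) / psum

        too_flat = h > target                   # entropy too high -> increase beta
        lo = np.where(too_flat, beta, lo)
        hi = np.where(too_flat, hi, beta)
        beta = np.where(np.isinf(hi), beta * 2,
                        np.where(np.isinf(lo), beta / 2, (lo + hi) / 2))

    return beta
```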
> I wouldn't want to incorporate GPU support piecemeal.
👍🏻
Yes, I didn't mean that binary search can't be implemented on GPUs, only that it's probably not entirely straightforward. I could be entirely wrong, though, since I know next to nothing about GPU programming :) But I am completely aware that people have done it and that there are several GPU implementations of t-SNE out there. My point was only that the solution likely won't be as simple as replacing numpy with cupy.
I want to mention that there is also https://github.com/berenslab/contrastive-ne, which implements various sampling-based t-SNE approximations (e.g. InfoNC-t-SNE) in PyTorch. These sampling-based losses are much more amenable to GPU implementation. Personally, I don't think porting Barnes-Hut or the Fourier approximation to GPU is worth the effort.
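For anyone curious what "sampling-based" means here, the core idea is roughly the following. This is a heavily simplified sketch, not contrastive-ne's actual API; the names (`infonce_tsne_loss`, `cauchy_sim`) are made up for illustration, and it uses numpy just to keep the sketch dependency-free:

```python
import numpy as np  # contrastive-ne itself uses PyTorch

def cauchy_sim(a, b):
    """t-SNE's low-dimensional kernel: 1 / (1 + ||a - b||^2)."""
    return 1.0 / (1.0 + np.sum((a - b) ** 2, axis=-1))

def infonce_tsne_loss(emb, pos_pairs, n_neg=5, rng=None):
    """InfoNCE-style neighbor-embedding loss over a minibatch of kNN pairs.

    Each positive pair (i, j) is contrasted against a few randomly sampled
    negatives; everything is fixed-size batched array math, which is why this
    family of losses is so easy to put on a GPU.
    """
    rng = np.random.default_rng() if rng is None else rng
    i, j = pos_pairs[:, 0], pos_pairs[:, 1]
    neg = rng.integers(0, emb.shape[0], size=(len(i), n_neg))

    q_pos = cauchy_sim(emb[i], emb[j])                # (batch,)
    q_neg = cauchy_sim(emb[i][:, None, :], emb[neg])  # (batch, n_neg)

    # negative log-probability of picking the true neighbor over the negatives
    return -np.mean(np.log(q_pos / (q_pos + q_neg.sum(axis=1))))
```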
openTSNE could be made GPU/CPU-agnostic for potential speed boosts by using a cupy feature.
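Presumably the cupy feature meant here is something like cupy.get_array_module, which lets the same function run on either numpy or cupy arrays depending on what it is given. A minimal sketch of that pattern:

```python
import numpy as np
import cupy as cp

def normalize_rows(x):
    """Row-normalize x; works unchanged on numpy and cupy arrays."""
    xp = cp.get_array_module(x)  # returns the numpy or cupy namespace for x
    norms = xp.linalg.norm(x, axis=1, keepdims=True)
    return x / xp.maximum(norms, 1e-12)

normalize_rows(np.random.rand(4, 3))  # runs on the CPU with numpy
normalize_rows(cp.random.rand(4, 3))  # runs on the GPU with cupy
```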