Richienb opened 4 days ago
Switching between extending np.ndarray and cupy.ndarray may be difficult:
Hi Richie, I am familiar with cupy, and while I'm not opposed to adding GPU support, I fear it may be more difficult than simply swapping out numpy for cupy. Apart from the problem you've already identified, other sections of the code would probably be much more difficult to port over, since a substantial part of the library is written in cython and compiles against numpy.
For true GPU support, we'd have to port over every step of the t-SNE algorithm, including _tsne.pyx, since this is actually compiled against numpy. I would be shocked if simply swapping out numpy for cupy would be a solution here.
Porting the approximation schemes to GPU may also be challenging. The FFT version should be easier in this regard, since a lot of the computation there is parallel and well suited to GPU computation. The BH approximation, however, requires iteratively building the Barnes-Hut trees at each step. I've seen GPU implementations of this before, but it's likely a bit more complicated. It would be great if you're interested in working on this; however, I'm afraid it would probably require a lot of work. I would be shocked if it were as simple as swapping out numpy for cupy.
The first step is finding nearest neighbors
Could use https://github.com/facebookresearch/faiss, which has gpu support.
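For reference, a GPU k-NN query with FAISS is only a few lines. This is just a rough sketch, assuming the faiss-gpu package and a single GPU, not a proposal for how openTSNE would actually wire it in:

```python
import numpy as np
import faiss  # pip install faiss-gpu

x = np.random.rand(100_000, 50).astype(np.float32)  # FAISS expects float32
k = 15                                               # neighbors per point

index = faiss.IndexFlatL2(x.shape[1])                # exact L2 search
res = faiss.StandardGpuResources()
index = faiss.index_cpu_to_gpu(res, 0, index)        # move the index to GPU 0

index.add(x)
# k + 1 because each point is returned as its own nearest neighbor
distances, neighbors = index.search(x, k + 1)
```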
Yes, FAISS could be used for NN-search and we could depend on that. However, that's just one of the three steps in t-SNE and I wouldn't want to incorporate GPU support piecemeal.
From my understanding, iterative algorithms like binary search aren't all that well suited to gpu computation
Despite this, successful GPU implementations of t-SNE still use binary search. Since they still achieve orders-of-magnitude performance improvements, this seems to be a challenge only in theory.
Although it might be more theoretically sound to use some "better" algorithm, we can definitely start with just using binary search.
In my opinion, binary search is fine: GPUs are already used to train neural networks, where the weights are fitted not by binary search but by gradient descent, an even more complex (though perhaps more efficient) iterative procedure.
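To make that concrete, the binary search in question is the per-point perplexity calibration, and it can be run for all points at once with plain array operations. A minimal sketch, written against numpy (not openTSNE's actual code; the same calls exist in cupy, which is what would let it run on GPU arrays):

```python
import numpy as np  # the same operations exist in cupy

def perplexity_binary_search(sq_dists, perplexity=30.0, n_iter=50):
    """Binary-search a per-point precision beta_i so that the entropy of the
    conditional distribution P(j|i) matches log(perplexity).

    All n points are searched simultaneously with batched array operations,
    which is the part that maps naturally onto a GPU.
    """
    n = sq_dists.shape[0]
    target = np.log(perplexity)
    beta = np.ones(n)
    lo = np.full(n, -np.inf)
    hi = np.full(n, np.inf)

    for _ in range(n_iter):
        p = np.exp(-sq_dists * beta[:, None])   # unnormalized P(j|i)
        np.fill_diagonal(p, 0.0)                # a point is not its own neighbor
        psum = p.sum(axis=1)
        # Shannon entropy of each row's conditional distribution
        h = np.log(psum) + beta * (sq_dists * p).sum(axis=1) / psum

        too_flat = h > target                   # entropy too high -> increase beta
        lo = np.where(too_flat, beta, lo)
        hi = np.where(too_flat, hi, beta)
        beta = np.where(np.isinf(hi), beta * 2,
                        np.where(np.isinf(lo), beta / 2, (lo + hi) / 2))

    return beta
```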
> I wouldn't want to incorporate GPU support piecemeal.
👍🏻
Yes, I didn't mean that binary search can't be implemented on GPUs, only that it's probably not entirely straightforward. I could be entirely wrong, though, since I know next to nothing about GPU programming :) But I am completely aware that people have done it and that there are several GPU implementations of t-SNE out there. My point was only that the solution likely won't be as simple as replacing numpy with cupy.
I want to mention that there is also https://github.com/berenslab/contrastive-ne, which implements various sampling-based t-SNE approximations (e.g. InfoNC-t-SNE) in PyTorch. These sampling-based losses are much more amenable to GPU implementation. Personally, I don't think porting Barnes-Hut or the Fourier approximation to GPU is worth the effort.
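For anyone curious what "sampling-based" means here, the core idea is roughly the following. This is a heavily simplified sketch, not contrastive-ne's actual API; the names (`infonce_tsne_loss`, `cauchy_sim`) are made up for illustration, and it uses numpy just to keep the sketch dependency-free:

```python
import numpy as np  # contrastive-ne itself uses PyTorch

def cauchy_sim(a, b):
    """t-SNE's low-dimensional kernel: 1 / (1 + ||a - b||^2)."""
    return 1.0 / (1.0 + np.sum((a - b) ** 2, axis=-1))

def infonce_tsne_loss(emb, pos_pairs, n_neg=5, rng=None):
    """InfoNCE-style neighbor-embedding loss over a minibatch of kNN pairs.

    Each positive pair (i, j) is contrasted against a few randomly sampled
    negatives; everything is fixed-size batched array math, which is why this
    family of losses is so easy to put on a GPU.
    """
    rng = np.random.default_rng() if rng is None else rng
    i, j = pos_pairs[:, 0], pos_pairs[:, 1]
    neg = rng.integers(0, emb.shape[0], size=(len(i), n_neg))

    q_pos = cauchy_sim(emb[i], emb[j])                # (batch,)
    q_neg = cauchy_sim(emb[i][:, None, :], emb[neg])  # (batch, n_neg)

    # negative log-probability of picking the true neighbor over the negatives
    return -np.mean(np.log(q_pos / (q_pos + q_neg.sum(axis=1))))
```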
openTSNE could be made GPU/CPU-agnostic for potential speed boosts by using a cupy feature.
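Presumably the cupy feature meant here is something like cupy.get_array_module, which lets the same function run on either numpy or cupy arrays depending on what it is given. A minimal sketch of that pattern:

```python
import numpy as np
import cupy as cp

def normalize_rows(x):
    """Row-normalize x; works unchanged on numpy and cupy arrays."""
    xp = cp.get_array_module(x)  # returns the numpy or cupy namespace for x
    norms = xp.linalg.norm(x, axis=1, keepdims=True)
    return x / xp.maximum(norms, 1e-12)

normalize_rows(np.random.rand(4, 3))  # runs on the CPU with numpy
normalize_rows(cp.random.rand(4, 3))  # runs on the GPU with cupy
```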