pavlin-policar / openTSNE

Extensible, parallel implementations of t-SNE
https://opentsne.rtfd.io
BSD 3-Clause "New" or "Revised" License
1.42k stars 157 forks

Relation to FIt-SNE? #1

ghost closed this issue 5 years ago

ghost commented 5 years ago

How is this related to FIt-SNE, on which you have also done work, and how does it compare?

pavlin-policar commented 5 years ago

Hi, sorry for the late reply - I somehow missed this in my inbox. This is a re-implementation of FIt-SNE in Python/Cython. While working on it, I simplified the FIt-SNE code substantially, which also resulted in a speed-up. These changes have already been merged into the C++ implementation (see PR).

The goal is to have a pure Python implementation that doesn't need any libraries that are difficult to compile on Windows (this is not true yet, because I still need to replace FFTW with numpy's FFT, which is supposedly faster when using Intel's MKL optimizations). I am currently integrating this into Orange, where we currently use scikit-learn's t-SNE, which is terribly slow. Because we'll be using it in Orange, there's more functionality for interactive optimization than in most libraries: we can stop and continue the optimization with callbacks, and get visualizations before the optimization is complete. This is nice to have in an interactive widget environment. Having the library written in Python also makes it easier to experiment with and extend, and since the slow parts are written in Cython, the difference in speed compared to the C++ implementation is negligible.
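To make the "stop and continue with callbacks" idea concrete, here is a minimal sketch of the pattern on a toy gradient descent. It is not this library's API: `optimize`, `record`, `stop_at_30` and the `callbacks_every_iters` parameter are hypothetical names used only for illustration, and the objective is a simple quadratic rather than the t-SNE loss.

```python
# Hypothetical sketch of callback-driven optimization; these names are
# illustrative assumptions and not the library's actual interface.

def optimize(position, gradient, n_iter, learning_rate=0.1,
             callbacks=(), callbacks_every_iters=10, start_iter=0):
    """Run plain gradient descent, reporting to callbacks periodically.

    Returns the current position and the number of iterations completed
    so far, so that a later call can pick up where this one stopped.
    """
    position = list(position)
    for i in range(start_iter, start_iter + n_iter):
        grad = gradient(position)
        position = [p - learning_rate * g for p, g in zip(position, grad)]
        done = i + 1
        if callbacks and done % callbacks_every_iters == 0:
            # Call every callback; stop early if any of them returns True.
            results = [cb(done, position) for cb in callbacks]
            if any(results):
                return position, done
    return position, start_iter + n_iter


def quadratic_gradient(position):
    # Gradient of f(x) = sum(x_i ** 2), which has its minimum at the origin.
    return [2 * p for p in position]


snapshots = []

def record(iteration, position):
    # Keep an intermediate result, e.g. to redraw a plot in a widget.
    snapshots.append((iteration, list(position)))
    return False  # never asks to stop

def stop_at_30(iteration, position):
    # Stand-in for a user pressing "pause" in an interactive environment.
    return iteration >= 30


# First run: interrupted by the callback before all 100 iterations finish.
position, paused_at = optimize([4.0, -2.0], quadratic_gradient, n_iter=100,
                               callbacks=[record, stop_at_30])

# Later: continue from the paused state for another 20 iterations.
position, finished_at = optimize(position, quadratic_gradient, n_iter=20,
                                 callbacks=[record], start_iter=paused_at)
```

The important design choice is that the optimizer hands its state back to the caller instead of hiding it, so an interactive tool can show intermediate results and resume whenever it likes.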

There are also a couple other minor differences:

I've been meaning to put citations into the readme, so everything is properly attributed, but I've been swamped and haven't had the time to do it.

TL;DR: I found the FIt-SNE paper and implementation and couldn't really make sense of them, so I worked through the theory and wrote my own implementation, where the code is far more readable IMO. I added various bits and bobs that I found while reading up on t-SNE. The slow parts are written in Cython, so they're multithreaded and about as fast as C++. The other parts are written in Python, so it's really easy to play around and experiment with t-SNE.