t-Distributed Stochastic Neighbor Embedding

HeuristicLab-Trac-Bot commented 7 years ago

Issue migrated from trac ticket # 2700

milestone: HeuristicLab 3.3.15 | component: Algorithms.DataAnalysis | priority: medium | resolution: done

2016-11-11 13:14:41: @BernhardWerth created the issue

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a technique for dimensionality reduction that is particularly well suited for the visualization of high-dimensional datasets and should be avialable in HeuristicLab

HeuristicLab-Trac-Bot commented 7 years ago

2016-11-11 13:14:55: @BernhardWerth changed status from new to assigned

HeuristicLab-Trac-Bot commented 7 years ago

2016-11-11 13:14:55: @BernhardWerth set owner to @BernhardWerth

HeuristicLab-Trac-Bot commented 7 years ago

2016-11-11 13:16:00: @BernhardWerth changed type from defect to feature request

HeuristicLab-Trac-Bot commented 7 years ago

2016-11-11 13:16:59: @BernhardWerth changed status from assigned to accepted

HeuristicLab-Trac-Bot commented 7 years ago

2016-11-11 14:33:53: @BernhardWerth commented

r14387 created initial branch

HeuristicLab-Trac-Bot commented 7 years ago

2016-11-25 11:00:32: @BernhardWerth commented

r14413 refactored c++ style code to C# (use of [,] arrays, int vs uint,..) + corrected IterationsCounter

HeuristicLab-Trac-Bot commented 7 years ago

2016-11-25 11:00:32: @BernhardWerth

HeuristicLab-Trac-Bot commented 7 years ago

2016-11-25 11:01:01: @BernhardWerth commented

r14414 forgot to add files

HeuristicLab-Trac-Bot commented 7 years ago

2016-11-25 11:01:22: @BernhardWerth changed status from accepted to reviewing

HeuristicLab-Trac-Bot commented 7 years ago

2016-11-25 11:01:22: @BernhardWerth changed owner from @BernhardWerth to @gkronber

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-18 13:56:23: @gkronber commented

Comments from an initial review:

~~It should be possible to stop the algorithm at any time~~

~~A quality line chart for the error would probably be interesting~~

~~It would be nice to be able to view the projection after each iteration~~

~~The descriptions for parameters should contain information on default settings or useful settings for the parameters.~~

~~Is it necessary to tune all the parameters for learning? Or would it also be ok to just use some robust default settings and hide most of the parameters (except for perplexity)~~

I think it is not strictly necessary that TSNE derives from Item (since it is probably never used directly in the GUI)

~~Error message: "Perplexity should be lower than K" what's K?~~

Let's discuss this in person...

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-18 13:56:23: @gkronber

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-18 13:56:23: @gkronber

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-18 13:56:23: @gkronber

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-18 13:56:23: @gkronber

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-18 17:19:02: @gkronber changed status from reviewing to assigned

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-18 17:19:02: @gkronber changed owner from @gkronber to @BernhardWerth

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-19 07:51:09: @gkronber commented

r14503: minor change while reviewing

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-20 15:50:25: @BernhardWerth commented

r14512 worked in several comments from mkommend, extended analysis during algorithm run, added more Distances, made algorithm stoppable

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-21 12:06:56: @gkronber commented

More observations:

TSNE should be a BasicAlgorithm

~~Exception when switching between views (projected data & quality line chart) while the algorithm is running~~

~~r14512 added references to files for a kernel PCA in the project file (please remove).~~

~~Why does the error change abruptly when the 'stop-lying-iteration' is reached? (--> OK)~~

~~Hide parameters: *Momentum, Eta, MomentumSwitch, StopLying. Set StopLying to zero per default.~~

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-21 12:06:56: @gkronber

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-21 12:06:56: @gkronber

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-21 12:06:56: @gkronber

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-21 12:06:56: @gkronber

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-21 12:06:56: @gkronber

HeuristicLab-Trac-Bot commented 7 years ago

2016-12-22 10:09:39: @BernhardWerth commented

r14518 TSNEAnalysis is now a BasicAlg, hid Parameters, added optional data normalization to make TSNE scaling-invariant

HeuristicLab-Trac-Bot commented 7 years ago

2017-01-11 07:38:52: @BernhardWerth commented

r14558 made TSNE compatible with the new pausible BasicAlgs, removed rescaling of scatterplots during alg to give it a more movie-esque feel

HeuristicLab-Trac-Bot commented 7 years ago

2017-02-17 13:02:59: @abeham commented

r14682: fixed references for alglib and libsvm

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-10 08:38:11: @BernhardWerth changed status from assigned to reviewing

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-10 08:38:11: @BernhardWerth changed owner from @BernhardWerth to @gkronber

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-10 08:38:11: @BernhardWerth commented

r14742 fixed displaying of randomly generated seed and some minor code simplifications

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-19 18:19:06: @gkronber commented

The 'performance-improved' distance methods which also accept a threshold seem to be implemented incorrectly. However, they are not used by tSNE anyway so I'm removing them.
 sum = 0;
 ...
 while(sum > threshold ...) {
   sum += ...
 }
 return sum;

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-19 18:37:19: @gkronber commented

VPTree contains a TODO item //TODO check if minheap or maxheap should be used here

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-19 18:47:08: @gkronber commented

r14767: made some changes while reviewing the code

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-27 11:16:47: @gkronber commented

r14782: renamed files

r14783: renamed Euclidean distance

ABE: fixed revision numbers

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-27 11:16:47: @abeham

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-27 12:13:00: @abeham commented

What would be the call if I don't have any features, but only a matrix of distances? Is there some kind of DistanceFunction that is just a lookup in a matrix?

Also as we (bwerth + abeham) discussed: The initial parameters are probably not that well suited for large number of cases. A benchmark set of several different data should be generated and different parameters applied to identify one set that works best on average. Above all, I would strongly recommend theta to be default to 0, because this seems to be only suitable for large datasets. Maybe we can also auto-set this parameter after the problem dimension is known.

Finally, I would recommend to have both TSNE in form of an easy-to-use API call and as a BasicAlgorithm.

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-27 12:22:37: @gkronber commented

Replying to [comment:21 abeham]:

What would be the call if I don't have any features, but only a matrix of distances? Is there some kind of DistanceFunction that is just a lookup in a matrix?

Good point. I propose that we provide a different version of the algorithm for this case (because we don't have a dataset as for all other data-analysis algs).

Also as we (bwerth + abeham) discussed: The initial parameters are probably not that well suited for large number of cases. A benchmark set of several different data should be generated and different parameters applied to identify one set that works best on average. Above all, I would strongly recommend theta to be default to 0, because this seems to be only suitable for large datasets. Maybe we can also auto-set this parameter after the problem dimension is known.

I also already noticed that theta=0 produces very different results. We should leave this option for the case of N>5000 but use theta=0 as default. Additionally, we should look at the differences between theta=0 and theta=eps, maybe there is another issue hidden there.

Finally, I would recommend to have both TSNE in form of an easy-to-use API call and as a BasicAlgorithm.

Full ack. I plan to refactor the code to provide a simple static method.

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-27 12:35:15: @gkronber commented

r14784: fixed build fail

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-27 12:50:42: @abeham commented

For comparison purposes, there's also a JavaScript implementation of TSNE (and probably some more).

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-27 13:36:13: @gkronber commented

TODO:

~~Add parameter to update results only X iterations~~ (done)

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-27 13:36:13: @gkronber

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-27 15:15:33: @gkronber commented

r14785: changes and while reviewing

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-27 15:29:13: @gkronber commented

r14787: more changes while reviewing

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-27 17:27:14: @gkronber commented

r14788: refactoring

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-30 18:09:03: @gkronber commented

r14806: worked on tSNE, storable and cloning for tSNE state. Added some TODO comments while reviewing.

HeuristicLab-Trac-Bot commented 7 years ago

2017-03-30 19:07:02: @gkronber commented

r14807: support clone and persistence and pause/resume

HeuristicLab-Trac-Bot commented 7 years ago

2017-04-10 15:47:41: @gkronber commented

Review comments:

~~in fast_tsne.m an initial dimensionality reduction using PCA is performed before running bh_tsne~~ features is not absolutely necessary

~~normalization of data should be moved to TSNEStatic (ZeroMean and diving by max)~~ this is not possible as the tSNE implementation is not specific to real vectors. Therefore, scaling was left in the main algorithm

HeuristicLab-Trac-Bot commented 7 years ago

2017-04-10 15:47:41: @gkronber

HeuristicLab-Trac-Bot commented 7 years ago

heal-research / HeuristicLab