Closed: HeuristicLab-Trac-Bot closed this issue 7 years ago
r14413 refactored C++-style code to C# (use of [,] arrays, int vs. uint, ...) + corrected IterationsCounter
Comments from an initial review:
- It should be possible to stop the algorithm at any time
- A quality line chart for the error would probably be interesting
- It would be nice to be able to view the projection after each iteration
- The descriptions for parameters should contain information on default settings or useful settings for the parameters
- Is it necessary to tune all the parameters for learning? Or would it also be ok to just use some robust default settings and hide most of the parameters (except for perplexity)?
- I think it is not strictly necessary that TSNE derives from Item (since it is probably never used directly in the GUI)
- Error message "Perplexity should be lower than K": what's K?

Let's discuss this in person...
r14512 worked in several comments from mkommend, extended analysis during algorithm run, added more Distances, made algorithm stoppable
More observations:
- TSNE should be a BasicAlgorithm
- Exception when switching between views (projected data & quality line chart) while the algorithm is running
- r14512 added references to files for a kernel PCA in the project file (please remove)
- Why does the error change abruptly when the 'stop-lying-iteration' is reached? (--> OK)
- Hide parameters: Momentum, Eta, MomentumSwitch, StopLying. Set StopLying to zero per default.
r14518 TSNEAnalysis is now a BasicAlg, hid Parameters, added optional data normalization to make TSNE scaling-invariant
r14558 made TSNE compatible with the new pausable BasicAlgs, removed rescaling of scatter plots during the algorithm run to give it a more movie-esque feel
r14742 fixed displaying of randomly generated seed and some minor code simplifications
The 'performance-improved' distance methods which also accept a threshold seem to be implemented incorrectly. However, they are not used by tSNE anyway so I'm removing them.
sum = 0; ... while (sum > threshold ...) { sum += ...; } return sum;  // bug: sum starts at 0, so the loop condition is false on entry and nothing is ever accumulated
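A corrected early-abort variant would accumulate first and compare against the threshold afterwards. A minimal Python sketch (not the HeuristicLab C# code; the function and parameter names are illustrative):

```python
def sq_euclidean_with_threshold(a, b, threshold):
    """Squared Euclidean distance that aborts early once `threshold` is exceeded.

    The removed helpers initialized sum = 0 and looped `while (sum > threshold)`,
    so the body never ran; here the partial sum is accumulated first and the
    loop aborts once it exceeds the threshold.
    """
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
        if total > threshold:
            break  # distance already exceeds threshold; the caller only needs that fact
    return total
```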
VPTree contains a TODO item //TODO check if minheap or maxheap should be used here
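For the k-nearest-neighbor queries a VP-tree performs, a max-heap is the usual choice: the heap root holds the worst (largest) of the current k candidate distances, which doubles as the pruning radius tau during traversal. A hedged Python sketch of that bookkeeping (Python's heapq is a min-heap, so distances are negated; a linear scan stands in for the actual tree traversal):

```python
import heapq

def knn_maxheap(points, query, k):
    """Collect the k nearest points to `query` using a bounded max-heap.

    heapq is a min-heap, so distances are stored negated: the heap root is
    then the *largest* distance among the current k candidates, which is the
    value a VP-tree compares against when deciding whether a subtree can be
    pruned.
    """
    heap = []  # entries are (-distance, point)
    for p in points:
        d = sum((a - b) ** 2 for a, b in zip(p, query))
        if len(heap) < k:
            heapq.heappush(heap, (-d, p))
        elif d < -heap[0][0]:            # closer than the current worst candidate
            heapq.heapreplace(heap, (-d, p))
    return sorted((-nd, p) for nd, p in heap)  # (distance, point), nearest first
```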
What would be the call if I don't have any features, but only a matrix of distances? Is there some kind of DistanceFunction that is just a lookup in a matrix?
Also as we (bwerth + abeham) discussed: The initial parameters are probably not that well suited for large number of cases. A benchmark set of several different data should be generated and different parameters applied to identify one set that works best on average. Above all, I would strongly recommend theta to be default to 0, because this seems to be only suitable for large datasets. Maybe we can also auto-set this parameter after the problem dimension is known.
Finally, I would recommend to have both TSNE in form of an easy-to-use API call and as a BasicAlgorithm.
Replying to [comment:21 abeham]:
> What would be the call if I don't have any features, but only a matrix of distances? Is there some kind of DistanceFunction that is just a lookup in a matrix?
Good point. I propose that we provide a different version of the algorithm for this case (because we don't have a dataset as for all other data-analysis algs).
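One possible shape for that variant (a hypothetical Python sketch, not the eventual HeuristicLab API): a "distance function" that treats items as indices into a precomputed distance matrix, so no feature vectors are required.

```python
def matrix_distance(matrix):
    """Wrap a precomputed (symmetric) distance matrix as a distance function.

    Items are plain row/column indices, so the algorithm would operate on the
    indices 0..n-1 instead of feature vectors; matrix[i][j] is assumed to
    already hold the pairwise distance between items i and j.
    """
    def dist(i, j):
        return matrix[i][j]
    return dist
```

Usage would then be e.g. `dist = matrix_distance(precomputed); dist(0, 2)`.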
> Also as we (bwerth + abeham) discussed: The initial parameters are probably not that well suited for large number of cases. A benchmark set of several different data should be generated and different parameters applied to identify one set that works best on average. Above all, I would strongly recommend theta to be default to 0, because this seems to be only suitable for large datasets. Maybe we can also auto-set this parameter after the problem dimension is known.
I also already noticed that theta=0 produces very different results. We should leave this option for the case of N>5000 but use theta=0 as default. Additionally, we should look at the differences between theta=0 and theta=eps, maybe there is another issue hidden there.
> Finally, I would recommend to have both TSNE in form of an easy-to-use API call and as a BasicAlgorithm.
Full ack. I plan to refactor the code to provide a simple static method.
For comparison purposes, there's also a JavaScript implementation of TSNE (and probably some more).
TODO:
Add parameter to update results only every X iterations (done)
r14806: worked on tSNE, storable and cloning for tSNE state. Added some TODO comments while reviewing.
Review comments:
- in fast_tsne.m an initial dimensionality reduction using PCA is performed before running bh_tsne --> features are not absolutely necessary
- normalization of data should be moved to TSNEStatic (ZeroMean and dividing by max) --> this is not possible, as the tSNE implementation is not specific to real vectors; therefore, scaling was left in the main algorithm
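The scaling step discussed above (zero mean per column, then dividing by the maximum absolute value) could look like this as a plain Python sketch; the actual implementation lives in the C# algorithm, and the names here are illustrative:

```python
def normalize(data):
    """Per-column zero-mean shift followed by division by the global maximum
    absolute value, making the embedding invariant to the data's scale
    (the 'ZeroMean and dividing by max' step discussed above)."""
    n = len(data)
    cols = len(data[0])
    means = [sum(row[c] for row in data) / n for c in range(cols)]
    centered = [[row[c] - means[c] for c in range(cols)] for row in data]
    max_abs = max(abs(v) for row in centered for v in row) or 1.0  # guard against all-zero data
    return [[v / max_abs for v in row] for row in centered]
```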
Issue migrated from trac ticket # 2700
milestone: HeuristicLab 3.3.15 | component: Algorithms.DataAnalysis | priority: medium | resolution: done
2016-11-11 13:14:41: @BernhardWerth created the issue