Open cabreraalex opened 1 year ago
@xnought any thoughts on this? Any downside? One I can think of is you have to store the projection coordinates, using up disk space, but should be minimal?
Depending on the data format, yeah, disk space would not be too bad.
Sidenote: it could be better to use parquet when caching columns for that extra compression.
I do like your idea. I think I'll give that a shot next.
There is also something else to think about: should users be able to mess with t-SNE parameters (like perplexity)?
Should the user be able to recompute t-SNE? Given how much the results change with the t-SNE parameters, maybe?
Also, if their dataset is too large and t-SNE ends up taking an eternity, what then?
That would favor our current method, where they can just load one t-SNE projection instead of preloading all of them.
We could add an option to the TOML for the t-SNE parameters?
For your last point, if it's too large the current method would be worse because if you leave the screen it would stop processing and lose your progress.
If a user provides embeddings, we should compute the projections as a preprocessing step and cache the result. Will make interaction from then on much, much faster. Can create an option to not compute projections as well if we want.
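A minimal sketch of the compute-once-and-cache idea, assuming the projection is passed in as a plain callable. `cached_projection`, `cache_key`, and the JSON file layout are made up for this example, and JSON is used only to keep it dependency-free (parquet, as suggested above, would be a drop-in alternative for the storage step). The cache key covers both the embeddings and the parameters, so changing perplexity triggers a recompute instead of returning a stale result.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Sequence

Embeddings = Sequence[Sequence[float]]
Points = list  # list of [x, y] pairs


def cache_key(embeddings: Embeddings, params: dict) -> str:
    """Hash embeddings and parameters together to name the cache file."""
    payload = json.dumps(
        {"embeddings": [list(e) for e in embeddings], "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def cached_projection(
    embeddings: Embeddings,
    params: dict,
    project: Callable[[Embeddings, dict], Points],
    cache_dir: str,
) -> Points:
    """Return cached 2D coordinates if present, otherwise compute and store them."""
    directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"projection_{cache_key(embeddings, params)}.json"
    if path.exists():
        # Cache hit: skip the expensive projection entirely.
        return json.loads(path.read_text())
    points = [list(p) for p in project(embeddings, params)]
    path.write_text(json.dumps(points))
    return points
```

In practice `project` would wrap whatever t-SNE implementation is in use. Hashing the full embedding matrix is fine for a sketch, but for large datasets a cheaper key (for example the file path plus modification time) is probably the better trade-off.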