mwang87 / SMART_NMR

Repository to help develop SMART
https://smart.ucsd.edu/
MIT License

Optimizing hyperparameters for t-SNE and/or UMAP #49

Open RaphaelR87 opened 4 years ago

RaphaelR87 commented 4 years ago

Pasted from our Slack discussion (German timestamps and labels translated):

https://github.com/berenslab/rna-seq-tsne | berenslab/rna-seq-tsne | Oct 24, 2018 | Added by GitHub

**Raphael (18:28):** Very insightful, thanks. Since we can't do much tuning in the Google TensorFlow Embedding Projector, we might want to switch to UMAP. From a quick inspection with a few different compound classes, UMAP seems to cluster the SMART data about as well as t-SNE, and we cannot set t-SNE to the parameters suggested in the paper, such as perplexity = n/100 (100 is the maximum value in the Projector), or the recommended learning rate. UMAP also runs quite fast on my laptop (about 30 s with 30 neighbors), although the point count is reduced to 5000.

**Wout (19:55):** Yes, this is a useful paper; I had already read the preprint. It confirms the point I made at the last meeting: you shouldn't just blindly accept some t-SNE embedding, because the hyperparameters definitely make a difference and the defaults aren't always optimal. UMAP is typically considered an improved version of t-SNE, but you can see in the paper that it suffers from some of the same downfalls, and with optimized hyperparameters t-SNE can beat UMAP. If you want super-fast UMAP and t-SNE, there is a GPU-powered version you can use: https://rapids.ai/ and https://medium.com/rapids-ai/tsne-with-gpus-hours-to-seconds-9d9c17c941db. I can say from experience that it makes a massive difference (if you have an NVIDIA GPU, of course). (edited)

**Raphael (09:21):** Thanks for the clarification, Wout. The problem is that the majority of our users (probably well over 90% for at least the next year or two) do not have GPUs. We have three NVIDIA GPUs in one machine in Chen's office. Do you think we should optimize the t-SNE and UMAP hyperparameters on the current dataset, i.e. tune the local and global distributions, and then serve that precomputed output to every user? That way every user gets the same output, with a focus on the closest neighbors of their query spectra. I think that makes a lot of sense for comparability.

**Wout (10:19):** Yes. Local structure is the most important thing for this application in any case. You don't even really need t-SNE for that, but it's a decent gimmick.
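For reference, a minimal sketch (not part of the SMART codebase) of what running the two embeddings outside the Projector could look like, using the dataset-size-dependent settings discussed above (perplexity ≈ n/100 and a larger learning rate for t-SNE, 30 neighbors for UMAP). The file path and the `features` matrix are hypothetical placeholders:

```python
# Sketch: embed SMART feature vectors with tuned t-SNE and UMAP hyperparameters.
# Assumes `smart_features.npy` holds an (n_samples, n_features) matrix (hypothetical path).
import numpy as np
from sklearn.manifold import TSNE
import umap

features = np.load("smart_features.npy")
n = features.shape[0]

# t-SNE with hyperparameters scaled to dataset size instead of the defaults
# (the Embedding Projector UI caps perplexity at 100, scikit-learn does not).
tsne_embedding = TSNE(
    n_components=2,
    perplexity=max(30, n / 100),
    learning_rate=max(200, n / 12),
    init="pca",
    random_state=42,
).fit_transform(features)

# UMAP with the neighborhood size used in the quick local test (~30 s on a laptop).
umap_embedding = umap.UMAP(
    n_neighbors=30,
    min_dist=0.1,
    random_state=42,
).fit_transform(features)
```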

beowulf-zc commented 4 years ago

We should discuss this. We have two additional GPUs that Gary donated to us, which could be used to optimize the t-SNE and UMAP hyperparameters. But we should talk with Ming about how to implement it.
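If we go the RAPIDS route mentioned above, a rough sketch of a GPU-side hyperparameter sweep might look like the following. This assumes cuML is installed on the GPU machine and reuses the hypothetical `smart_features.npy` matrix from the earlier sketch; the parameter grids are illustrative, not recommendations:

```python
# Sketch: sweep t-SNE / UMAP hyperparameters on a GPU with RAPIDS cuML,
# then freeze one embedding that is served identically to every user.
import numpy as np
from cuml.manifold import TSNE, UMAP

features = np.load("smart_features.npy")  # hypothetical path

# Candidate settings to compare (illustrative values only).
for perplexity in (30, 100, 300):
    tsne_emb = TSNE(n_components=2, perplexity=perplexity).fit_transform(features)
    # ... evaluate how well local neighborhoods are preserved for each setting ...

for n_neighbors in (15, 30, 100):
    umap_emb = UMAP(n_neighbors=n_neighbors).fit_transform(features)
    # ... pick the setting that best preserves the nearest neighbors of query spectra ...
```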