Open JohnTigue opened 4 years ago
This sounds like it could scale rather well: Impact of cancer mutational signatures on transcription factor motifs in the human genome
We first used the UMAP dimensionality reduction method on the table of exposure values of the 2708 samples, and then defined clusters using the hdbscan method, as implemented in the largeVis R-package.
How to filter outliers to train a model on cleaner data?
How about tuning UMAP to accentuate outliers, then clustering with HDBSCAN, then taking only the clustered nodes and running them through UMAP again?
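A rough sketch of that two-pass idea, in case it helps the discussion. It assumes the outlier filter is simply "drop whatever HDBSCAN labels as noise (-1)". The names `filter_then_embed` and `umap_hdbscan_steps` are made up for illustration, and the UMAP/HDBSCAN parameter values are untuned placeholders, not settings from the paper or from this repo. The embedding and clustering steps are passed in as callables so the driver can be tried without either library installed.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Embed = Callable[[Sequence[Row]], Sequence[Row]]
Cluster = Callable[[Sequence[Row]], Sequence[int]]

NOISE = -1  # HDBSCAN's label for points it assigns to no cluster


def filter_then_embed(
    rows: Sequence[Row],
    embed_for_outliers: Embed,
    cluster: Cluster,
    embed_final: Embed,
) -> Tuple[List[int], List[int], Sequence[Row]]:
    """Embed, cluster, drop noise points, then re-embed only the kept rows."""
    first_pass = embed_for_outliers(rows)   # pass 1: embedding meant to expose outliers
    labels = list(cluster(first_pass))      # one cluster label per row, -1 means noise
    kept = [i for i, label in enumerate(labels) if label != NOISE]
    clean_rows = [rows[i] for i in kept]    # go back to the original features
    final = embed_final(clean_rows)         # pass 2: embedding of the cleaner data
    return kept, [labels[i] for i in kept], final


def umap_hdbscan_steps() -> Tuple[Embed, Cluster, Embed]:
    """Hypothetical wiring; needs the umap-learn and hdbscan packages."""
    import hdbscan
    import umap

    # Placeholder parameters (assumptions): tune these for the real data.
    first = lambda X: umap.UMAP(n_neighbors=30, min_dist=0.0).fit_transform(X)
    clusterer = lambda X: hdbscan.HDBSCAN(min_cluster_size=10).fit_predict(X)
    final = lambda X: umap.UMAP().fit_transform(X)
    return first, clusterer, final
```

Usage would be something like `kept, labels, embedding = filter_then_embed(X, *umap_hdbscan_steps())`, where `kept` holds the row indices that survived the noise filter and could select the cleaner training set.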
See also #28 (comparing clusters to Allen cell types)