Open ulupo opened 4 years ago
You can access the graph representation right now as the `graph_` attribute of a fitted model, so that is something. The catch is that it doesn't fit into pipelines, as I presume you would like. Some recent changes in 0.5dev have split `fit` into separate graph construction and embedding construction steps (not least to aid some work on an implementation of Parametric UMAP, which uses neural networks to learn an embedding function). Given that, it seems like this would be quite feasible. I'll see what I can do.
Thanks @lmcinnes! Interesting to know about the work on Parametric UMAP... is there a reference already?
Yes, indeed, the ability to place the graph-only version into a pipeline is where I was coming from. Great to see it might fit into the development roadmap.
All the work on Parametric UMAP is by Tim Sainburg. He has a paper in the works, so there will hopefully be something soon. In the meantime you can check #489 for the PR in progress.
I was wondering if the following proposal is worth a discussion. I am fully sold on `UMAP` being conceived as a dimensionality reduction algorithm, whose effectiveness is in large part a function of the quality of its embeddings for downstream tasks. However, I also think that the abstract sparse graphs ("fuzzy simplicial sets") UMAP computes as part of its `fit` routine, prior to the embedding step, have value in their own right. One could compute all sorts of invariants from these graphs directly (you probably already suspect which sort of invariants I would like to access in `giotto-tda`).

In brief, I'm wondering whether there could be scope for extending the `UMAP` API as follows: a new init parameter `mode` could be added to the `UMAP` constructor. The default value could be `'embedding'`, leading to the current behaviour. There could then be another value, say `'graph'` or `'fuzzy_simplicial_set'`, and a `UMAP` instance instantiated with this `mode` would return a sparse graph instead of an embedding in `fit_transform` (and skip the embedding step, of course).
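
To make the request concrete, here is a rough sketch of how the `'graph'` behaviour could be approximated today from outside the library. It relies only on the `graph_` attribute mentioned in this thread; the `GraphOnly` class and its `attribute` parameter are hypothetical names made up for illustration, not part of UMAP. Unlike the proposed `mode`, this wrapper does not skip the embedding step: it runs the full `fit` and then throws the embedding away.

```python
class GraphOnly:
    """Hypothetical wrapper (not part of umap-learn).

    Fits the wrapped estimator and returns the graph it stores as an
    attribute, instead of the embedding.
    """

    def __init__(self, estimator, attribute="graph_"):
        # `estimator` is assumed to expose fit(X, y) and to set
        # `attribute` on itself once fitted, as UMAP does with `graph_`.
        self.estimator = estimator
        self.attribute = attribute

    def fit(self, X, y=None):
        # Note: this still runs the embedding step, which the proposed
        # `mode='graph'` would skip.
        self.estimator.fit(X, y)
        return self

    def fit_transform(self, X, y=None):
        # Return the graph built on the training data rather than an embedding.
        self.fit(X, y)
        return getattr(self.estimator, self.attribute)
```

With umap-learn installed, `GraphOnly(umap.UMAP()).fit_transform(X)` should hand back whatever the fitted model stores in `graph_`. There is deliberately no `transform` method, since the graph is defined on the training points only; whether that is acceptable for a pipeline step depends on how the downstream steps are used.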