lmcinnes / umap

Uniform Manifold Approximation and Projection
BSD 3-Clause "New" or "Revised" License

Weights when combining multiple UMAP models #601

Open candalfigomoro opened 3 years ago

candalfigomoro commented 3 years ago

In this example https://github.com/lmcinnes/umap/issues/58#issuecomment-419682509 there was a mix_weight parameter to set a specific intersection weight.

When using the new * operator (https://umap-learn.readthedocs.io/en/latest/composing_models.html), is there a way to set a specific weight?

Could something like mapper1 * mapper2 * mapper1 give a higher weight to mapper1 (since it is intersected twice)? What is the proper way to do this?

Thank you :)

lmcinnes commented 3 years ago

Currently there is no proper way to do it -- the interface provides a quick and easy approach, but doesn't support a mix weight (the weighting is balanced between the two). It is tricky to have an API that would be both simple to use, and yet have enough flexibility. The right answer might be to add a separate compose method that can take a bunch of parameters such as the compose operator, mix weights, etc. Perhaps in an upcoming patch release (if the implementation turns out to be not too hard); perhaps in 0.6 (if things get messy). Worst case you can fall back to the approach outlined in the cited issue -- it should still work.
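To make the idea of a mix weight concrete, here is a minimal pure-Python sketch of what a weighted intersection of two fuzzy graphs could look like. Everything in it is an assumption for illustration: the names `mix_strength` and `intersect` are hypothetical, the graphs are plain dicts of edge strengths, and the weighting rule is a toy one chosen so that 0.5 is balanced, 0 keeps only the left graph's strengths, and 1 keeps only the right graph's. The library's actual computation may differ (for example in how it treats edges missing from one graph and in resetting local connectivity afterwards).

```python
import math


def mix_strength(left, right, weight=0.5):
    """Combine two membership strengths in (0, 1] using a mix weight in [0, 1].

    Toy rule (an assumption, not confirmed as UMAP's own): the less-favoured
    side is raised to a power below 1, which pushes it toward 1 and so
    reduces its influence on the product.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if weight < 0.5:
        return left * right ** (weight / (1.0 - weight))
    return left ** ((1.0 - weight) / weight) * right


def intersect(graph1, graph2, weight=0.5):
    """Weighted intersection of two toy graphs given as {edge: strength} dicts.

    Only edges present in both graphs are kept (a simplification).
    """
    shared = graph1.keys() & graph2.keys()
    return {e: mix_strength(graph1[e], graph2[e], weight) for e in sorted(shared)}


# Hypothetical edge strengths from two models fitted on different feature sets.
g_numeric = {(0, 1): 0.8, (0, 2): 0.4, (1, 2): 0.9}
g_onehot = {(0, 1): 0.5, (1, 2): 0.25, (2, 3): 0.7}

balanced = intersect(g_numeric, g_onehot)            # plain product of strengths
favour_left = intersect(g_numeric, g_onehot, 0.1)    # close to g_numeric's values
favour_right = intersect(g_numeric, g_onehot, 0.9)   # close to g_onehot's values
```

Under this toy rule the balanced case reduces to multiplying the two strengths, which is why a single extra parameter is enough to shift influence toward one model or the other.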

RasGre commented 1 year ago

I have a similar question. I am working with a dataset of about 82k observations and 140 features, of which only a few are numerical; the rest are one-hot encoded variables.

I saw that umap.umap_.general_simplicial_set_intersection takes a weight parameter (see https://antonsruberts.github.io/kproto-audience/). Is it therefore preferable to use "the old approach" for intersections rather than the * operator?