How to handle categorical variables in Parametric UMAP?

lmcinnes / umap

Uniform Manifold Approximation and Projection

BSD 3-Clause "New" or "Revised" License

I would get @timsainb to weigh in on the more detailed aspects. ParametricUMAP doesn't support combining models in the same way, so you will simply need to come up with a reasonable distance metric across the combined data. I would suggest that part of the answer lies in the fact that you can design and use whatever neural network architecture you want within ParametricUMAP. For example, I know Tim used convolutional networks specifically for the image datasets, and RNNs for some of the other sequence-type datasets. That means you can pick whatever style of network works best for the mixed data, and that will help with the optimization phase.

As to what metric to use for at least handling the distance computation part: perhaps some variant of Gower distance would work well enough?
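To make the Gower suggestion concrete, here is a minimal pure-Python sketch of the basic (unweighted) Gower distance for mixed rows. It is only an illustration under stated assumptions: the function name `gower_distance_matrix`, the `categorical` argument, and the toy `rows` data are hypothetical and not part of umap. Numeric features contribute a range-normalized absolute difference, categorical features contribute 0 on a match and 1 otherwise, and the per-feature terms are averaged.

```python
from typing import List, Sequence, Set, Union

Value = Union[float, str]


def gower_distance_matrix(
    rows: Sequence[Sequence[Value]], categorical: Set[int]
) -> List[List[float]]:
    """Pairwise unweighted Gower distances (hypothetical helper, not a umap API).

    `categorical` holds the column indices to compare by simple matching;
    every other column is treated as numeric.
    """
    n = len(rows)
    n_features = len(rows[0])

    # Range of each numeric column, used to scale differences into [0, 1].
    ranges = {}
    for k in range(n_features):
        if k not in categorical:
            column = [float(r[k]) for r in rows]
            ranges[k] = max(column) - min(column)

    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for k in range(n_features):
                if k in categorical:
                    # Simple matching: 0 if equal, 1 otherwise.
                    total += 0.0 if rows[i][k] == rows[j][k] else 1.0
                elif ranges[k] > 0:
                    # Constant numeric columns (range 0) contribute nothing.
                    total += abs(float(rows[i][k]) - float(rows[j][k])) / ranges[k]
            dist[i][j] = dist[j][i] = total / n_features
    return dist


# Toy data (assumption): one numeric column and one categorical column.
rows = [(1.0, "red"), (3.0, "red"), (5.0, "blue")]
D = gower_distance_matrix(rows, categorical={1})
```

This is quadratic in the number of rows, so it only suits small datasets as written. Whether and how the resulting matrix can be handed to ParametricUMAP as precomputed distances depends on the installed umap-learn version, so that step is left out here.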

How to handle categorical variables in Parametric UMAP? #873