RaulRomaniF opened this issue 6 years ago
This can be done in theory; in practice, I am still working on the code to do this, so it isn't available in the repository yet. This may not be the answer you are looking for. As an interim step, you can check issue #58, which provides a simple recipe for doing this in straightforward cases.
On Sun, Aug 5, 2018 at 1:08 AM romanics notifications@github.com wrote:
I want to project the Titanic dataset https://www.kaggle.com/c/titanic/data, which contains both categorical and numerical data.
In this video https://www.youtube.com/watch?v=YPJQydzTLwQ (at 48:08), the speaker basically says that UMAP can combine multiple data types, so the question is: how?
One approach would be to project only the numerical data, then only the categorical data, and finally combine them in the same space. But is that the right approach?
Thank you for your time.
— View it on GitHub: https://github.com/lmcinnes/umap/issues/104
Data with both categorical and numerical types can be handled using the Gower distance metric. You can download code for the Gower distance metric from here. It might be available in an upcoming scikit-learn release.
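For reference, a minimal NumPy sketch of Gower distance, assuming equal feature weights, numeric features scaled by their observed range, and no missing values. This is not the downloadable code mentioned above; the function name `gower_distance_matrix` and the toy Titanic-style rows are hypothetical, for illustration only.

```python
import numpy as np


def gower_distance_matrix(X_num, X_cat):
    """Pairwise Gower distances between rows with numeric and categorical columns.

    Assumptions of this sketch: every feature gets equal weight, numeric
    features are scaled by their observed range, and there are no missing values.
    """
    X_num = np.asarray(X_num, dtype=float)
    X_cat = np.asarray(X_cat)
    n_features = X_num.shape[1] + X_cat.shape[1]

    # Range of each numeric column; constant columns get range 1 to avoid 0/0.
    ranges = X_num.max(axis=0) - X_num.min(axis=0)
    ranges[ranges == 0] = 1.0

    # Numeric part: |x_ik - x_jk| / range_k, summed over numeric features k.
    num_part = (np.abs(X_num[:, None, :] - X_num[None, :, :]) / ranges).sum(axis=2)

    # Categorical part: 1 for each feature where the categories differ, else 0.
    cat_part = (X_cat[:, None, :] != X_cat[None, :, :]).sum(axis=2)

    # Gower distance is the average per-feature dissimilarity, so it lies in [0, 1].
    return (num_part + cat_part) / n_features


# Hypothetical toy rows using Titanic-style columns (values are illustrative only).
age_fare = [[22.0, 7.25], [38.0, 71.28], [26.0, 7.92]]            # Age, Fare
sex_embarked = [["male", "S"], ["female", "C"], ["female", "S"]]  # Sex, Embarked

D = gower_distance_matrix(age_fare, sex_embarked)
print(np.round(D, 3))
```

The broadcasting builds an n × n × p intermediate array, so for large datasets a chunked or library implementation would be preferable. The resulting square matrix should be usable with UMAP's precomputed-distance mode, e.g. `umap.UMAP(metric="precomputed").fit_transform(D)`.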
While Gower distance is quite useful, it is also somewhat heuristic. I would recommend exploring it as one of several options for handling mixed continuous and categorical data.
Is it possible to create indicator variables from the categorical variables?
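One common way to do this is one-hot (indicator) encoding: each categorical column becomes one 0/1 column per category, which can then be stacked next to standardized numeric columns and used with an ordinary metric such as Euclidean. A minimal NumPy sketch follows; the helper names `one_hot` and `standardize` and the toy Titanic-style values are hypothetical.

```python
import numpy as np


def one_hot(column):
    """Encode a 1-D sequence of category labels as 0/1 indicator columns.

    Returns (indicators, categories), with one column per distinct label in
    sorted order.
    """
    values = np.asarray(column)
    categories, codes = np.unique(values, return_inverse=True)
    indicators = np.zeros((values.shape[0], categories.shape[0]))
    indicators[np.arange(values.shape[0]), codes] = 1.0
    return indicators, categories


def standardize(X):
    """Scale each numeric column to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns are centred but left unscaled
    return (X - X.mean(axis=0)) / std


# Hypothetical toy rows using Titanic-style columns (values are illustrative only).
age_fare = [[22.0, 7.25], [38.0, 71.28], [26.0, 7.92]]
sex = ["male", "female", "female"]
embarked = ["S", "C", "S"]

sex_ind, sex_cats = one_hot(sex)
emb_ind, emb_cats = one_hot(embarked)

# One purely numeric matrix that an ordinary metric such as Euclidean can handle.
X = np.hstack([standardize(age_fare), sex_ind, emb_ind])
print(sex_cats, emb_cats, X.shape)
```

How the indicator columns are scaled relative to the numeric ones changes how much each feature influences the distances, so that weighting is a judgment call. pandas `get_dummies` and scikit-learn's `OneHotEncoder` perform the same encoding.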