juliasilge / juliasilge.com

My blog, built with blogdown and Hugo
https://juliasilge.com/

Dimensionality reduction of #TidyTuesday United Nations voting patterns | Julia Silge #14

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

Dimensionality reduction of #TidyTuesday United Nations voting patterns | Julia Silge

Explore country-level UN voting with a tidymodels approach to unsupervised machine learning.

https://juliasilge.com/blog/un-voting/

michaelgaunt404 commented 3 years ago

Quick question: how do you reconcile applying PCA to discrete data, as in this case (or to word counts, one-hot encodings, etc.)? I thought that if your variables don't belong on a coordinate plane, you shouldn't apply PCA to them.

I've been struggling with this concept, especially when doing unsupervised learning on text data and sparse data. Are there any plans to bring other unsupervised methods into recipes, à la ClusterR, cluster, or FactoMineR? If not, are there any methods that you like to use yourself when doing EDA/unsupervised learning?

juliasilge commented 3 years ago

Oh, you definitely have a point about whether this is the best data for something like PCA; take a look at the plot showing which roll call votes contribute to the principal components and notice how the values are all sort of the same. I don't know that you can never take something like indicator variables and do dimensionality reduction, though.
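To make the indicator-variable point concrete, here is a minimal sketch that is not from the blog post (which uses R and tidymodels). It assumes a made-up 0/1 vote matrix (`votes`) and uses hypothetical helper names (`center`, `covariance`, `first_component`), in plain Python with power iteration standing in for a full PCA routine. It only shows that PCA is mechanically applicable to indicator columns, not that it is the best choice for them.

```python
import math

# Hypothetical toy data (not the UN votes): rows are voters,
# columns are roll calls coded 1 = yes, 0 = no.
votes = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 1, 1],
]

def center(rows):
    """Subtract each column's mean so every column has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered columns."""
    n = len(centered)
    cols = list(zip(*centered))
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
        for ci in cols
    ]

def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    p = len(cov)
    v = [1.0] + [0.0] * (p - 1)  # arbitrary start, assumed not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

centered = center(votes)
loadings = first_component(covariance(centered))

# Project each voter onto the first principal component.
scores = [sum(x * l for x, l in zip(row, loadings)) for row in centered]

for row, score in zip(votes, scores):
    print(row, round(score, 3))
```

In this toy example the first three voters and the last three land on opposite sides of zero along the first component, so the two voting blocs separate even though every input is an indicator. Whether the resulting loadings are informative on real data is a separate question.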

There are quite a number of unsupervised methods available in recipes and the recipes-adjacent packages like embed, including ones specifically for categorical data.

When it comes to clustering specifically, we are gathering thoughts and community feedback in this planning repo PR.

JamesHWade commented 3 years ago

I learn something every time. Thank you for this fantastic content!

Ji-square commented 3 years ago

Thanks! Please don't stop doing what you are doing! I love it!!