juliasilge / juliasilge.com

My blog, built with blogdown and Hugo

https://juliasilge.com/

Dimensionality reduction for #TidyTuesday Billboard Top 100 songs | Julia Silge #48

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

Songs on the Billboard Top 100 have many audio features. We can use data preprocessing recipes to implement dimensionality reduction and understand how these features are related.

https://juliasilge.com/blog/billboard-100/

nguyenlovesrpy commented 3 years ago

Great tutorial! I wonder, though: could we interpret the PCA results further? As far as I can see, the post only describes how the predictors relate to one another within the PCA.

jlmelville commented 3 years ago

For anyone interested in UMAP's limitations as discussed in the linked Twitter thread, I think Dmitry Kobak's view on this matter, which runs contrary to Lior Pachter's, deserves a lot of attention.

(Big old disclaimer: I wrote the 'uwot' package that 'embed' uses for its UMAP implementation and lucked into being a co-author on the UMAP paper so I am not exactly an impartial observer on this one!)

juliasilge commented 3 years ago

That is great @jlmelville; thanks so much for sharing this, and also for your work on uwot. 🙌

laresbernardo commented 3 years ago

Great post as always, Julia. Thanks for sharing. I also wanted to share my approach to understanding correlations in exploratory analysis, which I find much easier to interpret. Using lares::corr_var with the same variables used in this post, you'd get something like this: https://i.ibb.co/ZHDqGRp/Screen-Shot-2021-09-20-at-18-57-54.png

Emily-Zh-bio commented 2 years ago

Hi Julia, thank you so much for your work, I'm learning so much!! I have a basic question. I am analyzing my data with PLS using this tutorial, but I can't figure out how to retrieve the information on how much variation is explained by the components. Is it possible using the package?

juliasilge commented 2 years ago

@Emily-Zh-bio We have this implemented for PCA via the tidy() method; you can check out the code here. We don't have that implemented for PLS, though. If you'd like, you could open an issue on the recipes repo requesting this as a new feature. I do believe the info is in there if you dig around, in something like prepped$steps[[your_step_number]]$res$sd, so it should be doable!
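For readers following along, here is a minimal base R sketch of the PCA case only, i.e. how the percent of variance explained falls out of the component standard deviations. It uses stats::prcomp() on the built-in mtcars data as a stand-in (an assumption, not the Billboard data from the post) and does not go through recipes. Whether the PLS step stores a comparable sd element is the suggestion above and is not verified here.

```r
# Illustrative only: mtcars stands in for the audio features in the post.
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# A component's variance is its standard deviation squared.
variance <- pca$sdev^2
percent_variance <- 100 * variance / sum(variance)
cumulative_percent <- cumsum(percent_variance)

# Collect the results in one table, one row per component.
variance_explained <- data.frame(
  component = paste0("PC", seq_along(variance)),
  percent_variance = round(percent_variance, 1),
  cumulative_percent = round(cumulative_percent, 1)
)

print(head(variance_explained, 3))
```

The same arithmetic (square the standard deviations, divide by their sum) should apply to any vector of per-component standard deviations, but for PLS the sum covers only the extracted components, so the percentages would be relative to those components rather than to the total variance in the predictors.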

tamcdevittbit commented 2 years ago

FYI: to get the PLS graphs to work correctly, I needed to install mixOmics. Otherwise, the chart came out looking like a correlation plot by individual features (not PLS1, PLS2...). Code used:

    if (!require("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("mixOmics")

Cheers!

LDSTATXPERT commented 2 years ago

Thanks once again for this clear and useful use case, Julia! Do you think UMAP could be applicable when dealing with categorical data? To summarize my issue: I'd like to use MCA rather than PCA, with the same objective of reducing the dimensionality.

juliasilge commented 2 years ago

@LDSTATXPERT I don't believe that UMAP handles categorical data natively, but you could try creating dummy/indicator variables and see how that goes.
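As a rough illustration of the dummy/indicator idea, the base R sketch below turns two categorical columns into 0/1 columns with model.matrix(). The songs data frame and its genre and mode columns are hypothetical, invented for this example. In a recipes workflow the analogous step would presumably be step_dummy(), and the resulting numeric columns could then be handed to a UMAP implementation; whether the embedding is meaningful for such data is a separate question.

```r
# Hypothetical categorical data, invented for illustration.
songs <- data.frame(
  genre = factor(c("pop", "rock", "pop", "country")),
  mode  = factor(c("major", "minor", "major", "major"))
)

# model.matrix() expands each factor into 0/1 indicator columns,
# using the first level of each factor as the reference level.
# The first column is the intercept, which we drop.
indicators <- model.matrix(~ genre + mode, data = songs)[, -1]

print(indicators)
```

Each factor loses its reference level here (country for genre, major for mode), so a row of all zeros means "reference level on every variable".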