juliasilge / juliasilge.com

My blog, built with blogdown and Hugo
https://juliasilge.com/

PCA and the #TidyTuesday best hip hop songs ever | Julia Silge #38

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

PCA and the #TidyTuesday best hip hop songs ever | Julia Silge

Lately I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first steps in modeling to how to tune more complex models. Today, I’m exploring a different part of the tidymodels framework; I’m showing how to implement principal component analysis via recipes with this week’s #TidyTuesday dataset on the best hip hop songs of all time as determined by a BBC poll of music critics.

https://juliasilge.com/blog/best-hip-hop/

gabbypaola commented 2 years ago

Thank you so much for sharing your work; I am learning a lot from you! Just wanted to note that the link to the screencast isn't working :(, but was able to find the link on YouTube: https://www.youtube.com/watch?v=OvgzIx5mDNM for anyone looking for it :D.

juliasilge commented 2 years ago

@gabbypaola Hmmm, that's strange. The video looks fine in the blog post to me. Can you tell me if there is anything unusual about your browser, OS, any firewalls/blocker, or similar? Do you see the embedded videos in the rest of my blog posts?

gabbypaola commented 2 years ago

Hi Julia, thanks for checking this out! I was able to get the video to work on this page. Looks like it was something on my browser settings.

sf210 commented 2 years ago

ranking_prep has 12 principal components, but juice(ranking_prep) only has 5. Does juice drop the less significant PCs based on a default threshold? Is there a way to change that threshold in juice() or bake()?

juliasilge commented 2 years ago

@sf210 I don't believe that ranking_prep has 12 principal components; it has 12 predictors that are used in PCA. I used the default num_comp = 5 in the PCA extraction. If you would like more, you should change that argument.
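As a minimal sketch of changing that argument: the example below uses the built-in mtcars data as a stand-in for the post's data (an assumption, as is the helper name count_pcs), and compares the default number of components with num_comp = 8.

```r
library(recipes)

# Hypothetical helper: prep a recipe, bake the training data,
# and count how many PC columns come back
count_pcs <- function(rec) {
  baked <- bake(prep(rec), new_data = NULL)
  sum(startsWith(names(baked), "PC"))
}

# mtcars stands in for the blog post's data; it has 10 predictors of mpg
base_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors())

# Default: step_pca() keeps num_comp = 5 components
default_pcs <- count_pcs(base_rec %>% step_pca(all_predictors()))

# Ask for more components by setting num_comp explicitly
more_pcs <- count_pcs(base_rec %>% step_pca(all_predictors(), num_comp = 8))

c(default = default_pcs, more = more_pcs)
```

Here bake(new_data = NULL) plays the same role as juice(): it returns the processed training data. The number of components is fixed when the recipe is prepped, not when it is juiced or baked.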

alejandrohagan commented 2 years ago

First, a big thank you for all your videos! I am learning so much and they are helping me a lot. You put "points" as the outcome variable in the recipe() step, whereas in your other PCA videos (UN voting, cocktail recipes) you did not (in chapter 16 of your TMWR book you also put "class" as an outcome variable). Is it as simple as saying that when you don't put an outcome, it is "unsupervised", whereas if you put an outcome (e.g. left of the ~), it is now "supervised"? If my understanding is correct, are there any sources to understand how "supervised" PCA compares to "unsupervised" PCA? Thanks!

juliasilge commented 2 years ago

@alejandrohagan The actual PCA algorithm is always unsupervised and does not use the info from the outcome. When you use different kinds of formulas in a recipe, like points ~ . compared to ~ ., this is only about how the recipe understands the roles of the variables. When you use ~ ., the recipe treats all variables as predictors, with no outcomes at all.

If you are interested in an actual supervised dimensionality reduction approach, check out Ch 16 of our book.
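To make the point about roles concrete, here is a small sketch. It uses mtcars in place of the post's data (an assumption), with mpg playing the part of points. The two recipes assign different roles, yet the principal components computed from the same predictors should match.

```r
library(recipes)

# With an outcome on the left of ~, mpg gets the "outcome" role
rec_outcome <- recipe(mpg ~ ., data = mtcars)
# With nothing on the left, every column is a "predictor"
rec_no_outcome <- recipe(~ ., data = mtcars)

table(summary(rec_outcome)$role)
table(summary(rec_no_outcome)$role)

# Hypothetical helper: run the same PCA steps on a recipe
run_pca <- function(rec) {
  rec %>%
    step_normalize(all_predictors()) %>%
    step_pca(all_predictors(), num_comp = 2) %>%
    prep() %>%
    bake(new_data = NULL)
}

# Same 10 predictors in both cases; only the presence of an outcome differs
pcs_with <- run_pca(recipe(mpg ~ ., data = mtcars))
pcs_without <- run_pca(recipe(~ ., data = mtcars[, -1]))

# The scores do not depend on whether an outcome was declared
all.equal(pcs_with$PC1, pcs_without$PC1)
```

Note that with ~ . on the full data, the outcome column would itself be treated as a predictor and enter the PCA, which is why the second recipe drops mpg before comparing.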

RaymondBalise commented 1 year ago

If you have not used the spotifyr package, you will need to do a few steps to replicate these results.

  1. Install the spotifyr package from GitHub (it is not on CRAN currently):
    devtools::install_github('charlie86/spotifyr')
  2. Make a Spotify developer account. You will be prompted to create an account if you go here.
  3. Follow the instructions to get the API details the package needs. They are here.
  4. Follow the instructions for setting the API details. They are here.
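As a rough sketch of what step 4 usually looks like: the spotifyr README describes storing the credentials in the SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET environment variables and then requesting a token. The credential values below are placeholders (an assumption), so the token request is skipped until real values are filled in.

```r
# Placeholder credentials: replace with the values from your Spotify developer dashboard
Sys.setenv(SPOTIFY_CLIENT_ID = "your-client-id")
Sys.setenv(SPOTIFY_CLIENT_SECRET = "your-client-secret")

# Only request a token when spotifyr is installed and the placeholders were replaced
has_real_credentials <- Sys.getenv("SPOTIFY_CLIENT_ID") != "your-client-id"

if (has_real_credentials && requireNamespace("spotifyr", quietly = TRUE)) {
  access_token <- spotifyr::get_spotify_access_token()
}
```

Putting the two variables in your .Renviron file instead of calling Sys.setenv() keeps the secret out of your scripts.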

Do what Julia does...

acarpignani commented 10 months ago

Hi Julia, thank you ever so much for your videos. I am following along with your videos and learning so much. This one connecting to Spotify is an extensive source of information, and I am deeply grateful for it. You managed to teach me how to use maps, which I thought was an impossible task. I've been encountering a problem, though: if I try to run tidy(ranking_prep), it returns the following error:

Error in UseMethod("tidy") : 
  no applicable method for 'tidy' applied to an object of class "recipe"

Would you be able to explain to me why?

juliasilge commented 10 months ago

@acarpignani Oh, that does seem pretty strange. You should definitely be able to tidy() a recipe object. Are you able to run the examples in those docs?

acarpignani commented 10 months ago

@juliasilge thank you ever so much for replying. For some reason, if I put library(spotifyr) before library(tidymodels) it works perfectly as usual, but if I only follow your steps, it gives me the error I mentioned.

juliasilge commented 10 months ago

@acarpignani Ah, that smells of a namespace conflict! You may want to read up on the conflicted package, and specifically consider using tidymodels_prefer().
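A minimal sketch of both options, assuming a small recipe on mtcars in place of ranking_prep: call tidymodels_prefer() after loading your packages so that clashes resolve in favor of tidymodels, or name the package explicitly when calling tidy().

```r
library(tidymodels)
# library(spotifyr)  # load other packages first, then declare preferences

# Resolve name conflicts in favor of tidymodels functions
tidymodels_prefer()

# Stand-in for ranking_prep (an assumption): a prepped recipe on mtcars
example_prep <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors()) %>%
  prep()

# An explicit namespace prefix also sidesteps masking by other packages
tidied <- recipes::tidy(example_prep)
tidied
```

The explicit recipes::tidy() form is handy in one-off scripts, while tidymodels_prefer() covers the many other function names that tidymodels shares with other packages.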

acarpignani commented 10 months ago

@juliasilge, thank you so much. Will certainly do it. Thank you for the advice, and thank you so very much for the videos you have made: I am following along every single one of them, and they are so informative and useful. I have learnt so much from you.