juliasilge / juliasilge.com

My blog, built with blogdown and Hugo

https://juliasilge.com/


Find high FREX and high lift words for #TidyTuesday Stranger Things dialogue | Julia Silge #77

utterances-bot opened 1 year ago

utterances-bot commented 1 year ago

Find high FREX and high lift words for #TidyTuesday Stranger Things dialogue | Julia Silge

A data science blog

https://juliasilge.com/blog/stranger-things/

martinsykora commented 1 year ago

As a fan of Stranger Things, I really enjoyed this. Very nice blog post, and it's really handy to now be able to tidy up the FREX and lift measures with tidytext. Great work.

m-olaide commented 1 year ago

This is another amazing presentation, as usual. Thanks for your efforts. I have a couple of questions:

As shown, FREX and lift return different words for each topic. Which of them would you recommend for practical applications?
You mentioned that it's not advisable to "remove stop words before building topic models". However, in the linked post on stm::estimateEffect(), you removed stop words before building the topic model for that case study. Please advise on the best approach: should we remove stop words before building topic models or not?

Thanks

juliasilge commented 1 year ago

@m-olaide Thanks for the great questions!

I have found both FREX and lift words to help people understand what a topic is about; I often would report both. If you want to see which would be more useful in your specific situation, I recommend reading the stm vignette and especially the references in there for how FREX and lift are designed and used.
For the best quality topics, you typically don't want to remove stop words, as explained in the Schofield & Mimno paper I linked in this post. Sometimes I will still remove them to make a quick-and-dirty topic model that doesn't include those super common words that are used in many or all topics.
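To make the difference between the two lists more concrete, here is a minimal pure-Python sketch of both scores, modeled loosely on how the stm package describes them (it is not code from this post, from tidytext, or from stm). It assumes `beta` is a topic-by-word probability matrix and `word_counts` holds each word's count in the whole corpus. Lift divides a word's probability in a topic by its overall empirical frequency; FREX is a weighted harmonic mean of a word's within-topic frequency rank and its exclusivity rank, with weight `w`. The sketch skips any smoothing or shrinkage stm may apply, treats which side `w` sits on as an assumption, and the vocabulary and numbers are hypothetical toy data.

```python
from typing import List, Sequence


def ecdf(values: Sequence[float]) -> List[float]:
    """Empirical CDF value of each entry: share of entries <= it (always > 0)."""
    n = len(values)
    return [sum(1 for u in values if u <= v) / n for v in values]


def lift_scores(beta: Sequence[Sequence[float]], word_counts: Sequence[int]) -> List[List[float]]:
    """Lift: P(word | topic) divided by the word's empirical frequency in the corpus."""
    total = sum(word_counts)
    empirical = [count / total for count in word_counts]
    return [[b / p for b, p in zip(row, empirical)] for row in beta]


def frex_scores(beta: Sequence[Sequence[float]], w: float = 0.5) -> List[List[float]]:
    """FREX: weighted harmonic mean of frequency rank and exclusivity rank per topic."""
    n_topics, n_words = len(beta), len(beta[0])
    # Total probability each word receives across all topics.
    col_sums = [sum(beta[k][v] for k in range(n_topics)) for v in range(n_words)]
    scores = []
    for k in range(n_topics):
        # Exclusivity: the share of a word's total probability that falls in topic k.
        exclusivity = [beta[k][v] / col_sums[v] for v in range(n_words)]
        freq_rank = ecdf(beta[k])
        excl_rank = ecdf(exclusivity)
        # Assumption: w weights exclusivity and (1 - w) weights frequency;
        # at w = 0.5 both ranks count equally.
        scores.append([1 / (w / e + (1 - w) / f) for f, e in zip(freq_rank, excl_rank)])
    return scores


def top_words(scores: Sequence[float], vocab: Sequence[str], n: int = 3) -> List[str]:
    """Return the n highest-scoring words for one topic."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:n]]


# Hypothetical toy data: 2 topics over a 5-word vocabulary (each row of beta sums to 1).
vocab = ["the", "demogorgon", "eggo", "upside", "lab"]
beta = [
    [0.50, 0.30, 0.05, 0.10, 0.05],
    [0.50, 0.02, 0.30, 0.08, 0.10],
]
word_counts = [1000, 20, 40, 30, 10]

frex = frex_scores(beta)
lift = lift_scores(beta, word_counts)
for k in range(len(beta)):
    print(f"topic {k}: FREX {top_words(frex[k], vocab)}, lift {top_words(lift[k], vocab)}")
```

With these made-up numbers, the two scores agree on the top word for topic 0 but not for topic 1: lift strongly rewards the rare word "lab", while FREX keeps some weight on how frequent a word is within the topic.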

Kenjd commented 1 year ago

I'm very thankful for everything you share, Julia. Do you have any idea why this error occurs when I try to get the "frex" results from topic_model? I know it worked originally in your video, but now I get this error when I run the code, and I can't track down the cause. Any thoughts are appreciated. Thanks so much.

Error in match.arg(matrix) : 'arg' should be one of “beta”, “gamma”, “theta”

Kenjd commented 1 year ago

Sorry, it's the "Stranger Things" Tidy Tuesday entry.

juliasilge commented 1 year ago

@Kenjd Hmmmm, it's hard to say here because there aren't a lot of details about where you are getting that error. Can you create a reprex (a minimal reproducible example) for this? The goal of a reprex is to make it easier for us to recreate your problem so that we can understand it and/or fix it. If you've never heard of a reprex before, you may want to start with the tidyverse.org help page.

Once you have a reprex, I recommend posting on RStudio Community, which is a great forum for getting help with these kinds of analysis questions. Thanks! 🙌