Hi Julia, thanks for this post! I'm an undergraduate student doing some self-learning, so forgive me if I've missed something obvious.
At the end, where we've done the regression, it appears to compare only the albums Spice and Spiceworld. I can't see a comparison for the album Forever, which is included in our original dataset.
Am I missing something?
@cr1bt When you fit a linear regression with a factor predictor, the coefficients you get for that predictor are relative to a reference level. By default in R, that is the first level alphabetically, unless you change the level order ahead of time (for example with relevel() or the levels argument of factor()). Check out this section of our book and this SO answer for a more in-depth explanation.
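A minimal base-R sketch of the reference-level behavior, using a made-up toy data frame (the album names mirror the post, but the scores are invented for illustration). Because factor levels are sorted alphabetically by default, Forever becomes the reference level and is absorbed into the intercept; relevel() changes which album is the baseline.

```r
# Toy data: invented scores for three albums (not the real lyrics data)
toy <- data.frame(
  album = factor(c("Spice", "Spice", "Spiceworld", "Spiceworld", "Forever", "Forever")),
  score = c(1, 2, 3, 4, 5, 6)
)

# Levels are sorted alphabetically, so "Forever" comes first
levels(toy$album)

# The intercept is the mean for "Forever"; the other coefficients are
# differences from that reference level
fit <- lm(score ~ album, data = toy)
coef(fit)

# Make "Spice" the reference level instead; now "Forever" gets a coefficient
toy$album2 <- relevel(toy$album, ref = "Spice")
fit2 <- lm(score ~ album2, data = toy)
coef(fit2)
```

A factor covariate in the formula passed to estimateEffect() should follow the same treatment-contrast logic, so the album missing from the output is the reference level.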
Hi Julia, this is more of a general question. I was trying to search for another blog post with one of your analyses, but it seems the search option is no longer available after the redesign. My apologies if I missed it; I tried on desktop as well.
Thanks.
@gunnergalactico I did recently move my blog away from the Academic Hugo theme, which had support for a search bar, to the Apero Hugo theme, which does not yet. I'll look into how to support that! In the meantime, you can search a single site like mine from Google, like:
site:juliasilge.com rpart
Hi Julia, I was able to reproduce everything following your examples and wanted to try to produce similar analyses with a database of Taylor Swift's song lyrics for fun. I can reproduce everything until the estimateEffect function, at which point I get an error stating
Error in UseMethod("asSTMCorpus") : no applicable method for 'asSTMCorpus' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"
I assume it refers to the line "tidy_lyrics %>% distinct(Title, Album) %>% arrange(Title)" (different column names for this T Swift dataset), which I pass as the third argument to the estimateEffect function.
The thing is, in your example, the result of tidy_lyrics %>% distinct(song_name, album_name) %>% arrange(song_name) is of class c('tbl_df', 'tbl', 'data.frame'), isn't it? I'm not sure what's different.
I know this isn't anywhere near a reprex but thought maybe you might have an idea what kind of thing would cause the estimateEffect function to produce that error.
@JoshuaSteele Hmmmm, nothing comes to mind immediately for this. I'd look carefully at the arguments you are passing in to estimateEffect() and make sure they don't have any problems/unexpected characteristics.
Ah, well I'll keep trying my other debugging methods then. Thanks for the response!
I am so dumb. I was using the %>% operator instead of the assignment operator <- in the effects <- estimateEffect line. I realized it as I was combing through the estimateEffect parameters. One small typo.
@JoshuaSteele Typos strike again!
Hi, Julia. I am a big fan of your blog. Thank you so much for sharing. I have a question about applying the stm package. I have a news database covering one month, during which there was a policy shock. Is it possible to combine difference-in-differences (DID) and structural topic models?
@xinzhuohkust Yep, I believe so! If I understand it correctly, the typical way to model DID is to use an interaction term, and the stm package allows you to build a topic model with interaction terms. I recommend you check out the stm paper!
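To see why an interaction term gives the difference-in-differences estimate, here is a toy 2×2 example in base R with invented numbers. The variables treated and post are hypothetical 0/1 indicators (for example, coverage exposed to the policy, and days after the shock). The coefficient on treated:post equals the change over time in the treated group minus the change over time in the control group. In stm, a prevalence formula with the same right-hand side (such as ~ treated * post) would be the analogous setup, with topic proportions as the outcome.

```r
# Toy 2x2 design with invented outcomes, two observations per cell
did <- data.frame(
  treated = rep(c(0, 0, 1, 1), each = 2),  # hypothetical: exposed to the policy
  post    = rep(c(0, 1, 0, 1), each = 2),  # hypothetical: after the shock
  y       = c(0.5, 1.5, 1.5, 2.5, 2.5, 3.5, 5.5, 6.5)
)

# Interaction model: y ~ treated + post + treated:post
fit <- lm(y ~ treated * post, data = did)
coef(fit)["treated:post"]

# The same quantity computed by hand from the four cell means
cell_mean <- function(t, p) mean(did$y[did$treated == t & did$post == p])
did_by_hand <- (cell_mean(1, 1) - cell_mean(1, 0)) -
  (cell_mean(0, 1) - cell_mean(0, 0))
did_by_hand
```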
Thank you so much for your quick response!
Sorry for bothering you again.
I have fitted a topic model with 80 topics using the stm package. I am using the document-topic distributions as outcome variables and running regressions with the lm or plm function instead of estimateEffect.
as_tibble(topic_model$theta) %>%
  set_names(nm = sprintf("topic%s", 1:80)) %>%
  add_column(covariates) %>%
  lm(topic75 ~ democracy + day + country_name, .) %>%
  summary()
The regression result is different from: estimateEffect(c(75) ~ democracy + day + country_name, topic_model, meta = covariates)
I was wondering if you could tell me whether I am doing the right thing?
Stay safe and be well!
@xinzhuohkust Those are two different models, and I think I would probably use estimateEffect() in most situations, rather than specifying such models using lm(). One significant difference is how estimateEffect() incorporates the uncertainty (from the topic model) in the outcome. Check out the detailed documentation at ?estimateEffect (especially the Details) and the info on estimateEffect() in the stm vignette/paper.
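A rough base-R illustration of the uncertainty point. The stm documentation describes estimateEffect() as using the method of composition: rather than regressing once on a point estimate of theta, it repeatedly draws theta from the model's approximate posterior, fits a regression for each draw, and combines simulated coefficients across draws. In the toy sketch below, theta_hat, the Gaussian noise used as a stand-in for posterior draws (sd = 0.05), and the 100 coefficient draws per simulation are all invented for illustration; this is not stm's implementation.

```r
set.seed(1)

# Hypothetical setup: one covariate and point estimates of one topic's proportion
n <- 50
x <- rnorm(n)
theta_hat <- plogis(0.5 * x)

# Naive approach: a single regression on the point estimates
se_point <- sqrt(vcov(lm(theta_hat ~ x))["x", "x"])

# Method-of-composition sketch: refit on many draws of theta and pool
nsims <- 25
draws <- unlist(lapply(seq_len(nsims), function(s) {
  # Stand-in for a posterior draw of theta (stm draws from its own approximation)
  theta_s <- theta_hat + rnorm(n, sd = 0.05)
  fit_s <- lm(theta_s ~ x)
  # Sample the slope from this fit's estimated sampling distribution
  rnorm(100, mean = coef(fit_s)[["x"]], sd = sqrt(vcov(fit_s)["x", "x"]))
}))

# The pooled spread reflects uncertainty in the outcome as well as sampling error
c(estimate = mean(draws), se = sd(draws), se_point_estimate = se_point)
```

With real data, a comparable check is estimateEffect() with its default uncertainty = "Global" versus uncertainty = "None", which skips the simulation step.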
Hi! Thank you so much, Julia, for your videos and tutorials. I am applying your tutorial to trace changes in topics over time in journal articles. I have five decades (more than 9 million tokens), and I am treating each decade as the equivalent of the "albums" and each document as the equivalent of the songs in the example. My model has k = 25.
The problem is that when I run estimateEffect(), I get this error:
Error in qr.lm(thetasims[, k], qx) : number of covariate observations does not match number of docs
My code is:
estimateEffect(1:25 ~ decade, topic_model_corpus, total_corpus %>% distinct(id, decade) %>% arrange(id))
For the metadata, I am using a tidy data frame with the columns decade, id, and word.
@gcm31 It's hard to know for sure without access to your data, but the error message "number of covariate observations does not match number of docs" indicates that what you are passing in as covariates doesn't have the same number of documents as what was in your model. Can you create a reprex (a minimal reproducible example) for this? The goal of a reprex is to make it easier for people to recreate your problem so that they can understand it and/or fix it. Once you have a reprex, I recommend posting on RStudio Community, which is a great forum for getting help with these kinds of modeling questions. Thanks! 🙌
Thank you so much for your answer, Julia! I think I found the problem. When I built the sparse matrix, I filtered for tokens used more than 5 times. I will test whether that is the problem; otherwise, I'll post the question on RStudio Community. Thanks!!
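A minimal base-R sketch of how a word-frequency filter can trigger this error, using a made-up token table with the same decade/id/word columns. Once rare words are dropped, a document containing only rare words disappears from the sparse matrix, but metadata built from the unfiltered table still has a row for it. The threshold and data are invented, and the sketch assumes the matrix rows are sorted by id, matching the arrange(id) in the code above; with real data, matching on the matrix's row names is safer.

```r
# Hypothetical tidy token table: one row per token, with its document id and decade
tokens <- data.frame(
  id     = c(1, 1, 1, 2, 2, 3),
  decade = c("1970s", "1970s", "1970s", "1980s", "1980s", "1990s"),
  word   = c("topic", "model", "topic", "model", "topic", "rare")
)

# Keep only words used at least twice overall (invented threshold)
word_counts <- table(tokens$word)
keep_words <- names(word_counts)[word_counts >= 2]
filtered <- tokens[tokens$word %in% keep_words, ]

# Documents that survive filtering, i.e. the rows of the sparse matrix
model_ids <- sort(unique(filtered$id))

# Covariates built from the unfiltered table still include document 3
meta_all <- unique(tokens[, c("id", "decade")])
nrow(meta_all)
length(model_ids)

# Align the covariates with the documents actually in the model, in the same order
meta <- meta_all[match(model_ids, meta_all$id), ]
meta
```

Building the covariates from the same filtered table used to create the sparse matrix avoids the mismatch.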
Topic modeling for #TidyTuesday Spice Girls lyrics | Julia Silge
Learn how to train, explore, and understand an unsupervised topic model for text data.
https://juliasilge.com/blog/spice-girls/