Closed kangutsa closed 2 years ago
Unfortunately I cannot yet reproduce the problem you are experiencing. Can you create a reprex (a minimal reproducible example) for this? The goal of a reprex is to make it easier for us to recreate your problem so that we can understand it and/or fix it. If you've never heard of a reprex before, you may want to start with the tidyverse.org help page. Thanks! 🙌
Here is an example that shows this code working:
library(tidyverse)
library(tidytext)
library(janeaustenr)
library(stm)
#> stm v1.3.6 successfully loaded. See ?stm for help.
#> Papers, resources, and other materials at structuraltopicmodel.com
books <- austen_books() %>%
group_by(book) %>%
mutate(chapter = cumsum(str_detect(text, regex("^chapter ", ignore_case = TRUE)))) %>%
ungroup() %>%
filter(chapter > 0) %>%
unite(document, book, chapter, remove = FALSE)
austen_sparse <- books %>%
unnest_tokens(word, text) %>%
anti_join(stop_words) %>%
count(document, word) %>%
cast_sparse(document, word, n)
#> Joining, by = "word"
topic_model <- stm(
austen_sparse,
K = 4,
init.type = "Spectral",
verbose = FALSE
)
summary(topic_model)
#> A topic model with 4 topics, 269 documents and a 13908 word dictionary.
#> Topic 1 Top Words:
#> Highest Prob: elizabeth, darcy, miss, bennet, lady, jane, bingley
#> FREX: darcy, bennet, bingley, wickham, collins, lydia, lizzy
#> Lift: baronetage, condescended, presumptive, gowland, finances, landlady, creditors
#> Score: darcy, bennet, bingley, elizabeth, wickham, collins, jane
#> Topic 2 Top Words:
#> Highest Prob: emma, miss, harriet, weston, knightley, elton, jane
#> FREX: weston, knightley, elton, woodhouse, fairfax, churchill, hartfield
#> Lift: martin, goddard's, bangs, broadway, brunswick, cleverer, curtseys
#> Score: emma, weston, knightley, elton, woodhouse, fairfax, churchill
#> Topic 3 Top Words:
#> Highest Prob: catherine, anne, captain, miss, time, tilney, wentworth
#> FREX: tilney, thorpe, morland, allen, eleanor, henrietta, benwick
#> Lift: ship, plunge, curl, heroic, castle, edifice, france
#> Score: tilney, catherine, thorpe, morland, allen, wentworth, isabella
#> Topic 4 Top Words:
#> Highest Prob: fanny, elinor, miss, time, crawford, sir, marianne
#> FREX: elinor, crawford, marianne, edmund, thomas, bertram, dashwood
#> Lift: distract, nanny, heath, knoll, spunging, terrace, admirals
#> Score: elinor, marianne, fanny, crawford, edmund, thomas, dashwood
chapters <- books %>%
group_by(document) %>%
summarize(text = str_c(text, collapse = " ")) %>%
ungroup() %>%
inner_join(books %>%
distinct(document, book))
#> Joining, by = "document"
chapters
#> # A tibble: 269 × 3
#> document text book
#> <chr> <chr> <fct>
#> 1 Emma_1 "CHAPTER I Emma Woodhouse, handsome, clever, and rich, with… Emma
#> 2 Emma_10 "CHAPTER X Though now the middle of December, there had yet… Emma
#> 3 Emma_11 "CHAPTER XI Mr. Elton must now be left to himself. It was n… Emma
#> 4 Emma_12 "CHAPTER XII Mr. Knightley was to dine with them--rather ag… Emma
#> 5 Emma_13 "CHAPTER XIII There could hardly be a happier creature in t… Emma
#> 6 Emma_14 "CHAPTER XIV Some change of countenance was necessary for e… Emma
#> 7 Emma_15 "CHAPTER XV Mr. Woodhouse was soon ready for his tea; and w… Emma
#> 8 Emma_16 "CHAPTER XVI The hair was curled, and the maid sent away, a… Emma
#> 9 Emma_17 "CHAPTER XVII Mr. and Mrs. John Knightley were not detained… Emma
#> 10 Emma_18 "CHAPTER XVIII Mr. Frank Churchill did not come. When the t… Emma
#> # … with 259 more rows
effects <- estimateEffect(1:3 ~ book, topic_model, chapters)
summary(effects)
#>
#> Call:
#> estimateEffect(formula = 1:3 ~ book, stmobj = topic_model, metadata = chapters)
#>
#>
#> Topic 1:
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.031673 0.027522 1.151 0.251
#> bookPride & Prejudice 0.906982 0.035294 25.698 < 2e-16 ***
#> bookMansfield Park -0.018987 0.036336 -0.523 0.602
#> bookEmma -0.003074 0.036239 -0.085 0.932
#> bookNorthanger Abbey -0.001549 0.040821 -0.038 0.970
#> bookPersuasion 0.291445 0.063176 4.613 6.2e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Topic 2:
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.858e-02 1.673e-02 1.111 0.268
#> bookPride & Prejudice 1.895e-03 2.440e-02 0.078 0.938
#> bookMansfield Park -4.086e-05 2.308e-02 -0.002 0.999
#> bookEmma 9.087e-01 3.211e-02 28.302 <2e-16 ***
#> bookNorthanger Abbey 1.953e-03 2.895e-02 0.067 0.946
#> bookPersuasion 2.479e-03 3.041e-02 0.081 0.935
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Topic 3:
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.023527 0.026388 0.892 0.373
#> bookPride & Prejudice -0.001951 0.036241 -0.054 0.957
#> bookMansfield Park 0.033284 0.042596 0.781 0.435
#> bookEmma -0.003720 0.034816 -0.107 0.915
#> bookNorthanger Abbey 0.903470 0.045303 19.943 <2e-16 ***
#> bookPersuasion 0.612319 0.059020 10.375 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tidy(effects)
#> # A tibble: 18 × 6
#> topic term estimate std.error statistic p.value
#> <int> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 (Intercept) 0.0314 0.0275 1.14 2.54e- 1
#> 2 1 bookPride & Prejudice 0.908 0.0353 25.7 1.00e-73
#> 3 1 bookMansfield Park -0.0194 0.0368 -0.526 5.99e- 1
#> 4 1 bookEmma -0.00252 0.0365 -0.0690 9.45e- 1
#> 5 1 bookNorthanger Abbey -0.00134 0.0403 -0.0331 9.74e- 1
#> 6 1 bookPersuasion 0.292 0.0628 4.65 5.27e- 6
#> 7 2 (Intercept) 0.0186 0.0165 1.13 2.60e- 1
#> 8 2 bookPride & Prejudice 0.00214 0.0241 0.0889 9.29e- 1
#> 9 2 bookMansfield Park 0.000182 0.0228 0.00796 9.94e- 1
#> 10 2 bookEmma 0.909 0.0317 28.7 5.66e-83
#> 11 2 bookNorthanger Abbey 0.00217 0.0286 0.0757 9.40e- 1
#> 12 2 bookPersuasion 0.00269 0.0304 0.0885 9.30e- 1
#> 13 3 (Intercept) 0.0236 0.0262 0.898 3.70e- 1
#> 14 3 bookPride & Prejudice -0.00201 0.0360 -0.0558 9.56e- 1
#> 15 3 bookMansfield Park 0.0325 0.0424 0.767 4.44e- 1
#> 16 3 bookEmma -0.00331 0.0349 -0.0948 9.25e- 1
#> 17 3 bookNorthanger Abbey 0.903 0.0450 20.1 5.71e-55
#> 18 3 bookPersuasion 0.612 0.0591 10.3 2.92e-21
Created on 2022-07-21 by the reprex package (v2.0.1)
Thank you so much for your reply! Below is a reprex. I also attached the data file. I wanted to do topic modeling on the Ukraine war news. My goal is to examine if news publishers predict some topics. Hope I can hear from you.
library(tidyverse) library(reprex) news <-read_csv('/Users/Seok1/Desktop/mining/ukraine1.csv')
spec()
to retrieve the full column specification for this data.show_col_types = FALSE
to quiet this message.news %>% distinct(TITLE)
news %>% distinct(TITLE, DESCRIPTION)
library(tidytext)
tidy_uk <- news %>% unnest_tokens(word, DESCRIPTION) %>% anti_join(get_stopwords())
tidy_uk %>% count(word, sort = TRUE)
tidy_uk %>% count(TITLE, word, sort = TRUE)
news_sparse <- tidy_uk %>% count(TITLE, word) %>% cast_sparse(TITLE, word, n)
library(stm)
topic_model <- stm(news_sparse, K = 4)
summary(topic_model)
word_topics <- tidy(topic_model, matrix = "beta") word_topics
word_topics %>% group_by(topic) %>% slice_max(beta, n = 10) %>% ungroup() %>% mutate(topic = paste("Topic", topic)) %>% ggplot(aes(beta, reorder_within(term, beta, topic), fill = topic)) + geom_col(show.legend = FALSE) + facet_wrap(vars(topic), scales = "free_y") + scale_y_reordered() + labs(x = expression(beta), y = NULL) [wOvDoAl6GA7bgAAAABJRU5ErkJggg==]
title_topics <- tidy(topic_model, matrix = "gamma", document_names = rownames(news_sparse)) title_topics
title_topics %>% mutate(document = fct_reorder(document, gamma), topic = factor(topic)) %>% ggplot(aes(gamma, topic, fill = topic)) + geom_col(show.legend = FALSE) + facet_wrap(vars(document), ncol = 4) + labs(x = expression(gamma), y = "Topic") [AdrHyz7+dU7dAAAAAElFTkSuQmCC]
effect <- estimateEffect( 1:4 ~ PUBLISHER, topic_model, tidy_uk %>% distinct (TITLE, PUBLISHER) %>% arrange(TITLE))
summary(effect)
From: Julia Silge @.> Date: Thursday, July 21, 2022 at 9:50 PM To: juliasilge/tidytext @.> Cc: Seok Kang @.>, Author @.> Subject: [EXTERNAL] Re: [juliasilge/tidytext] Error in qr.lm(thetasims[, k], qx) (Issue #217) EXTERNAL EMAIL This email originated outside of The University of Texas at San Antonio. Please exercise caution when clicking on links or opening attachments.
Unfortunately I cannot yet reproduce the problem you are experiencing. Can you create a reprexhttps://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Freprex.tidyverse.org%2F&data=05%7C01%7Cseok.kang%40utsa.edu%7Cdf57bf074f4243ed767008da6b8cf894%7C3a228dfbc64744cb88357b20617fc906%7C0%7C0%7C637940550471757722%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=CkTC%2FXNwxdRfOJOHVXwRBg2ohyupiymyKD0jZQZJi7s%3D&reserved=0 (a minimal reproducible example) for this? The goal of a reprex is to make it easier for us to recreate your problem so that we can understand it and/or fix it. If you've never heard of a reprex before, you may want to start with the tidyverse.org helphttps://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.tidyverse.org%2Fhelp%2F&data=05%7C01%7Cseok.kang%40utsa.edu%7Cdf57bf074f4243ed767008da6b8cf894%7C3a228dfbc64744cb88357b20617fc906%7C0%7C0%7C637940550471757722%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=mYO1kV%2FInvpc3%2Ff5VskVX%2BDg%2BiLC8KR01G2c9Sd8kRU%3D&reserved=0 page. Thanks! 🙌
Here is an example that shows this code working:
library(tidyverse)
library(tidytext)
library(janeaustenr)
library(stm)
books <- austen_books() %>%
group_by(book) %>%
mutate(chapter = cumsum(str_detect(text, regex("^chapter ", ignore_case = TRUE)))) %>%
ungroup() %>%
filter(chapter > 0) %>%
unite(document, book, chapter, remove = FALSE)
austen_sparse <- books %>%
unnest_tokens(word, text) %>%
anti_join(stop_words) %>%
count(document, word) %>%
cast_sparse(document, word, n)
topic_model <- stm(
austen_sparse,
K = 4,
init.type = "Spectral",
verbose = FALSE
)
summary(topic_model)
chapters <- books %>%
group_by(document) %>%
summarize(text = str_c(text, collapse = " ")) %>%
ungroup() %>%
inner_join(books %>%
distinct(document, book))
chapters
effects <- estimateEffect(1:3 ~ book, topic_model, chapters)
summary(effects)
tidy(effects)
Created on 2022-07-21 by the reprex packagehttps://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Freprex.tidyverse.org%2F&data=05%7C01%7Cseok.kang%40utsa.edu%7Cdf57bf074f4243ed767008da6b8cf894%7C3a228dfbc64744cb88357b20617fc906%7C0%7C0%7C637940550471757722%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=CkTC%2FXNwxdRfOJOHVXwRBg2ohyupiymyKD0jZQZJi7s%3D&reserved=0 (v2.0.1)
— Reply to this email directly, view it on GitHubhttps://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fjuliasilge%2Ftidytext%2Fissues%2F217%23issuecomment-1192125279&data=05%7C01%7Cseok.kang%40utsa.edu%7Cdf57bf074f4243ed767008da6b8cf894%7C3a228dfbc64744cb88357b20617fc906%7C0%7C0%7C637940550471757722%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Ij8Mf%2BV3sfki5uyqqofd3Nhval%2Fqe6urU8J51mkdz94%3D&reserved=0, or unsubscribehttps://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAH3KBUFAXXBAIKLAYIZZTY3VVIEAHANCNFSM54FLSVVQ&data=05%7C01%7Cseok.kang%40utsa.edu%7Cdf57bf074f4243ed767008da6b8cf894%7C3a228dfbc64744cb88357b20617fc906%7C0%7C0%7C637940550471757722%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=l%2BUgYTbeczcQAN1rM9WwT7z6gx9orHRQCZ%2BjtbO9E9s%3D&reserved=0. You are receiving this because you authored the thread.Message ID: @.***>
Could you update your example to use some more easily accessible data (I don't have your CSV file) and the reprex package? Using reprex makes it easier to see both the input and output, and for us to re-run the code in a local session. Your output should look like mine if you paste it in; here are two articles if you are having trouble:
Thanks! 🙌
Thank you. Below are codes using the reprex package. The dataset link is https://utsacloud-my.sharepoint.com/:x:/g/personal/seok_kang_utsa_edu/EbTEWqpOfytGsoEkzyGJdiMBewtcX58OGTxHhj11wQBx5w?e=1sBpQZ
Any advice will be appreciated. I wanted to look at if publishers predict topics.
library(tidyverse)
library(reprex)
news <-read_csv('/Users/Seok1/Desktop/mining/ukraine2.csv')
#> Rows: 24 Columns: 11
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (10): TITLE, ARTICLE LINK, PUBLISHED DATE (GMT), AUTHOR, PUBLISHER, COUN...
#> lgl (1): VIDEO URL
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
news %>% distinct(TITLE)
#> # A tibble: 22 × 1
#> TITLE
#> <chr>
#> 1 "\"German chancellor says Putin is ready to wage Ukraine war for \"a long ti…
#> 2 "\"How could Putin’s Ukraine war trigger famine more than 8000km away?\""
#> 3 "\"Russia-Ukraine war: What happened today (June 30)\""
#> 4 "\"Ukraine war: All they will inherit is rubble - relentless Russian bombard…
#> 5 "\"Ukraine war: Klitschko brothers plead to Nato leaders\""
#> 6 "\"Russia-Ukraine war: Buhari calls for increased gas partnership with Portu…
#> 7 "\"Ukraine war: five things you need to know about the confict this Thursday…
#> 8 "\"Ukraine war: New 'iron curtain' descending, warns Sergei Lavrov\""
#> 9 "\"Should Russian culture be 'cancelled' over Ukraine war?\""
#> 10 "\"Would Ukraine war not have happened if Putin was a woman?\""
#> # … with 12 more rows
news %>% distinct(TITLE, DESCRIPTION)
#> # A tibble: 22 × 2
#> TITLE DESCRIPTION
#> <chr> <chr>
#> 1 "\"German chancellor says Putin is ready to wage Ukraine war for… "CBS News'…
#> 2 "\"How could Putin’s Ukraine war trigger famine more than 8000km… "Block a p…
#> 3 "\"Russia-Ukraine war: What happened today (June 30)\"" "A roundup…
#> 4 "\"Ukraine war: All they will inherit is rubble - relentless Rus… <NA>
#> 5 "\"Ukraine war: Klitschko brothers plead to Nato leaders\"" "He's a ma…
#> 6 "\"Russia-Ukraine war: Buhari calls for increased gas partnershi… "The Presi…
#> 7 "\"Ukraine war: five things you need to know about the confict t… "Sanctions…
#> 8 "\"Ukraine war: New 'iron curtain' descending, warns Sergei Lavr… <NA>
#> 9 "\"Should Russian culture be 'cancelled' over Ukraine war?\"" "There are…
#> 10 "\"Would Ukraine war not have happened if Putin was a woman?\"" "“IF Putin…
#> # … with 12 more rows
library(tidytext)
tidy_uk <-
news %>%
unnest_tokens(word, DESCRIPTION) %>%
anti_join(get_stopwords())
#> Joining, by = "word"
tidy_uk %>% count(word, sort = TRUE)
#> # A tibble: 281 × 2
#> word n
#> <chr> <int>
#> 1 ukraine 13
#> 2 nato 7
#> 3 russian 6
#> 4 war 6
#> 5 military 5
#> 6 said 5
#> 7 <NA> 5
#> 8 eastern 4
#> 9 finland 4
#> 10 invasion 4
#> # … with 271 more rows
tidy_uk %>% count(TITLE, word, sort = TRUE)
#> # A tibble: 359 × 3
#> TITLE word n
#> <chr> <chr> <int>
#> 1 "\"Russia-Ukraine war: Nato says Moscow is biggest ‘direct threa… east… 3
#> 2 "\"Ardern calls on NATO to prevent Ukraine war from triggering a… arms 2
#> 3 "\"Ardern calls on NATO to prevent Ukraine war from triggering a… nato 2
#> 4 "\"Ardern calls on NATO to prevent Ukraine war from triggering a… prev… 2
#> 5 "\"Ardern calls on NATO to prevent Ukraine war from triggering a… race 2
#> 6 "\"Ardern calls on NATO to prevent Ukraine war from triggering a… ukra… 2
#> 7 "\"Ardern calls on NATO to prevent Ukraine war from triggering a… war 2
#> 8 "\"German porcelain maker faces fragile future due to Ukraine wa… <NA> 2
#> 9 "\"How could Putin’s Ukraine war trigger famine more than 8000km… anot… 2
#> 10 "\"How could Putin’s Ukraine war trigger famine more than 8000km… bears 2
#> # … with 349 more rows
## train topic model
news_sparse <-
tidy_uk %>%
count(TITLE, word) %>%
cast_sparse(TITLE, word, n)
library(stm)
#> stm v1.3.6 successfully loaded. See ?stm for help.
#> Papers, resources, and other materials at structuraltopicmodel.com
topic_model <- stm(news_sparse, K = 4)
#> Beginning Spectral Initialization
#> Calculating the gram matrix...
#> Finding anchor words...
#> ....
#> Recovering initialization...
#> ..
#> Initialization complete.
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 1 (approx. per word bound = -5.167)
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 2 (approx. per word bound = -4.813, relative change = 6.857e-02)
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 3 (approx. per word bound = -4.692, relative change = 2.508e-02)
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 4 (approx. per word bound = -4.640, relative change = 1.101e-02)
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 5 (approx. per word bound = -4.610, relative change = 6.432e-03)
#> Topic 1: NA, nato, war, ukraine, arms
#> Topic 2: country, madrid, military, one, leaders
#> Topic 3: ukraine, invasion, said, putin, russian
#> Topic 4: ukraine, russian, russia, eastern, military
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 6 (approx. per word bound = -4.596, relative change = 3.178e-03)
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 7 (approx. per word bound = -4.586, relative change = 2.204e-03)
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 8 (approx. per word bound = -4.581, relative change = 9.745e-04)
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 9 (approx. per word bound = -4.579, relative change = 5.114e-04)
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 10 (approx. per word bound = -4.577, relative change = 4.484e-04)
#> Topic 1: NA, nato, war, ukraine, arms
#> Topic 2: country, madrid, one, military, nato
#> Topic 3: ukraine, invasion, russian, coverage, depth
#> Topic 4: ukraine, russia, russian, eastern, finland
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 11 (approx. per word bound = -4.575, relative change = 4.195e-04)
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 12 (approx. per word bound = -4.573, relative change = 4.057e-04)
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 13 (approx. per word bound = -4.568, relative change = 1.020e-03)
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 14 (approx. per word bound = -4.560, relative change = 1.810e-03)
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 15 (approx. per word bound = -4.554, relative change = 1.371e-03)
#> Topic 1: NA, nato, ukraine, war, arms
#> Topic 2: country, one, military, crisis, supply
#> Topic 3: ukraine, invasion, russian, coverage, depth
#> Topic 4: ukraine, russia, russian, eastern, finland
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 16 (approx. per word bound = -4.553, relative change = 1.646e-04)
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Completing Iteration 17 (approx. per word bound = -4.553, relative change = 2.456e-05)
#> ......................
#> Completed E-Step (0 seconds).
#> Completed M-Step.
#> Model Converged
summary(topic_model)
#> A topic model with 4 topics, 22 documents and a 281 word dictionary.
#> Topic 1 Top Words:
#> Highest Prob: NA, nato, ukraine, war, arms, prevent, race
#> FREX: NA, arms, prevent, race, war, thursday, nato
#> Lift: 30, alongside, appeared, ardern, arms, becoming, called
#> Score: NA, arms, prevent, race, 30, alongside, appeared
#> Topic 2 Top Words:
#> Highest Prob: country, one, military, crisis, supply, madrid, another
#> FREX: country, one, another, bears, block, brunt, can
#> Lift: belarus, campaign, covert, posed, reports, barely, economic
#> Score: belarus, country, another, bears, block, brunt, can
#> Topic 3 Top Words:
#> Highest Prob: ukraine, invasion, russian, war, coverage, depth, developments
#> FREX: invasion, coverage, depth, developments, key, latest, roundup
#> Lift: 18, airstrike, battled, centre, contested, killed, last
#> Score: coverage, depth, developments, key, latest, roundup, russia's
#> Topic 4 Top Words:
#> Highest Prob: ukraine, russia, russian, eastern, finland, sweden, military
#> FREX: russia, finland, sweden, eastern, russian, major, frontline
#> Lift: assistant, backdrop, become, buhari, especially, europe’s, garba
#> Score: arguments, frontline, offensive, russian, russia, major, know
## Explore topic model results
word_topics <- tidy(topic_model, matrix = "beta")
word_topics
#> # A tibble: 1,124 × 3
#> topic term beta
#> <int> <chr> <dbl>
#> 1 1 30 1.98e- 2
#> 2 2 30 2.59e-46
#> 3 3 30 1.49e-26
#> 4 4 30 1.04e-45
#> 5 1 alongside 1.98e- 2
#> 6 2 alongside 2.59e-46
#> 7 3 alongside 1.49e-26
#> 8 4 alongside 1.04e-45
#> 9 1 appeared 1.98e- 2
#> 10 2 appeared 2.59e-46
#> # … with 1,114 more rows
word_topics %>%
group_by(topic) %>%
slice_max(beta, n = 5) %>%
ungroup() %>%
mutate(topic = paste("Topic", topic)) %>%
ggplot(aes(beta, reorder_within(term, beta, topic), fill = topic)) +
geom_col(show.legend = FALSE) +
facet_wrap(vars(topic), scales = "free_y") +
scale_y_reordered() +
labs(x = expression(beta), y = NULL)
title_topics <- tidy(topic_model, matrix = "gamma",
document_names = rownames(news_sparse))
title_topics
#> # A tibble: 88 × 3
#> document topic gamma
#> <chr> <int> <dbl>
#> 1 "\"Ardern calls on NATO to prevent Ukraine war from triggering… 1 0.953
#> 2 "\"Belarus posed for 'covert' military campaign in chilling Uk… 1 0.0195
#> 3 "\"Fate of Ukraine war will be decided on the battlefield, not… 1 0.00906
#> 4 "\"German chancellor says Putin is ready to wage Ukraine war f… 1 0.00966
#> 5 "\"German porcelain maker faces fragile future due to Ukraine … 1 0.744
#> 6 "\"How could Putin’s Ukraine war trigger famine more than 8000… 1 0.00591
#> 7 "\"NATO and the Ukraine war: It took 30 years for Russia and t… 1 0.0361
#> 8 "\"Russia-Ukraine war: Buhari calls for increased gas partners… 1 0.00449
#> 9 "\"Russia-Ukraine war: Moscow intensifies attacks in Ukraine a… 1 0.0268
#> 10 "\"Russia-Ukraine war: Nato says Moscow is biggest ‘direct thr… 1 0.00229
#> # … with 78 more rows
title_topics %>%
mutate(document = fct_reorder(document, gamma),
topic = factor(topic)) %>%
ggplot(aes(gamma, topic, fill = topic)) +
geom_col(show.legend = FALSE) +
facet_wrap(vars(document), ncol = 4) +
labs(x = expression(gamma), y = "Topic")
impact <-
estimateEffect(
1:4 ~ CATEGORY,
topic_model,
tidy_uk %>%
distinct (TITLE, CATEGORY)
%>% arrange(TITLE))
#> Error in qr.lm(thetasims[, k], qx): number of covariate observations does not match number of docs
summary(impact)
#> Error in summary(impact): object 'impact' not found
Created on 2022-07-22 by the reprex package (v2.0.1)
Unfortunately I don't see a way to download that data @kangutsa.
Can you read this article on how to create a reprex? Especially notice the first "main requirement":
Use the smallest, simplest, most built-in data possible.
Let me know if you have further questions!
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Hello Julia, I am learning a lot from your book and videos. Thank you. I am conducting a text mining analysis with topic modeling. While following your instructions, I have come up with the following error message. Could you advise on this? Thank you again.
Error in qr.lm(thetasims[, k], qx) : number of covariate observations does not match number of docs
The entire script is below.
library(tidyverse) news <-read_csv('/Users/Seok1/Desktop/mining/ukraine1.csv') news %>% distinct(TITLE) news %>% distinct(TITLE, DESCRIPTION)
library(tidytext)
tidy_news <- news %>% unnest_tokens(word, DESCRIPTION) %>% anti_join(get_stopwords())
tidy_news %>% count(word, sort = TRUE) tidy_news %>% count(TITLE, word, sort = TRUE)
train topic model
news_sparse <- tidy_news %>% count(TITLE, word) %>% cast_sparse(TITLE, word, n)
library(stm)
topic_model <- stm(news_sparse, K = 4)
summary(topic_model)
Explore topic model results
word_topics <- tidy(topic_model, matrix = "beta") word_topics
word_topics %>% group_by(topic) %>% slice_max(beta, n = 10) %>% ungroup() %>% mutate(topic = paste("Topic", topic)) %>% ggplot(aes(beta, reorder_within(term, beta, topic), fill = topic)) + geom_col(show.legend = FALSE) + facet_wrap(vars(topic), scales = "free_y") + scale_y_reordered() + labs(x = expression(beta), y = NULL)
title_topics <- tidy(topic_model, matrix = "gamma", document_names = rownames(news_sparse))
title_topics
title_topics %>% mutate(document = fct_reorder(document, gamma), topic = factor(topic)) %>% ggplot(aes(gamma, topic, fill = topic)) + geom_col(show.legend = FALSE) + facet_wrap(vars(document), ncol = 4) + labs(x = expression(gamma), y = "Topic")
effects <- estimateEffect( 1:4 ~ CATEGORY, topic_model, tidy_news %>% distinct (TITLE, CATEGORY) %>% arrange(TITLE) )
summary(effects) Error in qr.lm(thetasims[, k], qx) : number of covariate observations does not match number of docs