juliasilge / tidytext

Text mining using tidy tools :sparkles::page_facing_up::sparkles:
https://juliasilge.github.io/tidytext/

Error in qr.lm(thetasims[, k], qx) #217

Closed kangutsa closed 2 years ago

kangutsa commented 2 years ago

Hello Julia, I am learning a lot from your book and videos. Thank you. I am conducting a text mining analysis with topic modeling. While following your instructions, I ran into the following error message. Could you advise on this? Thank you again.

Error in qr.lm(thetasims[, k], qx) : number of covariate observations does not match number of docs

The entire script is below.

library(tidyverse)
news <- read_csv('/Users/Seok1/Desktop/mining/ukraine1.csv')
news %>% distinct(TITLE)
news %>% distinct(TITLE, DESCRIPTION)

library(tidytext)

tidy_news <- news %>% unnest_tokens(word, DESCRIPTION) %>% anti_join(get_stopwords())

tidy_news %>% count(word, sort = TRUE)
tidy_news %>% count(TITLE, word, sort = TRUE)

# train topic model

news_sparse <- tidy_news %>% count(TITLE, word) %>% cast_sparse(TITLE, word, n)

library(stm)

topic_model <- stm(news_sparse, K = 4)

summary(topic_model)

# Explore topic model results

word_topics <- tidy(topic_model, matrix = "beta")
word_topics

word_topics %>%
    group_by(topic) %>%
    slice_max(beta, n = 10) %>%
    ungroup() %>%
    mutate(topic = paste("Topic", topic)) %>%
    ggplot(aes(beta, reorder_within(term, beta, topic), fill = topic)) +
    geom_col(show.legend = FALSE) +
    facet_wrap(vars(topic), scales = "free_y") +
    scale_y_reordered() +
    labs(x = expression(beta), y = NULL)

title_topics <- tidy(topic_model, matrix = "gamma", document_names = rownames(news_sparse))
title_topics

title_topics %>%
    mutate(document = fct_reorder(document, gamma), topic = factor(topic)) %>%
    ggplot(aes(gamma, topic, fill = topic)) +
    geom_col(show.legend = FALSE) +
    facet_wrap(vars(document), ncol = 4) +
    labs(x = expression(gamma), y = "Topic")

effects <- estimateEffect(
    1:4 ~ CATEGORY,
    topic_model,
    tidy_news %>% distinct(TITLE, CATEGORY) %>% arrange(TITLE)
)

summary(effects)
Error in qr.lm(thetasims[, k], qx) : number of covariate observations does not match number of docs

juliasilge commented 2 years ago

Unfortunately I cannot yet reproduce the problem you are experiencing. Can you create a reprex (a minimal reproducible example) for this? The goal of a reprex is to make it easier for us to recreate your problem so that we can understand it and/or fix it. If you've never heard of a reprex before, you may want to start with the tidyverse.org help page. Thanks! 🙌

Here is an example that shows this code working:

library(tidyverse)
library(tidytext)
library(janeaustenr)
library(stm)
#> stm v1.3.6 successfully loaded. See ?stm for help. 
#>  Papers, resources, and other materials at structuraltopicmodel.com

books <- austen_books() %>%
    group_by(book) %>%
    mutate(chapter = cumsum(str_detect(text, regex("^chapter ", ignore_case = TRUE)))) %>%
    ungroup() %>%
    filter(chapter > 0) %>%
    unite(document, book, chapter, remove = FALSE)

austen_sparse <- books %>%
    unnest_tokens(word, text) %>%
    anti_join(stop_words) %>%
    count(document, word) %>%
    cast_sparse(document, word, n)
#> Joining, by = "word"

topic_model <- stm(
    austen_sparse, 
    K = 4,
    init.type = "Spectral",
    verbose = FALSE
)

summary(topic_model)
#> A topic model with 4 topics, 269 documents and a 13908 word dictionary.
#> Topic 1 Top Words:
#>       Highest Prob: elizabeth, darcy, miss, bennet, lady, jane, bingley 
#>       FREX: darcy, bennet, bingley, wickham, collins, lydia, lizzy 
#>       Lift: baronetage, condescended, presumptive, gowland, finances, landlady, creditors 
#>       Score: darcy, bennet, bingley, elizabeth, wickham, collins, jane 
#> Topic 2 Top Words:
#>       Highest Prob: emma, miss, harriet, weston, knightley, elton, jane 
#>       FREX: weston, knightley, elton, woodhouse, fairfax, churchill, hartfield 
#>       Lift: martin, goddard's, bangs, broadway, brunswick, cleverer, curtseys 
#>       Score: emma, weston, knightley, elton, woodhouse, fairfax, churchill 
#> Topic 3 Top Words:
#>       Highest Prob: catherine, anne, captain, miss, time, tilney, wentworth 
#>       FREX: tilney, thorpe, morland, allen, eleanor, henrietta, benwick 
#>       Lift: ship, plunge, curl, heroic, castle, edifice, france 
#>       Score: tilney, catherine, thorpe, morland, allen, wentworth, isabella 
#> Topic 4 Top Words:
#>       Highest Prob: fanny, elinor, miss, time, crawford, sir, marianne 
#>       FREX: elinor, crawford, marianne, edmund, thomas, bertram, dashwood 
#>       Lift: distract, nanny, heath, knoll, spunging, terrace, admirals 
#>       Score: elinor, marianne, fanny, crawford, edmund, thomas, dashwood

chapters <- books %>%
    group_by(document) %>% 
    summarize(text = str_c(text, collapse = " ")) %>%
    ungroup() %>%
    inner_join(books %>%
                   distinct(document, book))
#> Joining, by = "document"

chapters
#> # A tibble: 269 × 3
#>    document text                                                           book 
#>    <chr>    <chr>                                                          <fct>
#>  1 Emma_1   "CHAPTER I   Emma Woodhouse, handsome, clever, and rich, with… Emma 
#>  2 Emma_10  "CHAPTER X   Though now the middle of December, there had yet… Emma 
#>  3 Emma_11  "CHAPTER XI   Mr. Elton must now be left to himself. It was n… Emma 
#>  4 Emma_12  "CHAPTER XII   Mr. Knightley was to dine with them--rather ag… Emma 
#>  5 Emma_13  "CHAPTER XIII   There could hardly be a happier creature in t… Emma 
#>  6 Emma_14  "CHAPTER XIV   Some change of countenance was necessary for e… Emma 
#>  7 Emma_15  "CHAPTER XV   Mr. Woodhouse was soon ready for his tea; and w… Emma 
#>  8 Emma_16  "CHAPTER XVI   The hair was curled, and the maid sent away, a… Emma 
#>  9 Emma_17  "CHAPTER XVII   Mr. and Mrs. John Knightley were not detained… Emma 
#> 10 Emma_18  "CHAPTER XVIII   Mr. Frank Churchill did not come. When the t… Emma 
#> # … with 259 more rows

effects <- estimateEffect(1:3 ~ book, topic_model, chapters)

summary(effects)
#> 
#> Call:
#> estimateEffect(formula = 1:3 ~ book, stmobj = topic_model, metadata = chapters)
#> 
#> 
#> Topic 1:
#> 
#> Coefficients:
#>                        Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)            0.031673   0.027522   1.151    0.251    
#> bookPride & Prejudice  0.906982   0.035294  25.698  < 2e-16 ***
#> bookMansfield Park    -0.018987   0.036336  -0.523    0.602    
#> bookEmma              -0.003074   0.036239  -0.085    0.932    
#> bookNorthanger Abbey  -0.001549   0.040821  -0.038    0.970    
#> bookPersuasion         0.291445   0.063176   4.613  6.2e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> Topic 2:
#> 
#> Coefficients:
#>                         Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)            1.858e-02  1.673e-02   1.111    0.268    
#> bookPride & Prejudice  1.895e-03  2.440e-02   0.078    0.938    
#> bookMansfield Park    -4.086e-05  2.308e-02  -0.002    0.999    
#> bookEmma               9.087e-01  3.211e-02  28.302   <2e-16 ***
#> bookNorthanger Abbey   1.953e-03  2.895e-02   0.067    0.946    
#> bookPersuasion         2.479e-03  3.041e-02   0.081    0.935    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> Topic 3:
#> 
#> Coefficients:
#>                        Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)            0.023527   0.026388   0.892    0.373    
#> bookPride & Prejudice -0.001951   0.036241  -0.054    0.957    
#> bookMansfield Park     0.033284   0.042596   0.781    0.435    
#> bookEmma              -0.003720   0.034816  -0.107    0.915    
#> bookNorthanger Abbey   0.903470   0.045303  19.943   <2e-16 ***
#> bookPersuasion         0.612319   0.059020  10.375   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tidy(effects)
#> # A tibble: 18 × 6
#>    topic term                   estimate std.error statistic  p.value
#>    <int> <chr>                     <dbl>     <dbl>     <dbl>    <dbl>
#>  1     1 (Intercept)            0.0314      0.0275   1.14    2.54e- 1
#>  2     1 bookPride & Prejudice  0.908       0.0353  25.7     1.00e-73
#>  3     1 bookMansfield Park    -0.0194      0.0368  -0.526   5.99e- 1
#>  4     1 bookEmma              -0.00252     0.0365  -0.0690  9.45e- 1
#>  5     1 bookNorthanger Abbey  -0.00134     0.0403  -0.0331  9.74e- 1
#>  6     1 bookPersuasion         0.292       0.0628   4.65    5.27e- 6
#>  7     2 (Intercept)            0.0186      0.0165   1.13    2.60e- 1
#>  8     2 bookPride & Prejudice  0.00214     0.0241   0.0889  9.29e- 1
#>  9     2 bookMansfield Park     0.000182    0.0228   0.00796 9.94e- 1
#> 10     2 bookEmma               0.909       0.0317  28.7     5.66e-83
#> 11     2 bookNorthanger Abbey   0.00217     0.0286   0.0757  9.40e- 1
#> 12     2 bookPersuasion         0.00269     0.0304   0.0885  9.30e- 1
#> 13     3 (Intercept)            0.0236      0.0262   0.898   3.70e- 1
#> 14     3 bookPride & Prejudice -0.00201     0.0360  -0.0558  9.56e- 1
#> 15     3 bookMansfield Park     0.0325      0.0424   0.767   4.44e- 1
#> 16     3 bookEmma              -0.00331     0.0349  -0.0948  9.25e- 1
#> 17     3 bookNorthanger Abbey   0.903       0.0450  20.1     5.71e-55
#> 18     3 bookPersuasion         0.612       0.0591  10.3     2.92e-21

Created on 2022-07-21 by the reprex package (v2.0.1)
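In the example above, `summary(topic_model)` reports 269 documents and `chapters` has 269 rows, one per document, so the metadata lines up with the model. The following is a minimal sketch, in base R only, of a check that could be run before `estimateEffect()`. The helper `check_metadata()` and the toy data are hypothetical and are not part of stm or tidytext; the toy vector stands in for something like `rownames(news_sparse)`, and the toy data frame for the metadata passed as the third argument.

```r
# Hypothetical helper (not part of stm or tidytext): compare the document
# names of a document-term matrix with the rows of a metadata data frame.
check_metadata <- function(doc_names, metadata, id_col) {
  ids <- metadata[[id_col]]
  list(
    n_docs         = length(doc_names),              # documents in the model
    n_meta_rows    = length(ids),                    # rows of metadata
    duplicated_ids = unique(ids[duplicated(ids)]),   # ids appearing more than once
    missing_docs   = setdiff(doc_names, ids),        # documents with no metadata row
    same_order     = length(ids) == length(doc_names) &&
      isTRUE(all(ids == doc_names))                  # same ids in the same order
  )
}

# Toy data (assumption): three documents, but one title carries two categories,
# so the distinct (TITLE, CATEGORY) pairs give four metadata rows.
doc_names <- c("a", "b", "c")
metadata <- data.frame(
  TITLE    = c("a", "b", "b", "c"),
  CATEGORY = c("x", "x", "y", "z"),
  stringsAsFactors = FALSE
)

result <- check_metadata(doc_names, metadata, "TITLE")
result$n_docs          # 3
result$n_meta_rows     # 4
result$duplicated_ids  # "b"
result$same_order      # FALSE
```

If the two counts differ, or an id is duplicated, the metadata does not have exactly one row per document. Whether that is what happens with the news data in this thread cannot be told from the output shown here; the sketch only illustrates one way to look.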

kangutsa commented 2 years ago

Thank you so much for your reply! Below is a reprex. I also attached the data file. I wanted to do topic modeling on the Ukraine war news. My goal is to examine if news publishers predict some topics. Hope I can hear from you.

library(tidyverse)
library(reprex)
news <- read_csv('/Users/Seok1/Desktop/mining/ukraine1.csv')
> Rows: 902 Columns: 11
> ── Column specification ────────────────────────────────────────────────────────
> Delimiter: ","
> chr (10): TITLE, ARTICLE LINK, PUBLISHED DATE (GMT), AUTHOR, PUBLISHER, COUN...
> lgl (1): VIDEO URL
>
> ℹ Use spec() to retrieve the full column specification for this data.
> ℹ Specify the column types or set show_col_types = FALSE to quiet this message.

news %>% distinct(TITLE)
> # A tibble: 759 × 1
>    TITLE
>    <chr>
>  1 "\"German chancellor says Putin is ready to wage Ukraine war for \"a long ti…
>  2 "\"How could Putin’s Ukraine war trigger famine more than 8000km away?\""
>  3 "\"Russia-Ukraine war: What happened today (June 30)\""
>  4 "\"Ukraine war: All they will inherit is rubble - relentless Russian bombard…
>  5 "\"Ukraine war: Klitschko brothers plead to Nato leaders\""
>  6 "\"Russia-Ukraine war: Buhari calls for increased gas partnership with Portu…
>  7 "\"Ukraine war: five things you need to know about the confict this Thursday…
>  8 "\"Ukraine war: New 'iron curtain' descending, warns Sergei Lavrov\""
>  9 "\"Ukraine war update: Russia withdraws from Snake Island, but gains in Donb…
> 10 "\"UKRAINE WAR - The Ukrainian woman taking in animals abandoned amid war\""
> # … with 749 more rows

news %>% distinct(TITLE, DESCRIPTION)
> # A tibble: 838 × 2
>    TITLE                                                             DESCRIPTION
>    <chr>                                                             <chr>
>  1 "\"German chancellor says Putin is ready to wage Ukraine war for… "CBS News'…
>  2 "\"How could Putin’s Ukraine war trigger famine more than 8000km… "Block a p…
>  3 "\"Russia-Ukraine war: What happened today (June 30)\""           "A roundup…
>  4 "\"Ukraine war: All they will inherit is rubble - relentless Rus… <NA>
>  5 "\"Ukraine war: Klitschko brothers plead to Nato leaders\""       "He's a ma…
>  6 "\"Russia-Ukraine war: Buhari calls for increased gas partnershi… "The Presi…
>  7 "\"Ukraine war: five things you need to know about the confict t… "Sanctions…
>  8 "\"Ukraine war: New 'iron curtain' descending, warns Sergei Lavr… <NA>
>  9 "\"Ukraine war update: Russia withdraws from Snake Island, but g… <NA>
> 10 "\"UKRAINE WAR - The Ukrainian woman taking in animals abandoned… <NA>
> # … with 828 more rows

library(tidytext)

tidy_uk <- news %>% unnest_tokens(word, DESCRIPTION) %>% anti_join(get_stopwords())

> Joining, by = "word"

tidy_uk %>% count(word, sort = TRUE)
> # A tibble: 5,236 × 2
>    word          n
>    <chr>     <int>
>  1 ukraine     547
>  2 war         353
>  3 russia      257
>  4 russian     245
>  5 <NA>        210
>  6 said        207
>  7 invasion    132
>  8 president   121
>  9 ukrainian    98
> 10 says         93
> # … with 5,226 more rows

tidy_uk %>% count(TITLE, word, sort = TRUE)
> # A tibble: 16,313 × 3
>    TITLE                                                             word      n
>    <chr>                                                             <chr> <int>
>  1 "\"OECD slashes global economic outlook on Russia-Ukraine war\""  econ…    30
>  2 "\"Biden: G-7 to ban Russian gold in response to Ukraine war\""   russ…    28
>  3 "\"Live updates on the Russia-Ukraine war: more fighting expecte… mr       24
>  4 "\"Biden: G-7 to ban Russian gold in response to Ukraine war\""   gold     18
>  5 "\"Biden: G-7 to ban Russian gold in response to Ukraine war\""   impo…    17
>  6 "\"Biden: G-7 to ban Russian gold in response to Ukraine war\""   biden    15
>  7 "\"Biden: G-7 to ban Russian gold in response to Ukraine war\""   said     15
>  8 "\"OECD slashes global economic outlook on Russia-Ukraine war\""  coop…    15
>  9 "\"OECD slashes global economic outlook on Russia-Ukraine war\""  cris…    15
> 10 "\"OECD slashes global economic outlook on Russia-Ukraine war\""  deve…    15
> # … with 16,303 more rows

# train topic model

news_sparse <- tidy_uk %>% count(TITLE, word) %>% cast_sparse(TITLE, word, n)

library(stm)

> stm v1.3.6 successfully loaded. See ?stm for help.

> Papers, resources, and other materials at structuraltopicmodel.com

topic_model <- stm(news_sparse, K = 4)

> Beginning Spectral Initialization

> Calculating the gram matrix...

> Finding anchor words...

> ....

> Recovering initialization...

> ....................................................

> Initialization complete.

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 1 (approx. per word bound = -7.507)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 2 (approx. per word bound = -7.385, relative change = 1.628e-02)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 3 (approx. per word bound = -7.309, relative change = 1.020e-02)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 4 (approx. per word bound = -7.275, relative change = 4.722e-03)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 5 (approx. per word bound = -7.249, relative change = 3.550e-03)

> Topic 1: russia, russian, president, said, says

> Topic 2: russian, russia, forces, says, said

> Topic 3: NA, ukraine, invasion, latest, russia's

> Topic 4: ukraine, war, said, ukrainian, world

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 6 (approx. per word bound = -7.224, relative change = 3.407e-03)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 7 (approx. per word bound = -7.206, relative change = 2.553e-03)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 8 (approx. per word bound = -7.191, relative change = 2.099e-03)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 9 (approx. per word bound = -7.177, relative change = 1.861e-03)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 10 (approx. per word bound = -7.165, relative change = 1.700e-03)

> Topic 1: russia, russian, ukraine, said, president

> Topic 2: russian, ukraine, forces, said, russia

> Topic 3: NA, ukraine, invasion, russia, latest

> Topic 4: war, ukraine, ukrainian, said, world

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 11 (approx. per word bound = -7.154, relative change = 1.477e-03)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 12 (approx. per word bound = -7.145, relative change = 1.389e-03)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 13 (approx. per word bound = -7.135, relative change = 1.340e-03)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 14 (approx. per word bound = -7.127, relative change = 1.136e-03)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 15 (approx. per word bound = -7.121, relative change = 8.618e-04)

> Topic 1: ukraine, russian, russia, war, said

> Topic 2: ukraine, russian, forces, said, russia

> Topic 3: NA, ukraine, invasion, russia, latest

> Topic 4: war, ukraine, ukrainian, world, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 16 (approx. per word bound = -7.115, relative change = 7.891e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 17 (approx. per word bound = -7.110, relative change = 7.683e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 18 (approx. per word bound = -7.105, relative change = 6.341e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 19 (approx. per word bound = -7.101, relative change = 6.260e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 20 (approx. per word bound = -7.095, relative change = 7.314e-04)

> Topic 1: ukraine, russian, russia, war, said

> Topic 2: ukraine, russian, said, war, forces

> Topic 3: NA, ukraine, invasion, russia, russia's

> Topic 4: war, ukraine, world, said, food

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 21 (approx. per word bound = -7.089, relative change = 8.926e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 22 (approx. per word bound = -7.083, relative change = 9.179e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 23 (approx. per word bound = -7.078, relative change = 6.158e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 24 (approx. per word bound = -7.075, relative change = 4.426e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 25 (approx. per word bound = -7.072, relative change = 4.406e-04)

> Topic 1: ukraine, russian, russia, war, said

> Topic 2: ukraine, russian, war, ukrainian, said

> Topic 3: NA, ukraine, invasion, russia, russia's

> Topic 4: war, ukraine, world, food, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 26 (approx. per word bound = -7.069, relative change = 4.044e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 27 (approx. per word bound = -7.066, relative change = 4.010e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 28 (approx. per word bound = -7.063, relative change = 4.143e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 29 (approx. per word bound = -7.061, relative change = 3.774e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 30 (approx. per word bound = -7.059, relative change = 3.114e-04)

> Topic 1: ukraine, russian, russia, war, said

> Topic 2: ukraine, russian, ukrainian, war, said

> Topic 3: NA, ukraine, russia, invasion, war

> Topic 4: war, ukraine, world, food, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 31 (approx. per word bound = -7.057, relative change = 2.735e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 32 (approx. per word bound = -7.055, relative change = 2.366e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 33 (approx. per word bound = -7.054, relative change = 1.726e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 34 (approx. per word bound = -7.053, relative change = 1.542e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 35 (approx. per word bound = -7.051, relative change = 2.100e-04)

> Topic 1: ukraine, russian, russia, war, death

> Topic 2: ukraine, russian, ukrainian, war, said

> Topic 3: NA, ukraine, russia, invasion, war

> Topic 4: war, ukraine, world, food, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 36 (approx. per word bound = -7.049, relative change = 3.482e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 37 (approx. per word bound = -7.045, relative change = 5.213e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 38 (approx. per word bound = -7.041, relative change = 5.540e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 39 (approx. per word bound = -7.036, relative change = 7.125e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 40 (approx. per word bound = -7.032, relative change = 6.204e-04)

> Topic 1: ukraine, russian, war, russia, death

> Topic 2: ukraine, russian, ukrainian, war, said

> Topic 3: NA, ukraine, russia, president, invasion

> Topic 4: war, ukraine, world, food, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 41 (approx. per word bound = -7.029, relative change = 3.435e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 42 (approx. per word bound = -7.028, relative change = 2.465e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 43 (approx. per word bound = -7.026, relative change = 2.041e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 44 (approx. per word bound = -7.025, relative change = 1.893e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 45 (approx. per word bound = -7.023, relative change = 2.876e-04)

> Topic 1: ukraine, russian, war, russia, death

> Topic 2: ukraine, russian, ukrainian, war, said

> Topic 3: NA, ukraine, russia, president, war

> Topic 4: war, ukraine, world, food, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 46 (approx. per word bound = -7.021, relative change = 1.899e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 47 (approx. per word bound = -7.020, relative change = 1.441e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 48 (approx. per word bound = -7.020, relative change = 1.139e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 49 (approx. per word bound = -7.019, relative change = 9.469e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 50 (approx. per word bound = -7.018, relative change = 9.890e-05)

> Topic 1: ukraine, russian, war, russia, death

> Topic 2: ukraine, russian, war, ukrainian, said

> Topic 3: NA, ukraine, russia, president, war

> Topic 4: war, ukraine, world, food, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 51 (approx. per word bound = -7.018, relative change = 9.234e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 52 (approx. per word bound = -7.017, relative change = 8.746e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 53 (approx. per word bound = -7.016, relative change = 8.283e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 54 (approx. per word bound = -7.016, relative change = 6.329e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 55 (approx. per word bound = -7.015, relative change = 7.540e-05)

> Topic 1: ukraine, russian, war, russia, death

> Topic 2: ukraine, russian, war, ukrainian, said

> Topic 3: NA, ukraine, russia, president, war

> Topic 4: war, ukraine, world, food, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 56 (approx. per word bound = -7.015, relative change = 9.299e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 57 (approx. per word bound = -7.014, relative change = 1.088e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 58 (approx. per word bound = -7.013, relative change = 1.162e-04)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 59 (approx. per word bound = -7.013, relative change = 8.868e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 60 (approx. per word bound = -7.012, relative change = 7.620e-05)

> Topic 1: ukraine, russian, war, russia, death

> Topic 2: ukraine, russian, war, ukrainian, said

> Topic 3: NA, ukraine, russia, president, war

> Topic 4: war, ukraine, world, food, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 61 (approx. per word bound = -7.012, relative change = 7.323e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 62 (approx. per word bound = -7.011, relative change = 6.586e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 63 (approx. per word bound = -7.011, relative change = 6.429e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 64 (approx. per word bound = -7.010, relative change = 6.432e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 65 (approx. per word bound = -7.010, relative change = 5.844e-05)

> Topic 1: ukraine, russian, war, russia, death

> Topic 2: ukraine, russian, war, ukrainian, said

> Topic 3: NA, ukraine, russia, president, war

> Topic 4: war, ukraine, world, food, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 66 (approx. per word bound = -7.009, relative change = 4.804e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 67 (approx. per word bound = -7.009, relative change = 5.518e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 68 (approx. per word bound = -7.009, relative change = 8.199e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 69 (approx. per word bound = -7.008, relative change = 6.681e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 70 (approx. per word bound = -7.008, relative change = 5.674e-05)

> Topic 1: ukraine, russian, war, russia, death

> Topic 2: ukraine, russian, war, ukrainian, said

> Topic 3: NA, ukraine, russia, president, war

> Topic 4: war, ukraine, world, food, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 71 (approx. per word bound = -7.007, relative change = 8.490e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 72 (approx. per word bound = -7.006, relative change = 8.039e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 73 (approx. per word bound = -7.006, relative change = 5.451e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 74 (approx. per word bound = -7.006, relative change = 3.820e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 75 (approx. per word bound = -7.006, relative change = 3.895e-05)

> Topic 1: ukraine, russian, war, russia, death

> Topic 2: ukraine, russian, war, ukrainian, said

> Topic 3: NA, ukraine, russia, president, war

> Topic 4: war, ukraine, world, food, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 76 (approx. per word bound = -7.005, relative change = 4.076e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 77 (approx. per word bound = -7.005, relative change = 4.332e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 78 (approx. per word bound = -7.005, relative change = 4.682e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 79 (approx. per word bound = -7.004, relative change = 4.587e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 80 (approx. per word bound = -7.004, relative change = 2.889e-05)

> Topic 1: ukraine, russian, war, russia, us

> Topic 2: ukraine, russian, war, ukrainian, said

> Topic 3: NA, ukraine, russia, president, war

> Topic 4: war, ukraine, world, food, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 81 (approx. per word bound = -7.004, relative change = 1.929e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 82 (approx. per word bound = -7.004, relative change = 1.486e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 83 (approx. per word bound = -7.004, relative change = 1.669e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 84 (approx. per word bound = -7.004, relative change = 1.915e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 85 (approx. per word bound = -7.004, relative change = 1.621e-05)

> Topic 1: ukraine, russian, war, russia, us

> Topic 2: ukraine, russian, war, ukrainian, said

> Topic 3: NA, ukraine, russia, president, war

> Topic 4: war, ukraine, world, food, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 86 (approx. per word bound = -7.003, relative change = 1.890e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 87 (approx. per word bound = -7.003, relative change = 2.545e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 88 (approx. per word bound = -7.003, relative change = 1.939e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 89 (approx. per word bound = -7.003, relative change = 2.092e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 90 (approx. per word bound = -7.003, relative change = 3.154e-05)

> Topic 1: ukraine, russian, war, russia, us

> Topic 2: ukraine, russian, war, ukrainian, said

> Topic 3: NA, ukraine, russia, president, war

> Topic 4: war, ukraine, world, food, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 91 (approx. per word bound = -7.003, relative change = 2.572e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 92 (approx. per word bound = -7.002, relative change = 2.551e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 93 (approx. per word bound = -7.002, relative change = 3.151e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 94 (approx. per word bound = -7.002, relative change = 3.750e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 95 (approx. per word bound = -7.002, relative change = 2.860e-05)

> Topic 1: ukraine, russian, war, russia, us

> Topic 2: ukraine, russian, war, ukrainian, said

> Topic 3: NA, ukraine, russia, president, war

> Topic 4: war, ukraine, world, food, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 96 (approx. per word bound = -7.002, relative change = 1.951e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 97 (approx. per word bound = -7.001, relative change = 2.099e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 98 (approx. per word bound = -7.001, relative change = 2.493e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 99 (approx. per word bound = -7.001, relative change = 3.543e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 100 (approx. per word bound = -7.001, relative change = 5.701e-05)

> Topic 1: ukraine, russian, war, russia, us

> Topic 2: ukraine, russian, war, ukrainian, said

> Topic 3: NA, ukraine, russia, president, war

> Topic 4: war, ukraine, world, food, said

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 101 (approx. per word bound = -7.000, relative change = 5.996e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 102 (approx. per word bound = -7.000, relative change = 3.263e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Completing Iteration 103 (approx. per word bound = -7.000, relative change = 1.547e-05)

> ............................................................................................................

> Completed E-Step (0 seconds).

> Completed M-Step.

> Model Converged

summary(topic_model)

> A topic model with 4 topics, 759 documents and a 5236 word dictionary.

> Topic 1 Top Words:

> Highest Prob: ukraine, russian, war, russia, us, death, two

> FREX: sentenced, sky, dymyd, dymyd’s, funeral, separatist, bbc

> Lift: host, canada, indo, korean, singapore, earlier, beast

> Score: dymyd, dymyd’s, funeral, death, sentenced, mr, separatist

> Topic 2 Top Words:

> Highest Prob: ukraine, russian, war, ukrainian, said, russia, forces

> FREX: sievierodonetsk, zelenskiy, must, severodonetsk, luhansk, territory, glance

> Lift: charities, amanpour, christiane, der, leyen, ursula, von

> Score: sievierodonetsk, eset, severodonetsk, zelenskiy, control, stoltenberg, kharkiv

> Topic 3 Top Words:

> Highest Prob: NA, ukraine, russia, president, war, invasion, russia's

> FREX: NA, developments, seven, depth, roundup, leading, economies

> Lift: NA, unlikely, dictate, heartland, stakes, seven, alps

> Score: NA, seven, depth, roundup, developments, economically, isolate

> Topic 4 Top Words:

> Highest Prob: war, ukraine, world, food, said, economic, global

> FREX: worsened, crises, severely, diamonds, surat, oecd, workers

> Lift: aspect, invisible, recession, defended, 260, 467, 730

> Score: severely, crises, diamonds, surat, development, diamond, push
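The `NA` term leading Topic 3 probably comes from articles whose DESCRIPTION is missing: tokenizing a missing value appears to yield a missing token, which is then counted like any other word. A minimal base R sketch of dropping those rows before tokenizing follows. `news_toy` and its values are made up for illustration and are not the real data.

```r
# Hypothetical toy data; only the column names mirror the real CSV
news_toy <- data.frame(
  TITLE = c("t1", "t2", "t3"),
  DESCRIPTION = c("russia ukraine war", NA, "food crisis"),
  stringsAsFactors = FALSE
)

# Keep only the articles that actually have a description
news_complete <- news_toy[!is.na(news_toy$DESCRIPTION), ]

nrow(news_toy)       # number of articles before filtering
nrow(news_complete)  # number of articles that would become documents
```

Any article removed this way would also have to be removed from the metadata passed to `estimateEffect()`, so that both still describe the same set of documents.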

Explore topic model results

word_topics <- tidy(topic_model, matrix = "beta")

word_topics

> # A tibble: 20,944 × 3

> topic term beta

>

> 1 1 aid 1.33e- 3

> 2 2 aid 3.05e- 4

> 3 3 aid 4.32e- 4

> 4 4 aid 3.98e-91

> 5 1 alongside 4.40e- 4

> 6 2 alongside 4.01e- 4

> 7 3 alongside 1.29e-86

> 8 4 alongside 2.03e- 4

> 9 1 battle 3.72e- 4

> 10 2 battle 1.91e- 3

> # … with 20,934 more rows

word_topics %>%
  group_by(topic) %>%
  slice_max(beta, n = 10) %>%
  ungroup() %>%
  mutate(topic = paste("Topic", topic)) %>%
  ggplot(aes(beta, reorder_within(term, beta, topic), fill = topic)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(vars(topic), scales = "free_y") +
  scale_y_reordered() +
  labs(x = expression(beta), y = NULL)

title_topics <- tidy(topic_model, matrix = "gamma", document_names = rownames(news_sparse))

title_topics

> # A tibble: 3,036 × 3

> document topic gamma

>

> 1 "\"'As Ukraine war becomes grimmer than ever, we must offer th… 1 0.447

> 2 "\"'Hard to avoid!' World Bank warns 'MAJOR recession' looming… 1 0.0286

> 3 "\"'I would not at all bet on Russia': EU Commission President… 1 0.0217

> 4 "\"'Not a Justification but a Provocation': Chomsky on the Roo… 1 0.910

> 5 "\"'Nothing to apologise for!' Angela Merkel defends record on… 1 0.0448

> 6 "\"'Stop the War' hold rally in Southampton calling for end to… 1 0.163

> 7 "\"'Thanks Biden!' Putin mouthpiece claims US sanctions 'payin… 1 0.573

> 8 "\"'Threat Report' details cyberattacks linked to Ukraine war\… 1 0.00824

> 9 "\"‘I saw evidence of civilian life being targeted’ – EU Human… 1 0.0257

> 10 "\"‘More than 260 children killed during Ukraine war’\"" 1 0.143

> # … with 3,026 more rows

title_topics %>%
  mutate(document = fct_reorder(document, gamma), topic = factor(topic)) %>%
  ggplot(aes(gamma, topic, fill = topic)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(vars(document), ncol = 4) +
  labs(x = expression(gamma), y = "Topic")

effect <- estimateEffect(
  1:4 ~ PUBLISHER,
  topic_model,
  tidy_uk %>% distinct(TITLE, PUBLISHER) %>% arrange(TITLE)
)

> Error in qr.lm(thetasims[, k], qx): number of covariate observations does not match number of docs

summary(effect)

> Error in summary(effect): object 'effect' not found
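This error generally means the metadata does not have the same number of usable rows as the model has documents. One possible cause here (an assumption, since the data cannot be checked from the thread) is that `distinct(TITLE, PUBLISHER)` returns more than one row for a title that appears under several publishers. The base R sketch below uses a made-up `news_toy` table, with `doc_names` standing in for `rownames(news_sparse)`, to show the check and one way to build exactly one metadata row per document, in document order.

```r
# Hypothetical toy data: title "b" appears under two publishers,
# and title "c" has no publisher recorded
news_toy <- data.frame(
  TITLE = c("a", "b", "b", "c"),
  PUBLISHER = c("P1", "P2", "P3", NA),
  stringsAsFactors = FALSE
)

# Stand-in for rownames(news_sparse): one entry per document in the model
doc_names <- sort(unique(news_toy$TITLE))

# Equivalent of distinct(TITLE, PUBLISHER): may hold several rows per title
meta <- unique(news_toy[, c("TITLE", "PUBLISHER")])
nrow(meta) == length(doc_names)  # the check that fails in this toy case

# Keep the first publisher seen for each title (an arbitrary choice),
# then reorder the rows to follow the document order of the model
meta_one <- meta[!duplicated(meta$TITLE), ]
meta_one <- meta_one[match(doc_names, meta_one$TITLE), ]

# The metadata now lines up with the documents, row for row
stopifnot(nrow(meta_one) == length(doc_names))
stopifnot(identical(meta_one$TITLE, doc_names))

# Missing covariate values are worth checking too
anyNA(meta_one$PUBLISHER)
```

Keeping the first publisher per title is only one option; which row to keep depends on the analysis. Rows with a missing covariate may also be dropped when the regression is set up, which I would expect to produce the same mismatch, so it seems worth resolving those before calling `estimateEffect()`.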

On Thursday, July 21, 2022 at 9:50 PM, Julia Silge wrote:

Unfortunately I cannot yet reproduce the problem you are experiencing. Can you create a reprex <https://reprex.tidyverse.org/> (a minimal reproducible example) for this? The goal of a reprex is to make it easier for us to recreate your problem so that we can understand it and/or fix it. If you've never heard of a reprex before, you may want to start with the tidyverse.org help page <https://www.tidyverse.org/help/>. Thanks! 🙌

Here is an example that shows this code working:

library(tidyverse)

library(tidytext)

library(janeaustenr)

library(stm)

> stm v1.3.6 successfully loaded. See ?stm for help.

> Papers, resources, and other materials at structuraltopicmodel.com

books <- austen_books() %>%

group_by(book) %>%

mutate(chapter = cumsum(str_detect(text, regex("^chapter ", ignore_case = TRUE)))) %>%

ungroup() %>%

filter(chapter > 0) %>%

unite(document, book, chapter, remove = FALSE)

austen_sparse <- books %>%

unnest_tokens(word, text) %>%

anti_join(stop_words) %>%

count(document, word) %>%

cast_sparse(document, word, n)

> Joining, by = "word"

topic_model <- stm(

austen_sparse,

K = 4,

init.type = "Spectral",

verbose = FALSE

)

summary(topic_model)

> A topic model with 4 topics, 269 documents and a 13908 word dictionary.

> Topic 1 Top Words:

> Highest Prob: elizabeth, darcy, miss, bennet, lady, jane, bingley

> FREX: darcy, bennet, bingley, wickham, collins, lydia, lizzy

> Lift: baronetage, condescended, presumptive, gowland, finances, landlady, creditors

> Score: darcy, bennet, bingley, elizabeth, wickham, collins, jane

> Topic 2 Top Words:

> Highest Prob: emma, miss, harriet, weston, knightley, elton, jane

> FREX: weston, knightley, elton, woodhouse, fairfax, churchill, hartfield

> Lift: martin, goddard's, bangs, broadway, brunswick, cleverer, curtseys

> Score: emma, weston, knightley, elton, woodhouse, fairfax, churchill

> Topic 3 Top Words:

> Highest Prob: catherine, anne, captain, miss, time, tilney, wentworth

> FREX: tilney, thorpe, morland, allen, eleanor, henrietta, benwick

> Lift: ship, plunge, curl, heroic, castle, edifice, france

> Score: tilney, catherine, thorpe, morland, allen, wentworth, isabella

> Topic 4 Top Words:

> Highest Prob: fanny, elinor, miss, time, crawford, sir, marianne

> FREX: elinor, crawford, marianne, edmund, thomas, bertram, dashwood

> Lift: distract, nanny, heath, knoll, spunging, terrace, admirals

> Score: elinor, marianne, fanny, crawford, edmund, thomas, dashwood

chapters <- books %>%

group_by(document) %>%

summarize(text = str_c(text, collapse = " ")) %>%

ungroup() %>%

inner_join(books %>%

               distinct(document, book))

> Joining, by = "document"

chapters

> # A tibble: 269 × 3

> document text book

>

> 1 Emma_1 "CHAPTER I Emma Woodhouse, handsome, clever, and rich, with… Emma

> 2 Emma_10 "CHAPTER X Though now the middle of December, there had yet… Emma

> 3 Emma_11 "CHAPTER XI Mr. Elton must now be left to himself. It was n… Emma

> 4 Emma_12 "CHAPTER XII Mr. Knightley was to dine with them--rather ag… Emma

> 5 Emma_13 "CHAPTER XIII There could hardly be a happier creature in t… Emma

> 6 Emma_14 "CHAPTER XIV Some change of countenance was necessary for e… Emma

> 7 Emma_15 "CHAPTER XV Mr. Woodhouse was soon ready for his tea; and w… Emma

> 8 Emma_16 "CHAPTER XVI The hair was curled, and the maid sent away, a… Emma

> 9 Emma_17 "CHAPTER XVII Mr. and Mrs. John Knightley were not detained… Emma

> 10 Emma_18 "CHAPTER XVIII Mr. Frank Churchill did not come. When the t… Emma

> # … with 259 more rows

effects <- estimateEffect(1:3 ~ book, topic_model, chapters)

summary(effects)

>

> Call:

> estimateEffect(formula = 1:3 ~ book, stmobj = topic_model, metadata = chapters)

>

>

> Topic 1:

>

> Coefficients:

> Estimate Std. Error t value Pr(>|t|)

> (Intercept) 0.031673 0.027522 1.151 0.251

> bookPride & Prejudice 0.906982 0.035294 25.698 < 2e-16 ***

> bookMansfield Park -0.018987 0.036336 -0.523 0.602

> bookEmma -0.003074 0.036239 -0.085 0.932

> bookNorthanger Abbey -0.001549 0.040821 -0.038 0.970

> bookPersuasion 0.291445 0.063176 4.613 6.2e-06 ***

> ---

> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

>

>

> Topic 2:

>

> Coefficients:

> Estimate Std. Error t value Pr(>|t|)

> (Intercept) 1.858e-02 1.673e-02 1.111 0.268

> bookPride & Prejudice 1.895e-03 2.440e-02 0.078 0.938

> bookMansfield Park -4.086e-05 2.308e-02 -0.002 0.999

> bookEmma 9.087e-01 3.211e-02 28.302 <2e-16 ***

> bookNorthanger Abbey 1.953e-03 2.895e-02 0.067 0.946

> bookPersuasion 2.479e-03 3.041e-02 0.081 0.935

> ---

> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

>

>

> Topic 3:

>

> Coefficients:

> Estimate Std. Error t value Pr(>|t|)

> (Intercept) 0.023527 0.026388 0.892 0.373

> bookPride & Prejudice -0.001951 0.036241 -0.054 0.957

> bookMansfield Park 0.033284 0.042596 0.781 0.435

> bookEmma -0.003720 0.034816 -0.107 0.915

> bookNorthanger Abbey 0.903470 0.045303 19.943 <2e-16 ***

> bookPersuasion 0.612319 0.059020 10.375 <2e-16 ***

> ---

> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

tidy(effects)

> # A tibble: 18 × 6

> topic term estimate std.error statistic p.value

>

> 1 1 (Intercept) 0.0314 0.0275 1.14 2.54e- 1

> 2 1 bookPride & Prejudice 0.908 0.0353 25.7 1.00e-73

> 3 1 bookMansfield Park -0.0194 0.0368 -0.526 5.99e- 1

> 4 1 bookEmma -0.00252 0.0365 -0.0690 9.45e- 1

> 5 1 bookNorthanger Abbey -0.00134 0.0403 -0.0331 9.74e- 1

> 6 1 bookPersuasion 0.292 0.0628 4.65 5.27e- 6

> 7 2 (Intercept) 0.0186 0.0165 1.13 2.60e- 1

> 8 2 bookPride & Prejudice 0.00214 0.0241 0.0889 9.29e- 1

> 9 2 bookMansfield Park 0.000182 0.0228 0.00796 9.94e- 1

> 10 2 bookEmma 0.909 0.0317 28.7 5.66e-83

> 11 2 bookNorthanger Abbey 0.00217 0.0286 0.0757 9.40e- 1

> 12 2 bookPersuasion 0.00269 0.0304 0.0885 9.30e- 1

> 13 3 (Intercept) 0.0236 0.0262 0.898 3.70e- 1

> 14 3 bookPride & Prejudice -0.00201 0.0360 -0.0558 9.56e- 1

> 15 3 bookMansfield Park 0.0325 0.0424 0.767 4.44e- 1

> 16 3 bookEmma -0.00331 0.0349 -0.0948 9.25e- 1

> 17 3 bookNorthanger Abbey 0.903 0.0450 20.1 5.71e-55

> 18 3 bookPersuasion 0.612 0.0591 10.3 2.92e-21

Created on 2022-07-21 by the reprex package <https://reprex.tidyverse.org/> (v2.0.1)


juliasilge commented 2 years ago

Could you update your example to use some more easily accessible data (I don't have your CSV file) and the reprex package? Using reprex makes it easier to see both the input and output, and for us to re-run the code in a local session. Your output should look like mine if you paste it in; here are two articles if you are having trouble:

Thanks! 🙌

kangutsa commented 2 years ago

Thank you. Below is the code, run with the reprex package. The dataset is available at https://utsacloud-my.sharepoint.com/:x:/g/personal/seok_kang_utsa_edu/EbTEWqpOfytGsoEkzyGJdiMBewtcX58OGTxHhj11wQBx5w?e=1sBpQZ

Any advice will be appreciated. I want to test whether the publisher predicts the topics.

library(tidyverse)
library(reprex)
news <-read_csv('/Users/Seok1/Desktop/mining/ukraine2.csv')
#> Rows: 24 Columns: 11
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (10): TITLE, ARTICLE LINK, PUBLISHED DATE (GMT), AUTHOR, PUBLISHER, COUN...
#> lgl  (1): VIDEO URL
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
news %>% distinct(TITLE)
#> # A tibble: 22 × 1
#>    TITLE                                                                        
#>    <chr>                                                                        
#>  1 "\"German chancellor says Putin is ready to wage Ukraine war for \"a long ti…
#>  2 "\"How could Putin’s Ukraine war trigger famine more than 8000km away?\""    
#>  3 "\"Russia-Ukraine war: What happened today (June 30)\""                      
#>  4 "\"Ukraine war: All they will inherit is rubble - relentless Russian bombard…
#>  5 "\"Ukraine war: Klitschko brothers plead to Nato leaders\""                  
#>  6 "\"Russia-Ukraine war: Buhari calls for increased gas partnership with Portu…
#>  7 "\"Ukraine war: five things you need to know about the confict this Thursday…
#>  8 "\"Ukraine war: New 'iron curtain' descending, warns Sergei Lavrov\""        
#>  9 "\"Should Russian culture be 'cancelled' over Ukraine war?\""                
#> 10 "\"Would Ukraine war not have happened if Putin was a woman?\""              
#> # … with 12 more rows
news %>% distinct(TITLE, DESCRIPTION)
#> # A tibble: 22 × 2
#>    TITLE                                                             DESCRIPTION
#>    <chr>                                                             <chr>      
#>  1 "\"German chancellor says Putin is ready to wage Ukraine war for… "CBS News'…
#>  2 "\"How could Putin’s Ukraine war trigger famine more than 8000km… "Block a p…
#>  3 "\"Russia-Ukraine war: What happened today (June 30)\""           "A roundup…
#>  4 "\"Ukraine war: All they will inherit is rubble - relentless Rus…  <NA>      
#>  5 "\"Ukraine war: Klitschko brothers plead to Nato leaders\""       "He's a ma…
#>  6 "\"Russia-Ukraine war: Buhari calls for increased gas partnershi… "The Presi…
#>  7 "\"Ukraine war: five things you need to know about the confict t… "Sanctions…
#>  8 "\"Ukraine war: New 'iron curtain' descending, warns Sergei Lavr…  <NA>      
#>  9 "\"Should Russian culture be 'cancelled' over Ukraine war?\""     "There are…
#> 10 "\"Would Ukraine war not have happened if Putin was a woman?\""   "“IF Putin…
#> # … with 12 more rows

library(tidytext)

tidy_uk <-
  news %>%
  unnest_tokens(word, DESCRIPTION) %>%
  anti_join(get_stopwords())
#> Joining, by = "word"

tidy_uk %>% count(word, sort = TRUE)
#> # A tibble: 281 × 2
#>    word         n
#>    <chr>    <int>
#>  1 ukraine     13
#>  2 nato         7
#>  3 russian      6
#>  4 war          6
#>  5 military     5
#>  6 said         5
#>  7 <NA>         5
#>  8 eastern      4
#>  9 finland      4
#> 10 invasion     4
#> # … with 271 more rows
tidy_uk %>% count(TITLE, word, sort = TRUE)
#> # A tibble: 359 × 3
#>    TITLE                                                             word      n
#>    <chr>                                                             <chr> <int>
#>  1 "\"Russia-Ukraine war: Nato says Moscow is biggest ‘direct threa… east…     3
#>  2 "\"Ardern calls on NATO to prevent Ukraine war from triggering a… arms      2
#>  3 "\"Ardern calls on NATO to prevent Ukraine war from triggering a… nato      2
#>  4 "\"Ardern calls on NATO to prevent Ukraine war from triggering a… prev…     2
#>  5 "\"Ardern calls on NATO to prevent Ukraine war from triggering a… race      2
#>  6 "\"Ardern calls on NATO to prevent Ukraine war from triggering a… ukra…     2
#>  7 "\"Ardern calls on NATO to prevent Ukraine war from triggering a… war       2
#>  8 "\"German porcelain maker faces fragile future due to Ukraine wa… <NA>      2
#>  9 "\"How could Putin’s Ukraine war trigger famine more than 8000km… anot…     2
#> 10 "\"How could Putin’s Ukraine war trigger famine more than 8000km… bears     2
#> # … with 349 more rows

## train topic model

news_sparse <-
  tidy_uk %>%
  count(TITLE, word) %>%
  cast_sparse(TITLE, word, n)

library(stm)
#> stm v1.3.6 successfully loaded. See ?stm for help. 
#>  Papers, resources, and other materials at structuraltopicmodel.com

topic_model <- stm(news_sparse, K = 4)
#> Beginning Spectral Initialization 
#>   Calculating the gram matrix...
#>   Finding anchor words...
#>      ....
#>   Recovering initialization...
#>      ..
#> Initialization complete.
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 1 (approx. per word bound = -5.167) 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 2 (approx. per word bound = -4.813, relative change = 6.857e-02) 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 3 (approx. per word bound = -4.692, relative change = 2.508e-02) 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 4 (approx. per word bound = -4.640, relative change = 1.101e-02) 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 5 (approx. per word bound = -4.610, relative change = 6.432e-03) 
#> Topic 1: NA, nato, war, ukraine, arms 
#>  Topic 2: country, madrid, military, one, leaders 
#>  Topic 3: ukraine, invasion, said, putin, russian 
#>  Topic 4: ukraine, russian, russia, eastern, military 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 6 (approx. per word bound = -4.596, relative change = 3.178e-03) 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 7 (approx. per word bound = -4.586, relative change = 2.204e-03) 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 8 (approx. per word bound = -4.581, relative change = 9.745e-04) 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 9 (approx. per word bound = -4.579, relative change = 5.114e-04) 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 10 (approx. per word bound = -4.577, relative change = 4.484e-04) 
#> Topic 1: NA, nato, war, ukraine, arms 
#>  Topic 2: country, madrid, one, military, nato 
#>  Topic 3: ukraine, invasion, russian, coverage, depth 
#>  Topic 4: ukraine, russia, russian, eastern, finland 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 11 (approx. per word bound = -4.575, relative change = 4.195e-04) 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 12 (approx. per word bound = -4.573, relative change = 4.057e-04) 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 13 (approx. per word bound = -4.568, relative change = 1.020e-03) 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 14 (approx. per word bound = -4.560, relative change = 1.810e-03) 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 15 (approx. per word bound = -4.554, relative change = 1.371e-03) 
#> Topic 1: NA, nato, ukraine, war, arms 
#>  Topic 2: country, one, military, crisis, supply 
#>  Topic 3: ukraine, invasion, russian, coverage, depth 
#>  Topic 4: ukraine, russia, russian, eastern, finland 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 16 (approx. per word bound = -4.553, relative change = 1.646e-04) 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Completing Iteration 17 (approx. per word bound = -4.553, relative change = 2.456e-05) 
#> ......................
#> Completed E-Step (0 seconds). 
#> Completed M-Step. 
#> Model Converged

summary(topic_model)
#> A topic model with 4 topics, 22 documents and a 281 word dictionary.
#> Topic 1 Top Words:
#>       Highest Prob: NA, nato, ukraine, war, arms, prevent, race 
#>       FREX: NA, arms, prevent, race, war, thursday, nato 
#>       Lift: 30, alongside, appeared, ardern, arms, becoming, called 
#>       Score: NA, arms, prevent, race, 30, alongside, appeared 
#> Topic 2 Top Words:
#>       Highest Prob: country, one, military, crisis, supply, madrid, another 
#>       FREX: country, one, another, bears, block, brunt, can 
#>       Lift: belarus, campaign, covert, posed, reports, barely, economic 
#>       Score: belarus, country, another, bears, block, brunt, can 
#> Topic 3 Top Words:
#>       Highest Prob: ukraine, invasion, russian, war, coverage, depth, developments 
#>       FREX: invasion, coverage, depth, developments, key, latest, roundup 
#>       Lift: 18, airstrike, battled, centre, contested, killed, last 
#>       Score: coverage, depth, developments, key, latest, roundup, russia's 
#> Topic 4 Top Words:
#>       Highest Prob: ukraine, russia, russian, eastern, finland, sweden, military 
#>       FREX: russia, finland, sweden, eastern, russian, major, frontline 
#>       Lift: assistant, backdrop, become, buhari, especially, europe’s, garba 
#>       Score: arguments, frontline, offensive, russian, russia, major, know

## Explore topic model results

word_topics <- tidy(topic_model, matrix = "beta")
word_topics
#> # A tibble: 1,124 × 3
#>    topic term          beta
#>    <int> <chr>        <dbl>
#>  1     1 30        1.98e- 2
#>  2     2 30        2.59e-46
#>  3     3 30        1.49e-26
#>  4     4 30        1.04e-45
#>  5     1 alongside 1.98e- 2
#>  6     2 alongside 2.59e-46
#>  7     3 alongside 1.49e-26
#>  8     4 alongside 1.04e-45
#>  9     1 appeared  1.98e- 2
#> 10     2 appeared  2.59e-46
#> # … with 1,114 more rows

word_topics %>%
  group_by(topic) %>%
  slice_max(beta, n = 5) %>%
  ungroup() %>%
  mutate(topic = paste("Topic", topic)) %>%
  ggplot(aes(beta, reorder_within(term, beta, topic), fill = topic)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(vars(topic), scales = "free_y") +
  scale_y_reordered() +
  labs(x = expression(beta), y = NULL)


title_topics <- tidy(topic_model, matrix = "gamma",
                     document_names = rownames(news_sparse))  
title_topics  
#> # A tibble: 88 × 3
#>    document                                                        topic   gamma
#>    <chr>                                                           <int>   <dbl>
#>  1 "\"Ardern calls on NATO to prevent Ukraine war from triggering…     1 0.953  
#>  2 "\"Belarus posed for 'covert' military campaign in chilling Uk…     1 0.0195 
#>  3 "\"Fate of Ukraine war will be decided on the battlefield, not…     1 0.00906
#>  4 "\"German chancellor says Putin is ready to wage Ukraine war f…     1 0.00966
#>  5 "\"German porcelain maker faces fragile future due to Ukraine …     1 0.744  
#>  6 "\"How could Putin’s Ukraine war trigger famine more than 8000…     1 0.00591
#>  7 "\"NATO and the Ukraine war: It took 30 years for Russia and t…     1 0.0361 
#>  8 "\"Russia-Ukraine war: Buhari calls for increased gas partners…     1 0.00449
#>  9 "\"Russia-Ukraine war: Moscow intensifies attacks in Ukraine a…     1 0.0268 
#> 10 "\"Russia-Ukraine war: Nato says Moscow is biggest ‘direct thr…     1 0.00229
#> # … with 78 more rows

title_topics %>%
  mutate(document = fct_reorder(document, gamma),
         topic = factor(topic)) %>%
  ggplot(aes(gamma, topic, fill = topic)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(vars(document), ncol = 4) +
  labs(x = expression(gamma), y = "Topic")


impact <-
  estimateEffect(
    1:4 ~ CATEGORY,
    topic_model,
    tidy_uk %>%
      distinct(TITLE, CATEGORY) %>%
      arrange(TITLE))
#> Error in qr.lm(thetasims[, k], qx): number of covariate observations does not match number of docs

summary(impact)
#> Error in summary(impact): object 'impact' not found

Created on 2022-07-22 by the reprex package (v2.0.1)

juliasilge commented 2 years ago

Unfortunately I don't see a way to download that data @kangutsa.

Can you read this article on how to create a reprex? Especially notice the first "main requirement":

Use the smallest, simplest, most built-in data possible.
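
For an error like this one, a small self-contained check can help show whether the covariate data frame passed to `estimateEffect()` has one row per document. The sketch below is only an illustration using base R and a made-up toy data frame; the toy values, and the names `doc_names`, `covariates`, and `duplicated_titles`, are hypothetical and not from the original data. It assumes one possible way the counts could diverge (a title listed under more than one category) and does not claim that this is the cause here.

```r
# Hypothetical toy data standing in for the news data set;
# title "b" appears under two categories.
news <- data.frame(
  TITLE = c("a", "b", "b", "c"),
  CATEGORY = c("world", "world", "business", "world"),
  stringsAsFactors = FALSE
)

# Documents as the topic model would see them: one per distinct TITLE.
# In the real session this would be rownames(news_sparse).
doc_names <- sort(unique(news$TITLE))

# Covariates built the same way as in the estimateEffect() call:
# distinct TITLE/CATEGORY pairs, ordered by TITLE.
covariates <- unique(news[, c("TITLE", "CATEGORY")])
covariates <- covariates[order(covariates$TITLE), ]

# Compare the number of documents with the number of covariate rows.
n_docs <- length(doc_names)
n_covariates <- nrow(covariates)
same_length <- n_docs == n_covariates

# Titles that occur more than once in the covariate data.
duplicated_titles <- unique(covariates$TITLE[duplicated(covariates$TITLE)])

print(c(n_docs = n_docs, n_covariates = n_covariates))
print(duplicated_titles)
```

With the real objects, the equivalent comparison would be between `nrow(news_sparse)` and the number of rows in the data frame given to `estimateEffect()`, and between `rownames(news_sparse)` and its `TITLE` column.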

juliasilge commented 2 years ago

Let me know if you have further questions!
