I'm using widyr for a text mining homework assignment in which I'm asked to calculate word associations in NY Times articles.
The input data frame has three columns: the word (unigram), the document idx, and the author name.
I then calculate the pairwise correlations and pick out the pairs that involve "trump". There are Inf values in the result:
> trump.cor[which(trump.cor$correlation==Inf),]
# A tibble: 38 × 4
   item1  item2    correlation author
   <fctr> <fctr>         <dbl> <fctr>
 1 trump  ad               Inf Thomas L. Friedman
 2 trump  american         Inf Thomas L. Friedman
 3 trump  ani              Inf Thomas L. Friedman
 4 trump  anoth            Inf Thomas L. Friedman
 5 trump  bad              Inf Thomas L. Friedman
 6 trump  bring            Inf Thomas L. Friedman
 7 trump  candid           Inf Thomas L. Friedman
 8 trump  common           Inf Thomas L. Friedman
 9 trump  connect          Inf Thomas L. Friedman
10 trump  democrat         Inf Thomas L. Friedman
# ... with 28 more rows
> summary(trump.cor)
       item1             item2          correlation                  author
 trump    :20908   ad      :    5   Min.   :   -Inf   David Brooks      :4710
 a        :    0   american:    5   1st Qu.:0.02592   Maureen Dowd      :5909
 aaron    :    0   ani     :    5   Median :0.03043   Nicholas Kristof  :5877
 aarondmil:    0   anoth   :    5   Mean   :    NaN   Paul Krugman      :4372
 aarp     :    0   bad     :    5   3rd Qu.:0.04327   Thomas L. Friedman:  40
 ababa    :    0   bring   :    5   Max.   :    Inf
 (Other)  :    0   (Other) :20878
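Note that the summary also shows -Inf and NaN, which the `== Inf` filter above does not catch. Below is a minimal sketch of a broader check. It assumes a made-up data frame `toy.cor` standing in for `trump.cor`, and its values are invented for illustration only.

```r
# Hypothetical stand-in for trump.cor; the values are made up for illustration.
toy.cor <- data.frame(
  item1 = "trump",
  item2 = c("ad", "american", "ani", "anoth", "bad"),
  correlation = c(Inf, 0.03, -Inf, NaN, 0.04),
  stringsAsFactors = FALSE
)

# `== Inf` matches only positive infinity (and yields NA for NaN),
# so this filter misses the -Inf and NaN rows.
only.pos.inf <- toy.cor[which(toy.cor$correlation == Inf), ]

# is.finite() is FALSE for Inf, -Inf, NaN and NA, so negating it
# selects every non-finite row in one step.
non.finite <- toy.cor[!is.finite(toy.cor$correlation), ]
print(non.finite)

# Summary statistics over the finite values only.
print(summary(toy.cor$correlation[is.finite(toy.cor$correlation)]))
```

Applied to the real result, the same `!is.finite(...)` filter should list the -Inf and NaN rows alongside the Inf ones.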
For anyone who wants to replicate my result, the R data file (read it with readRDS) is attached.
trump clinton.zip
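Without the attachment, the kind of computation described above can be approximated on made-up data. The following is a minimal sketch, not the code used for the result shown. It assumes base R's `cor()` on a binary document-by-word matrix in place of widyr, and the data frame `toy` with its `idx` and `word` columns is hypothetical.

```r
# Hypothetical input: one row per word occurrence in a document (made-up data).
toy <- data.frame(
  idx  = c(1, 1, 2, 2, 2, 3, 3, 4),
  word = c("trump", "candid", "trump", "candid", "democrat",
           "democrat", "bad", "trump"),
  stringsAsFactors = FALSE
)

docs  <- sort(unique(toy$idx))
words <- sort(unique(toy$word))

# Binary document-by-word matrix: 1 if the word occurs in the document.
dtm <- matrix(0, nrow = length(docs), ncol = length(words),
              dimnames = list(as.character(docs), words))
dtm[cbind(match(toy$idx, docs), match(toy$word, words))] <- 1

# Pearson correlation between word columns
# (this is the phi coefficient for 0/1 data).
word.cor <- cor(dtm)

# Keep only the correlations that involve "trump", dropping the self-pair.
trump.toy <- word.cor["trump", colnames(word.cor) != "trump"]
print(round(trump.toy, 3))
```

In this toy data every word appears in some documents but not in all of them, so every correlation is finite.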