cainesap closed 6 years ago
Great question (and good catch on the typo). I just pushed the fix in 4950dbc3c414f7a7b5f0ac6afc171fb64eea9ead. Basically, you now need to do the following, because cnlp_get_tfidf returns just a sparse matrix with row and column names:
pca <- cnlp_get_token(sotu) %>%
  filter(pos %in% c("NN", "NNS")) %>%
  cnlp_get_tfidf(min_df = 0.05, max_df = 0.95, type = "tfidf", tf_weight = "dnorm") %>%
  cnlp_pca(cnlp_get_document(sotu))
And it should work as expected. Please let me know if you still run into any trouble.
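For readers unsure what the min_df and max_df arguments are doing, here is a rough base R sketch of document-frequency filtering. It assumes the two arguments are lower and upper bounds on the proportion of documents a term appears in; the toy token table and all variable names are made up for illustration, and this is not the package's actual implementation.

```r
# Toy token table: one row per (document, lemma) occurrence (hypothetical data).
tokens <- data.frame(
  doc_id = c("d1", "d1", "d2", "d2", "d3", "d3", "d4"),
  lemma  = c("tax", "war", "tax", "peace", "tax", "war", "tax")
)

# Document-term count matrix (rows: documents, columns: lemmas).
counts <- table(tokens$doc_id, tokens$lemma)

# Share of documents in which each lemma occurs at least once.
doc_freq <- colMeans(counts > 0)

# Keep lemmas that are neither too rare nor too common
# (assumed meaning of min_df = 0.05 and max_df = 0.95).
keep <- doc_freq >= 0.05 & doc_freq <= 0.95
filtered <- counts[, keep, drop = FALSE]

print(colnames(filtered))
```

In this toy data "tax" occurs in every document, so it falls above the upper bound and is dropped, while "war" and "peace" are kept.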
Thank you for the quick response!
So, that error goes away, but now I get --
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
-- which made me wonder if one of my nouns is a number, but I checked this and no.
However, I've taken the tfidf object and passed it to plain prcomp(), so this has still been super useful, thanks! Don't worry though, I expect it's a problem with my dataset.
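The prcomp() workaround mentioned above might look roughly like the sketch below. It uses a small made-up sparse matrix in place of the real TF-IDF output (the values, document names and terms are all hypothetical) and assumes the Matrix package is available, which is the case for standard R installations.

```r
library(Matrix)

# Toy stand-in for a TF-IDF matrix: 4 documents x 3 terms (made-up values).
tfidf <- sparseMatrix(
  i = c(1, 1, 2, 3, 3, 4),
  j = c(1, 2, 2, 1, 3, 3),
  x = c(0.5, 1.2, 0.7, 0.9, 0.3, 1.1),
  dims = c(4, 3),
  dimnames = list(paste0("doc", 1:4), c("tax", "war", "peace"))
)

# prcomp() works on a dense numeric matrix, so convert explicitly
# and check the type before fitting.
dense <- as.matrix(tfidf)
stopifnot(is.numeric(dense))

fit <- prcomp(dense, center = TRUE, scale. = FALSE)

# Scores on the first two components, keyed by document name.
scores <- data.frame(doc_id = rownames(dense), fit$x[, 1:2])
print(scores)
```

The doc_id column makes it possible to join the scores back onto document metadata (such as year or president) before plotting. Converting to a dense matrix is fine for small vocabularies but may use a lot of memory on large ones.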
One other question: is it possible to share your plot code for the PCA? It's very good looking!
Sure, here is the code that you should be able to adapt to your data:
library(ggplot2)
library(ggrepel)   # geom_text_repel()
library(viridis)   # scale_color_viridis()
library(dplyr)     # filter()

ggplot(pca, aes(PC1, PC2)) +
  geom_point(aes(color = cut(year, 10, dig.lab = 4)), alpha = 0.35, size = 4) +
  geom_text_repel(data = filter(pca, !duplicated(president)),
                  aes(label = president), color = grey(0.4), cex = 3) +
  labs(color = "Year") +
  scale_color_viridis(discrete = TRUE, end = 0.9, option = "C") +
  theme(axis.title.x = element_text(size = 14),
        axis.title.y = element_text(size = 14),
        axis.text.x = element_blank(),
        axis.text.y = element_blank())
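When adapting the plot, it may help to see what the cut(year, 10, dig.lab = 4) call in the color aesthetic does on its own. The snippet below is a small base R illustration using a made-up range of years, not the actual data: cut() splits the numeric range into 10 equal-width intervals, and dig.lab = 4 lets the interval labels show four-digit years instead of falling back to scientific notation.

```r
# Hypothetical range of years, standing in for the `year` column.
years <- 1790:2016

# Ten equal-width bins; dig.lab = 4 keeps four-digit labels readable.
bins <- cut(years, 10, dig.lab = 4)

print(levels(bins))   # one label per interval, used as the legend entries
print(table(bins))    # number of years falling in each bin
```

Because the result is a factor, the color scale has to be discrete, which is why the plot code passes discrete = TRUE to scale_color_viridis().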
Hello,
Firstly, this is a great resource, thank you!
Second, I'm having trouble replicating the PCA analysis from the vignette: https://cran.r-project.org/web/packages/cleanNLP/vignettes/case_study.html
The example is as follows:
If I try to run this (a minor error is the double underscore in cnlp__pca()), I get the following error message:
I'm not sure how to fix this. Could you please help? Andrew