stephbuon / hansard-shiny

Code for the "Hansard Viewer" web app (a prototype app for applying for future support).
https://shinyviz.smu.edu/shiny/public/hansard-shiny/
MIT License

Why SVD? #61

Open stephbuon opened 2 years ago

stephbuon commented 2 years ago

Go to Word Context/Vector Space. This panel is the odd one out. In fact, it's not even the Hansard Corpus. It's just a bunch of tweets from Kaggle. But I am having trouble deleting it because I feel like there's such potential here and I want to explore that potential before I delete it.

The results are based on this tutorial: https://cbail.github.io/textasdata/word2vec/rmarkdown/word2vec.html

Please note that several lines of code in the tutorial don't work. That's okay -- you will still produce the correct viz at the end.

From an analysis standpoint (if you want to work on a more social-sciencey problem) I'd love to explore whether this method is of any use to history and historical analysis.

So here's a starting question: why is "justice" closer to "kavanaugh" than to "scotus" or "protections"? Of course, one reason is that "justice" was said more often in relation to "kavanaugh" because he was being nominated as a Supreme Court Justice -- but is there a way to make this visualization less opaque to everyday readers? Without greater context, it may be unclear why "kavanaugh" is more correlated with "justice" than with, say, "protections." Without that context, one could assume that "kavanaugh" is simply more similar to "justice."

So it could be that this approach is actually misleading or confusing.
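
One possible way to make the closeness less opaque, offered only as a rough sketch, is to show the raw co-occurrence evidence behind each pair next to the plot: how many texts contain both words, their pointwise mutual information (PMI), and a couple of example texts. The Python below is a minimal illustration under stated assumptions. `TWEETS` is made-up toy data (not the Kaggle set), the function names `cooccurrence_stats`, `pmi`, and `explain` are hypothetical, co-occurrence is counted per whole text rather than in a sliding window, and the SVD step is left out entirely so the numbers stay directly interpretable.

```python
import math
from collections import Counter
from itertools import combinations

# Toy stand-in for the tweet corpus (hypothetical data, not the Kaggle set).
TWEETS = [
    "justice kavanaugh nomination hearing",
    "kavanaugh confirmed as supreme court justice",
    "justice kavanaugh vote in senate",
    "scotus ruling on protections",
    "justice for all protections matter",
]


def cooccurrence_stats(docs):
    """Count, per word and per unordered word pair, how many docs contain it."""
    word_counts = Counter()
    pair_counts = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))  # unique words, stable order
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))  # pairs are (smaller, larger)
    return word_counts, pair_counts


def pmi(a, b, word_counts, pair_counts, n_docs):
    """PMI = log(p(a, b) / (p(a) * p(b))); None if the words never co-occur."""
    joint = pair_counts.get(tuple(sorted((a, b))), 0)
    if joint == 0:
        return None
    p_ab = joint / n_docs
    p_a = word_counts[a] / n_docs
    p_b = word_counts[b] / n_docs
    return math.log(p_ab / (p_a * p_b))


def explain(target, others, docs):
    """Return human-readable evidence for how `target` relates to each word."""
    word_counts, pair_counts = cooccurrence_stats(docs)
    rows = []
    for other in others:
        shared = [d for d in docs if {target, other} <= set(d.lower().split())]
        rows.append({
            "word": other,
            "shared_docs": len(shared),
            "pmi": pmi(target, other, word_counts, pair_counts, len(docs)),
            "examples": shared[:2],  # a couple of texts to show the reader
        })
    # Most frequently co-occurring words first.
    return sorted(rows, key=lambda r: -r["shared_docs"])


if __name__ == "__main__":
    for row in explain("justice", ["kavanaugh", "scotus", "protections"], TWEETS):
        print(row)
```

A table like this, shown beside the vector-space plot, would let a reader see that a short distance reflects words appearing in the same texts rather than words meaning the same thing.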

Do you have any thoughts about improving this approach / page and making it more informational to users?