mtm80 / russ-project


4 April Project #68

Closed: mtm80 closed this issue 6 years ago

mtm80 commented 6 years ago

This week, our group had to postpone the weekly meeting until Wednesday because one member was sick on Monday.

This week, Matt completed the Zhirinovskii text markup, wrote a draft of a methodology page, and expanded the "About" page. Next week, he'll work with Ian to get the website up with all of the content he has generated so far.

Richie marked up all 5 Putin texts with rhetorical markup, explored the MALLET output, and wrote an analysis of the results. This upcoming week, he's going to fix his MALLET "code" so it gives us document-level topic data (i.e., which topics Putin talks about most). He'll also write an analysis comparing what the topic model tells us with reality, and generate visualizations for each candidate showing which rhetorical devices they use most often.
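One way to get document-level topic data is to read the file MALLET writes with its `--output-doc-topics` option and average each topic's share across one candidate's documents. The sketch below assumes the newer MALLET layout, with tab-separated fields for the document index, the document name, and then one proportion per topic. Older versions write sorted topic/proportion pairs instead, so check the file first. The function names, the file paths, and the idea of filtering by a substring of the file name are all hypothetical.

```python
from collections import defaultdict


def average_topic_weights(lines, name_filter=None):
    """Average each topic's proportion over the selected documents.

    Assumes (hypothetically) MALLET's newer doc-topics layout:
    doc index <TAB> doc name <TAB> proportion for topic 0 <TAB> topic 1 ...
    """
    totals = defaultdict(float)
    n_docs = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and any header line
        fields = line.split("\t")
        name, proportions = fields[1], fields[2:]
        if name_filter and name_filter not in name:
            continue  # keep only one candidate's documents
        for topic, prop in enumerate(proportions):
            totals[topic] += float(prop)
        n_docs += 1
    if n_docs == 0:
        return {}
    return {topic: total / n_docs for topic, total in totals.items()}


def top_topics(weights, k=3):
    """Return the k topic ids with the highest average weight."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


if __name__ == "__main__":
    # Made-up rows for illustration; real rows come from the doc-topics file.
    sample = [
        "0\tfile:/texts/putin_1.txt\t0.70\t0.20\t0.10",
        "1\tfile:/texts/putin_2.txt\t0.50\t0.10\t0.40",
        "2\tfile:/texts/zhirinovskii_1.txt\t0.05\t0.90\t0.05",
    ]
    weights = average_topic_weights(sample, name_filter="putin")
    print(top_topics(weights))
```

With those made-up rows, topic 0 comes out on top for the Putin files. Averaging the proportions keeps a single long speech from dominating the per-candidate picture.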

We'll all be working together to get all the content up as we want it.

brucknerp commented 6 years ago

Looks like everything is coming along! I'm really interested in seeing what the topic models can show, and I wonder what visualizations you'll use for them and for each candidate.

Idi0teque commented 6 years ago

Wow, seems like you guys are doing really well at this point! Can you explain a little about the difference between the topic model and reality? What can't you capture with it, and how relevant is it?

richiebful commented 6 years ago

To build a topic model, the computer looks for words that often occur together. For example, "Украина" (Ukraine), "Крым" (Crimea), "присоединение" (annexation), and "сепаратисты" (separatists) might show up in the same topic because Putin and Zhirinovskii often used these words together. However, the computer doesn't really understand real, human topics like the Ukraine conflict. It's a heuristic for approximating human topics, so sometimes a real topic gets split across two computer-generated topics, or a computer-generated topic is completely nonsensical.
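To illustrate the "words that often occur together" idea, the sketch below counts how many documents contain each pair of words from a small vocabulary. This is not what MALLET actually does. MALLET fits a probabilistic topic model (LDA) using Gibbs sampling, so this code only shows the co-occurrence signal that such a model picks up. The toy documents, the vocabulary, and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, vocabulary):
    """Count the documents in which each pair of vocabulary words appears together."""
    counts = Counter()
    for doc in documents:
        # Naive whitespace tokenization; real texts would need punctuation
        # stripping and lemmatization for Russian word forms.
        present = sorted(set(doc.lower().split()) & vocabulary)
        for pair in combinations(present, 2):
            counts[pair] += 1  # pairs are stored in sorted order
    return counts


if __name__ == "__main__":
    docs = [
        "украина крым присоединение",
        "крым сепаратисты украина",
        "экономика налоги рост",
    ]
    vocab = {"украина", "крым", "присоединение", "сепаратисты", "экономика", "налоги"}
    print(cooccurrence_counts(docs, vocab).most_common(1))
```

In this toy data, "крым" and "украина" appear together most often. A topic model would tend to group words like these, but it has no concept of "the Ukraine conflict" behind them.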

For example, here's a topic our model generated:

This topic doesn't really have any conceptual basis, so we can toss it.