mtm80 / russ-project


Project issue Feb. 21, 2018 #36

mtm80 closed this issue 6 years ago

mtm80 commented 6 years ago

Based on this week's meeting, we have almost completed our reviews of research methodologies; all that remains is for Ian to review a small portion for relevant linguistic aspects. We have identified our texts for analysis, but the Zhirinovsky texts must be narrowed down to the most relevant ones. We have developed a schema for our texts and will mark up the ones we select. We will also be deciding on our webpage organization.

JosephDRogers23 commented 6 years ago

I was looking through your repository and noticed several of the TEI files for your project. I also looked at your speeches.rnc file, which had a lot of TEI documentation in it. How much of this file was boilerplate code, and how much did you edit yourselves? Did you have difficulties creating your .odd file?

richiebful commented 6 years ago

@JosephDRogers23 The speeches.rnc is completely generated via a transformation on my ODD file, so that's all done through a black box process as far as I'm concerned. The ODD file wasn't too hard to build, but I made a typo that led to an error saying that the <TEI> element wasn't allowed as the root. Thankfully, Dr. B unstuck me.

Right now, the ODD just constrains which elements we use from the TEI modules. We do this with the <moduleRef> element and its @include attribute. This week, we're working to constrain ourselves a bit further by setting the @type values we can use on the <interp> element to mark various macro-level linguistic phenomena (see /research/discourseTerms for a taste). I'm working on lemmatizing our texts, so that every spoken word is wrapped in a <w> tag with its @lemma attribute set. This will help us get more value out of topic modelling with MALLET when we get to that point next week.
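For anyone curious what the wrapping step could look like, here is a minimal sketch using Python's standard-library ElementTree. The lemma table is a hypothetical stand-in: the thread doesn't say which lemmatizer we use, and a real Russian lemmatizer would replace the dictionary lookup. The regex tokenizer is also a simplifying assumption.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

# Hypothetical toy lemma table; a real pipeline would call a Russian
# lemmatizer here instead of a dictionary lookup.
LEMMAS = {"россии": "россия", "нужна": "нужный", "сильная": "сильный"}

def lemmatize(token: str) -> str:
    """Return the lemma for a token, falling back to its lowercase form."""
    return LEMMAS.get(token.lower(), token.lower())

def wrap_words(parent: ET.Element, text: str) -> None:
    """Append one <w lemma="..."> per word token to parent.

    Spaces and punctuation between words are kept as text/tail, so the
    original string can still be reconstructed from the markup.
    """
    last = None
    pos = 0
    for match in re.finditer(r"\w+", text):  # \w matches Cyrillic letters too
        gap = text[pos:match.start()]
        if last is None:
            parent.text = (parent.text or "") + gap
        else:
            last.tail = gap
        w = ET.SubElement(parent, f"{{{TEI_NS}}}w", lemma=lemmatize(match.group()))
        w.text = match.group()
        last = w
        pos = match.end()
    rest = text[pos:]
    if last is None:
        parent.text = (parent.text or "") + rest
    else:
        last.tail = rest

p = ET.Element(f"{{{TEI_NS}}}p")
wrap_words(p, "России нужна сильная власть.")
print(ET.tostring(p, encoding="unicode"))
```

Keeping the inter-word spaces and punctuation as tail text means the plain reading text is unchanged by the markup; only the <w> wrappers and their @lemma values are added.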

brucknerp commented 6 years ago

@richiebful Lemmatization is great for topic modeling, but has your group considered trying any other language processing methods?

I see that the markup you are doing is heavily content- and rhetoric-based, but the linguistics major in me wonders whether you could incorporate some analysis of language use as well.
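On the topic-modeling side, once every word carries @lemma, preparing MALLET input mostly means pulling the lemmas back out. A rough sketch, assuming one TEI file per speech and MALLET's default `import-file` line format (instance name, label, then the text). The directory layout, the `speech01` name, and the `speech` label are hypothetical, and the inline sample is a stripped-down fragment, not a valid TEI document.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def lemmas_from_tei(xml_text: str) -> list[str]:
    """Collect the @lemma of every <w> in a TEI document, in reading order."""
    root = ET.fromstring(xml_text)
    return [w.get("lemma") for w in root.iterfind(".//tei:w", TEI_NS) if w.get("lemma")]

def mallet_line(name: str, lemmas: list[str], label: str = "speech") -> str:
    """Format one instance as 'name<TAB>label<TAB>text' for MALLET import-file."""
    return f"{name}\t{label}\t{' '.join(lemmas)}"

def write_mallet_input(tei_dir: Path, out_file: Path) -> None:
    """Write one MALLET instance per *.xml file found in tei_dir (hypothetical layout)."""
    lines = []
    for path in sorted(tei_dir.glob("*.xml")):
        lemmas = lemmas_from_tei(path.read_text(encoding="utf-8"))
        lines.append(mallet_line(path.stem, lemmas))
    out_file.write_text("\n".join(lines) + "\n", encoding="utf-8")

sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
<w lemma="россия">России</w> <w lemma="нужный">нужна</w> <w lemma="сильный">сильная</w>
<w lemma="власть">власть</w>.</p></body></text></TEI>"""
print(mallet_line("speech01", lemmas_from_tei(sample)))
```

The resulting file would then typically be imported with `--keep-sequence` before running `train-topics`. Because every inflected form collapses to one lemma, Russian's rich morphology no longer splits a single word across several topic-model vocabulary entries.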

Idi0teque commented 6 years ago

Wow, you guys are seriously ahead! How has TEI been working for you? Are the ODD files particularly cumbersome, or are they actually easier than writing a regular schema? I also agree with @brucknerp that you could look into the connotations and meanings of the terms used; in American political discourse, at least, some innocuous-seeming terms ("stand your ground," "protect children," etc.) have very different meanings depending on the context of usage and the party in question. Good luck, guys!

richiebful commented 6 years ago

@ianloughney What do you think about analyzing these sorts of "dog whistle" politics? It might be a little tricky given our level of Russian (decoding some instances may require a very high, almost native level of knowledge).