Closed jwzimmer-zz closed 3 years ago
=== QAnon & insurrection specifically ===
Transcripts of speech (mostly podcasts) covering or inciting QAnon.
About:
Of:
Thinking about the qualitative differences between the way Alex Jones speaks and the way regular people speak, I wonder if you could simply use the frequency of unusual, hyperbolic, conspiracy-related phrases, like: devil-worshipping, child-molesting, pedophile, crime ring, sex ring, heathen, antichrist, electromagnetic, ley line, praying, psychic, premonition, demon. A lot of these words are pretty unusual in "regular" speech, and definitely unusual at such a high density, yet they come up a lot in conspiracy rhetoric.
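A minimal sketch of that phrase-frequency idea, assuming plain-text transcripts and using only the phrases listed above as the lexicon. The function name `phrase_density` and the "hits per 1,000 words" normalization are my own choices for illustration, not an established method, and a real lexicon would need curation.

```python
import re

# Phrases copied from the list above; a real lexicon would need curation.
CONSPIRACY_PHRASES = [
    "devil-worshipping", "child-molesting", "pedophile", "crime ring",
    "sex ring", "heathen", "antichrist", "electromagnetic", "ley line",
    "praying", "psychic", "premonition", "demon",
]


def phrase_density(text, phrases=CONSPIRACY_PHRASES):
    """Return the number of lexicon hits per 1,000 words of text."""
    lowered = text.lower()
    # Rough word count: runs of letters, digits, apostrophes, and hyphens.
    n_words = len(re.findall(r"[a-z0-9'-]+", lowered))
    if n_words == 0:
        return 0.0
    hits = 0
    for phrase in phrases:
        # Word boundaries keep "demon" from matching inside "demonstrate";
        # the optional "s" catches simple plurals such as "demons".
        pattern = r"\b" + re.escape(phrase) + r"s?\b"
        hits += len(re.findall(pattern, lowered))
    return 1000.0 * hits / n_words
```

Comparing this score across transcripts from different sources would show whether density alone separates them; it says nothing about whether a speaker is promoting or criticizing the ideas, since both kinds of discussion can use the same vocabulary.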
Learning more about QAnon, violent extremism, and radicalization:
I have enough for now.
I was thinking that transcripts of Knowledge Fight vs. Infowars would be guaranteed to be "discussion about" and "discussion of" conspiracy theories like QAnon, respectively. We can use the way people self-sort into these niche groups to infer labels for corpus data.
Unfortunately it doesn't look like Knowledge Fight (or QAnon Anonymous) has transcripts available, so I'm looking for alternative options.
(?) denotes that I'm not sure how accessible the data is. For entries without (?), I can actually see the text of the transcript, so I am pretty confident we could use it.
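The source-based labeling idea above could be sketched roughly as follows. The labels "about" and "of" follow the two categories used in this thread; the mapping, the function name `label_documents`, and the `(source, text)` input shape are hypothetical and only for illustration.

```python
# Map each source to the label implied by who produces it.
# "about" = discussion about conspiracy theories (outside coverage);
# "of" = discussion of them from the inside.
# Assumption: these two entries mirror the Knowledge Fight vs. Infowars example.
SOURCE_LABELS = {
    "Knowledge Fight": "about",
    "Infowars": "of",
}


def label_documents(documents, source_labels=SOURCE_LABELS):
    """Attach an inferred label to each (source, text) pair.

    Documents from sources with no known label are skipped rather than guessed.
    """
    labeled = []
    for source, text in documents:
        label = source_labels.get(source)
        if label is not None:
            labeled.append({"source": source, "label": label, "text": text})
    return labeled
```

Labels inferred this way are only as reliable as the assumption that everything from a given source belongs to one category, so a spot check of a sample would still be worthwhile.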
=== Discussion about ===
Opening Arguments has transcripts, and at least on occasion the episodes touch on conspiracies and QAnon:
OA354 has about 11304 words (according to https://wordcounter.net/). If the average OA episode is about 10000 words, and there are 7 pages of search results for "conspiracy" with 10 episodes per page, that would mean a corpus of about 720000 words (7x10x10000 + 2x10000), assuming all of it is relevant enough to include (which I'm not at all sure of). Is that big enough?
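The estimate above can be written out as a small calculation so the inputs are easy to change. The numbers come from the paragraph above; reading the "2x10000" term as two extra episodes beyond the full pages is my assumption, and the function name is hypothetical.

```python
def estimate_corpus_words(full_pages, episodes_per_page, extra_episodes,
                          words_per_episode):
    """Estimate total corpus size in words from search-result counts."""
    n_episodes = full_pages * episodes_per_page + extra_episodes
    return n_episodes * words_per_episode


# Figures from the estimate above: 7 pages of 10 episodes, plus the
# "2x10000" term (assumed here to be 2 extra episodes), at ~10000 words each.
total = estimate_corpus_words(
    full_pages=7,
    episodes_per_page=10,
    extra_episodes=2,
    words_per_episode=10_000,
)
print(total)  # 720000
```

Swapping in the measured 11304 words for OA354 as the per-episode figure would give a somewhat larger total, so the 720000 figure is best read as a rough lower-end guess.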
United States of Conspiracy: https://www.pbs.org/wgbh/frontline/film/united-states-of-conspiracy/transcript/
Infowars lawsuits: https://infowarslawsuit.com/. There are lots of court documents, although they might be too different in construction from more conversational formats to reasonably include. There are also some hearing transcripts, which might be similar enough, although they are in PDF form and so would need to be converted to an easier-to-parse format (this might be trivially easy, I don't know), e.g. https://infowarslawsuit.com/wp-content/uploads/2019/03/3-January-24-2019-Hearing-Transcript.pdf, https://infowarslawsuit.com/wp-content/uploads/2019/03/6-August-30-2018-Hearing-Transcript.pdf.
This American Life has covered some conspiracy theories and has transcripts available:
Knowledge Fight subreddit: https://www.reddit.com/r/KnowledgeFight/
=== Discussion of ===
Parler: content from Parler may be available via https://archiveteam.org/index.php?title=Projects, although right now the page is "redirecting due to heavy load"? (From https://gizmodo.com/every-deleted-parler-post-many-with-users-location-dat-1846032466 via @JaneAdams)
QAnon supporters:
Project Camelot: it does look like it has some transcripts: https://projectcamelot.org/lang/en/flag.html
=== Both? ===
Reddit: depending on the subreddit, there are both kinds of discussion, so we might be able to use specific subreddits as a way to label data: https://archiveteam.org/index.php?title=Reddit. There appears to be Reddit content on the Internet Archive as well, https://archive.org/search.php?query=reddit, but I'm not really sure how to use it or what it is exactly.