Open ebeshero opened 1 week ago
After putting the document into AntConc/Voyant, one of the first things I did was run it for Ngrams of length 3. It didn't take me too long to figure out the source after that, because pretty close to the top of the Ngram list was the phrase "dr van helsing," making me figure it's something with Dracula and probably the original novel.
I also checked Ngrams of 5 and found a lot of mentions of diaries and journals, and without having read the book myself, it makes me think that it might play out largely (or at least partially) through the perspective of fictional characters' writings. After that, I looked at the word collage. Probably because I figured it was Dracula and a horror story, I was expecting to see a lot of ominous and dreary words, like "blood" or "fear," close to the top. In reality, its biggest words are pretty mundane ("time," "said," "know"). Again, having not read the book, I feel like this backs my conclusion that it's mostly diary entries, written by people who are clinically trying to recall life events rather than tell a story to an audience.
I loaded this document into Voyant to see if I could figure out where the text was from. However, the two main words were "said" and "know," so I switched over to AntConc to see if I could learn more about the text. In AntConc, setting the N-gram size to four gave me the phrase "dr seward's diary." When I started reading the text, I didn't know what it was until I got to the part where they mentioned Dracula. I was 50% sure it was Dracula, and then I looked up "dr seward's diary" and confirmed I was correct. I've not read Dracula before, which is why I was not sure about my answer.
When I first put this mystery text into Voyant Tools, nothing immediately jumped out at me, as the most used words were "said, shall, know, time, come." These words are not exactly unique, so I had to load up AntConc to check the n-grams. I started with N-grams of 3, which is where I saw word combinations such as "Dr. Van Helsing, Dr. Seward, the Count." From this, I upped the N-grams to 4, at which point "Dr. Seward's Diary" was revealed to be the most frequent, with 39 occurrences. Then, with a quick Google search as to what Dr. Seward's Diary was, I determined that the mystery text must be Dracula.
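For anyone curious what the n-gram counts in these posts boil down to, here is a minimal Python sketch. It is not how AntConc itself is implemented; the tokenizer (lowercase, split on anything that is not a letter or digit) is an assumption chosen because it reproduces phrases like "dr seward s diary," and the `sample` string is made up for illustration rather than taken from dc.txt.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of letters/digits only.
    # The apostrophe in "Seward's" therefore splits it into "seward", "s".
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(tokens, n):
    # Slide a window of n tokens across the text and count each window.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Made-up sample text (NOT from dc.txt), just to show the mechanics.
sample = (
    "Dr. Seward's Diary. Dr. Van Helsing came. "
    "Dr. Seward's Diary. Dr. Van Helsing left."
)

tokens = tokenize(sample)
for n in (3, 4):
    print(n, ngram_counts(tokens, n).most_common(2))
```

To try it on the real file, replace `sample` with the contents of dc.txt (for example via `open("dc.txt", encoding="utf-8").read()`, assuming the file is UTF-8) and raise or lower `n` the same way you would in AntConc.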
I used AntConc for all my work on this assignment, as I was having issues getting Voyant to work. My chosen document was dc.txt.
My discoveries are as follows:
First, I checked how my results changed with varying Ngram sizes. I found that for a phrase's count to be above 5, the Ngram size had to be 5 or lower. I also found that for double-digit counts, the size had to be 4 or lower.
Some of the most repeated phrases were "I did not know what to" (7 times) and "dr seward s diary" (39 times).
For the segment on choosing clusters to explore, I could not access the tutorial, as the link does not work for me. I tried my best to figure out what I was supposed to do. I searched in the cluster section of AntConc for "dr seward s diary," and it displayed all the contexts in which that string appeared.
My biggest questions are about how to use the software, as I'm struggling to remember from class and cannot access the demo. Hopefully, once I get access to the demo, my understanding of this topic will clear up.
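In case it helps while the demo link is down, the "show every context where this string appears" search described above can be roughly sketched in Python as a keyword-in-context (KWIC) lookup. This is only an illustration of the idea, not AntConc's actual code; the function name `kwic`, the `width` parameter, and the sample text are all hypothetical.

```python
import re


def kwic(text, phrase, width=3):
    # Assumed tokenizer: lowercase, keep runs of letters/digits only.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    target = re.findall(r"[a-z0-9]+", phrase.lower())
    n = len(target)
    if n == 0:
        return []
    hits = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            # Collect up to `width` tokens on each side of the match.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            hits.append((left, " ".join(target), right))
    return hits


# Made-up sample text (NOT from dc.txt).
sample = (
    "Later that day. Dr. Seward's Diary. 25 April. "
    "Nothing new. Dr. Seward's Diary. 26 April."
)

for left, match, right in kwic(sample, "Dr. Seward's Diary"):
    print(f"{left:>20} | {match} | {right}")
```

Each printed row shows the words just before the phrase, the phrase itself, and the words just after it, which is the same kind of view the cluster and concordance searches give you.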
When I uploaded the text into Voyant, it showed that the most frequent words were "said" and "shall". When I put it into AntConc, I first tested it with n-grams of 2, which brought up things like "of the", "in the", and "to the". I then moved on to n-grams of 3, which were still pretty generic. When I put in n-grams of 4, the most frequent phrase was "dr seward s diary". Another frequent cluster of words was "jonathan harker s journal". This was enough context for me to Google these phrases and find out that the text is Dracula.
Post your screenshots and discuss your findings about dc.txt here!