newtfire / introDH-Hub

shared repo for DIGIT 100: Introduction to Digital Humanities class at Penn State Erie, The Behrend College
https://newtfire.github.io/introDH-Hub/
Creative Commons Zero v1.0 Universal

Mystery Text Discussion: dc.txt #108

Open ebeshero opened 1 week ago

ebeshero commented 1 week ago

Post your screenshots and discuss your findings about dc.txt here!

EvLar64 commented 1 week ago

After putting the document into AntConc/Voyant, one of the first things I did was run it for Ngrams of length 3. It didn't take me too long to figure out the source after that, because pretty close to the top of the Ngram list was the phrase "dr van helsing," making me figure it's something with Dracula and probably the original novel.

I also checked Ngrams of 5 and found a lot of mentions of diaries and journals, and without having read the book myself, it makes me think that it might play out largely (or at least partially) through the perspective of fictional characters' writings. After that, I looked at the word collage. Probably because I figured it was Dracula and a horror story, I was expecting to see a lot of ominous and dreary words, like "blood" or "fear," close to the top. In reality, its biggest words are pretty mundane ("time," "said," "know"). Again, having not read the book, I feel like this backs my conclusion that it's mostly diary entries, written by people who are clinically trying to recall life events rather than tell a story to an audience.
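For anyone curious how a word collage like this gets its "biggest" words, here is a minimal Python sketch of a frequency count with common function words filtered out. It is only an illustration, not how Voyant actually works internally: the short `STOPWORDS` set, the `top_words` helper, and the sample sentence are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stopword list (an assumption for this sketch);
# real tools ship with much longer lists.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "i", "it",
             "was", "that", "he", "she"}

def top_words(text, n=5, stopwords=STOPWORDS):
    """Return the n most frequent lowercase words that are not stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

# Made-up sample text, not taken from dc.txt.
sample = "I said that I know the time. He said it was time to go."
print(top_words(sample, n=2))
```

Even in a horror novel, everyday narration words such as "said" and "time" can outnumber the ominous vocabulary once the most common function words are removed, which fits the mundane results above.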

dc screenshot 1 dc screenshot 2

meganlitz28 commented 1 week ago

I loaded this document into Voyant to see if I could figure out where the text was from. However, the two main words were "said" and "know," so I switched over to AntConc to see if I could learn more about the text. In AntConc, setting the N-Gram size to four gave me the phrase "dr seward's diary." When I started reading the text, I didn't know what it was until I got to the part where Dracula is mentioned. I was 50% sure it was Dracula, so I looked up "dr seward's diary" and confirmed I was correct. I've not read Dracula before, which is why I was not sure about my answer.

Screenshot 2024-10-29 at 9 08 17 AM Screenshot 2024-10-29 at 10 28 01 AM

NathanH1611 commented 1 week ago

When I first put this mystery text into Voyant Tools, nothing immediately jumped out at me, as the most used words were "said, shall, know, time, come." These words are not exactly unique, so I had to load up AntConc to check the n-grams. I started with N-grams of 3, which is where I saw word combinations such as "Dr. Van Helsing, Dr. Seward, the Count." From there, I upped the N-grams to 4, at which point "Dr. Seward's Diary" was revealed to be the most frequent, with 39 occurrences. Then, with a quick Google search for what Dr. Seward's Diary was, I determined that the mystery text must be Dracula.
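The n-gram step can be sketched in a few lines of Python. This is an assumption-laden illustration, not AntConc's actual implementation: the `tokenize` and `ngram_counts` helpers and the sample sentence are invented for the example, and the tokenizer simply keeps runs of letters.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep only runs of letters, so "Seward's" splits
    # into "seward" and "s" (an assumed, simplified tokenization rule).
    return re.findall(r"[a-z]+", text.lower())

def ngram_counts(text, n):
    """Count every run of n consecutive tokens in the text."""
    tokens = tokenize(text)
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

# Made-up sample text, not taken from dc.txt.
sample = "Dr. Seward's Diary. I went out. Dr. Seward's Diary. I stayed in."
counts = ngram_counts(sample, 4)
print(counts["dr seward s diary"])
```

Splitting on the apostrophe in this way would also explain why a 4-gram list can show the heading as "dr seward s diary".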

Screenshot 2024-10-29 234342 Screenshot 2024-10-29 234154 Screenshot 2024-10-29 234937

afish2003 commented 6 days ago

I used AntConc for all my work on this assignment, as I was having issues getting Voyant to work. My chosen document was dc.txt.

My discoveries are as follows:

First, I checked how my results changed with varying Ngram sizes. I found that for any Ngram to have a count above 5, the Ngram size had to be 5 or lower. I also found that for double-digit counts, the size had to be 4 or lower.

Some of the most repeated phrases were "I did not know what to" (7 times) and "dr seward s diary" (39 times).
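The pattern here (counts shrinking as the Ngram size grows) can be checked with a small Python sketch. The `top_count_by_size` helper and the sample text are hypothetical, and the tokenizer is a simplified assumption rather than what AntConc does.

```python
import re
from collections import Counter

def top_count_by_size(text, sizes):
    """Map each n-gram size to the count of its most frequent n-gram."""
    tokens = re.findall(r"[a-z]+", text.lower())
    result = {}
    for n in sizes:
        grams = Counter(zip(*(tokens[i:] for i in range(n))))
        # default=0 covers texts shorter than n tokens.
        result[n] = max(grams.values(), default=0)
    return result

# Made-up sample text, not taken from dc.txt.
sample = "the count said no. the count said yes. the count left."
print(top_count_by_size(sample, range(1, 5)))
```

The top count can never go up as the size increases, because every occurrence of a longer phrase also contains an occurrence of the shorter phrase it starts with.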

Screenshot 2024-10-30 at 12 34 19 PM Screenshot 2024-10-30 at 12 33 54 PM

For the segment on choosing clusters to explore, I could not access the tutorial, as the link does not work for me. I tried my best to figure out what I was supposed to do. I searched in the cluster section of AntConc for "dr Sewards diary," and it displayed all the contexts in which that string appeared.
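A search that shows each occurrence of a phrase with its surrounding words is often called a keyword-in-context (KWIC) view. Below is a rough Python sketch of the idea, assuming a simple letters-only tokenizer; the `kwic` function, its `width` parameter, and the sample text are invented for illustration and are not AntConc's code.

```python
import re

def kwic(text, phrase, width=3):
    """Return (left, match, right) context tuples for each phrase occurrence."""
    tokens = re.findall(r"[a-z]+", text.lower())
    target = re.findall(r"[a-z]+", phrase.lower())
    k = len(target)
    if k == 0:
        return []  # an empty search phrase matches nothing
    hits = []
    for i in range(len(tokens) - k + 1):
        if tokens[i:i + k] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + k:i + k + width])
            hits.append((left, " ".join(target), right))
    return hits

# Made-up sample text, not taken from dc.txt.
sample = "Later that night. Dr. Seward's Diary. I went out early."
for left, match, right in kwic(sample, "Dr Seward's diary"):
    print(f"{left:>20} [{match}] {right}")
```

Reading down the left and right columns of such a listing is one way to see what kinds of passages a repeated phrase introduces.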

Screenshot 2024-10-30 at 12 35 53 PM

My biggest questions are about how to use the software, as I'm struggling to remember what we did in class and cannot access the demo. Hopefully, once I get access to the demo, my understanding of this topic will clear up.

ashlynnallgeier commented 6 days ago

When I uploaded the text into Voyant, it showed that the most frequent words were "said" and "shall". When I put it into AntConc, I first tested it with n-grams of 2, which brought up things like "of the", "in the", and "to the". I then moved on to n-grams of 3, which were still pretty generic. When I put in n-grams of 4, the most frequent phrase was "dr seward s diary". Another frequent cluster of words was "jonathan harker s journal". This was enough context for me to google these phrases and find out that the text is Dracula.

Screenshot 2024-10-30 at 12 35 02 PM Screenshot 2024-10-30 at 12 23 23 PM Screenshot 2024-10-30 at 12 22 03 PM