newtfire / introDH-Hub

shared repo for DIGIT 100: Introduction to Digital Humanities class at Penn State Erie, The Behrend College
https://newtfire.github.io/introDH-Hub/
Creative Commons Zero v1.0 Universal

Mystery Text Discussion: dc.txt #101

Open ebeshero opened 4 months ago

ebeshero commented 4 months ago

Post your screenshots and discuss your findings about dc.txt here!

Fancy811 commented 4 months ago

The analysis conducted using AntConc and Voyant Tools produced intriguing findings. N-gram sizes 1 through 6 all produced frequency counts above 5, but only sizes 1 through 5 produced counts in the double digits. Several phrases repeated notably often: "and there was a" and "beyond the" each appeared 10 times, while "a hard" and "And their" appeared 9 times apiece. The most striking result came when the frequency threshold was set to 5: the number of results for an n-gram size of 5 dropped from 100 to just 20, which shows how sensitive the result list is to the frequency setting.
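For anyone who wants to reproduce this kind of count outside AntConc, here is a minimal Python sketch of n-gram counting with a minimum-frequency filter. The tokenizer (lowercase, letters only), the function names, and the sample sentence are all assumptions made for illustration. AntConc's own tokenization settings may differ, so the counts will not necessarily match its output.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase, keep only runs of letters.
    # Apostrophes split words, so "Seward's" becomes "seward", "s".
    return re.findall(r"[a-z]+", text.lower())

def ngram_counts(tokens, n):
    # Slide a window of n tokens across the text and count each phrase.
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def filter_by_min_freq(counts, min_freq):
    # Keep only the n-grams that occur at least min_freq times.
    return {gram: c for gram, c in counts.items() if c >= min_freq}

# Hypothetical sample text, not taken from dc.txt.
sample = "I could see that he was tired. I could see the door. I could not see him."
tokens = tokenize(sample)
trigrams = ngram_counts(tokens, 3)

print(trigrams.most_common(1))
print(filter_by_min_freq(trigrams, 2))
```

Raising `min_freq` shrinks the result list quickly, because most longer n-grams occur only once or twice.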

While exploring clusters in KWIC (Key Word in Context), I observed that the entire text functions as a complex cluster of different kinds of documents. These include journals by Jonathan Harker, letters from Miss Mina Murray, her own journal entries, and diary entries by Dr. Seward. This mosaic of perspectives enriches the narrative, offering a multifaceted view of the storyline through the documents and personal accounts interwoven throughout the text. Here are some of the screenshots from my KWIC findings:

Screenshot 2024-03-13 010530

Screenshot 2024-03-13 010551

ss2 (2)
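As a rough illustration of what a KWIC view does, the sketch below lists every occurrence of a keyword with a few words of context on each side. The function name `kwic`, the `width` parameter, and the sample text are hypothetical, and this is not how AntConc implements its KWIC tool.

```python
import re

def kwic(text, keyword, width=3):
    # Split into word tokens, keeping the original capitalization for display.
    tokens = re.findall(r"[A-Za-z]+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            # Up to `width` tokens of context on each side of the hit.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

# Hypothetical sample text, not taken from dc.txt.
sample = "Dr Seward kept a diary. Mina kept a journal. The diary was long."
for left, kw, right in kwic(sample, "diary", width=2):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>15} | {kw} | {right}")
```

Lining the keyword up in a column is what makes repeated headings such as a narrator's name next to "diary" or "journal" easy to spot.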

connorcarpenter13 commented 4 months ago

From this text file, I found it very interesting to see how often these common words are used. When typing or writing anything, most people probably don't realize how often they use these words, especially ones such as "the", "of", or "and". Obviously they are used for a variety of reasons, but they often seem to be used as filler when they are not strictly needed, just to fill up space on a page or to meet a specific word count. After doing some research, I found that all of these words are among the most commonly used words in the English language, which does not come as a surprise.

Screenshot 2024-03-13 102818
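One common way to look past these very frequent words is to filter them out with a stopword list before counting. The sketch below compares the top words with and without such a filter. The tiny `STOPWORDS` set, the function name `top_words`, and the sample sentence are assumptions for illustration only; real stopword lists are much longer.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "of", "and", "to", "a", "i", "in", "was", "it", "that"}

def top_words(text, k=5, drop_stopwords=False):
    # Lowercase the text and keep only runs of letters as words.
    tokens = re.findall(r"[a-z]+", text.lower())
    if drop_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return Counter(tokens).most_common(k)

# Hypothetical sample text, not taken from dc.txt.
sample = "The count and the castle. The count was in the castle and the night was dark."

print(top_words(sample, k=3))                       # dominated by "the"
print(top_words(sample, k=3, drop_stopwords=True))  # content words surface
```

With the filter on, the words that remain say much more about what the text is actually about.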

LauraW0622 commented 4 months ago

I started the analysis in AntConc with an n-gram size of 1. The most frequent words were "the", "and", "I" and "to", which was not unexpected. The n-gram sizes that gave frequencies of 5 or more were 1 through 6. The n-gram sizes that gave frequencies in the double digits were 1 through 5.

db 3gram

I was surprised to see a clue right away about what the text contains, with "dr van helsing" and "dr seward s" in the top ten triplets. I expected it to be things like "out of the" or "it was a".

db 3gramKWIC

The most frequent triplet was "I could see". It seems that "I could see that ..." was a phrase used very often in the text.

dc drsewarddiary

Another frequent phrase was "Dr Seward's Diary -". From this it was easy to determine that Dr. Seward was likely the first-person narrator, writing in the form of diary entries.

dc cirrus

Unfortunately, the Cirrus word cloud from Voyant was not helpful in providing clues to what the text was. Based on the AntConc 1-grams, this was not surprising.

dc giveaway voyant

What did give the text away were the phrases found in the Voyant tool... The text is "Dracula" by Bram Stoker.